The Doppelganger: Testing AI-Built Software Like a Real User

Maudel Team

There’s a subtle problem with AI-generated code: the same system that writes the code also writes the tests. When both the implementation and the validation come from the same source, you get circular confidence — the tests pass because they were designed to pass, not because the software works.

We built the Doppelganger to break that circle.

The Circular Validation Problem

Here’s how it happens. The AI agent reads the story’s acceptance criteria and generates an implementation. Then it generates unit tests for that implementation. The tests verify that the code does what the code does — not that the code does what the user needs.

This is the “code works but the feature doesn’t” class of bugs. The API returns the right data structure, but the UI doesn’t render it correctly. The form validation passes, but the error messages are confusing. The payment flow completes, but the confirmation email never sends.

Unit tests don’t catch these because they test at the wrong level of abstraction.
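Here is a toy sketch of the circularity, in TypeScript. Both the function and its test are hypothetical illustrations, not code from any real pipeline: the test was derived from the implementation, so it can only confirm what the code already does.

```typescript
type Row = { id: number; date: string };

// Implementation: filters rows by date range using ISO-string comparison.
function filterByRange(rows: Row[], from: string, to: string): Row[] {
  return rows.filter((r) => r.date >= from && r.date <= to);
}

// AI-generated unit test: asserts the function returns what the function
// returns. It passes -- and says nothing about whether the table in the
// UI actually re-renders when the user picks a range.
const rows: Row[] = [
  { id: 1, date: "2024-01-10" },
  { id: 2, date: "2024-02-15" },
];
const result = filterByRange(rows, "2024-01-01", "2024-01-31");
if (result.length !== 1 || result[0].id !== 1) {
  throw new Error("filter test failed");
}
console.log("unit test passed; UI behavior still unverified");
```

The assertion is true by construction, which is exactly the problem: a green unit test here measures agreement between two outputs of the same system, not correctness against user intent.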

What the Doppelganger Does

The Doppelganger is a browser-based test harness that validates from the user’s perspective. It:

  1. Reads the acceptance criteria from the story document — the original user intent, not the implementation’s interpretation of it
  2. Generates Playwright test scenarios that exercise the feature as a user would interact with it
  3. Runs those tests against the actual UI — clicking buttons, filling forms, navigating pages, verifying visual output
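Step 2 can be sketched as a pure function from a criterion to a Playwright scenario skeleton. Everything here — the `Criterion` shape, the names, the template — is an assumption for illustration; the post does not show the Doppelganger's actual generator.

```typescript
// Hypothetical input shape: intent and steps come verbatim from the
// story document, never from the implementation.
interface Criterion {
  story: string;   // the user-facing intent
  steps: string[]; // user actions and expectations, in plain language
}

// Emit a Playwright test skeleton with one TODO per plain-language step;
// a later agent pass would translate each TODO into page actions.
function scenarioSkeleton(c: Criterion): string {
  const body = c.steps
    .map((s) => `  // TODO(agent): translate "${s}" into page actions`)
    .join("\n");
  return `test(${JSON.stringify(c.story)}, async ({ page }) => {\n${body}\n});`;
}

const skeleton = scenarioSkeleton({
  story: "filter the dashboard by date range",
  steps: [
    "open the date picker",
    "select a range",
    "table shows only matching records",
  ],
});
console.log(skeleton);
```

The point of the shape is provenance: the generator's only input is the story document, so the resulting scenarios cannot inherit assumptions baked into the implementation.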

The key difference: the Doppelganger’s test scenarios come from the acceptance criteria, not from the code. It’s testing what was asked for, not what was built.

An Example

Consider a story: “As a user, I can filter the dashboard by date range, and the table updates to show only matching records.”

The AI implementation might correctly filter the data and update the table. The AI-generated unit test verifies the filter function returns the right records. Both pass.

The Doppelganger opens the actual dashboard, clicks the date picker, selects a range, and verifies that the table visually updates. It might catch that the date picker doesn’t open on Safari, or that the loading spinner never disappears, or that the “no results” state shows the wrong message.
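The scenario above might compile down to a Playwright spec like the following. This is a hand-written sketch of the kind of test the Doppelganger would generate — the URL, labels, roles, and test IDs are assumptions about a hypothetical app, not real selectors.

```typescript
import { test, expect } from "@playwright/test";

test("filter the dashboard by date range", async ({ page }) => {
  // Hypothetical app URL -- an assumption for this sketch.
  await page.goto("https://example.test/dashboard");

  // Interact the way a user would: through the visible date picker.
  await page.getByLabel("Date range").click();
  await page.getByLabel("From").fill("2024-01-01");
  await page.getByLabel("To").fill("2024-01-31");
  await page.getByRole("button", { name: "Apply" }).click();

  // Verify the visible outcome, not the internal data structure:
  // the loading spinner resolves and a matching row is actually rendered.
  await expect(page.getByTestId("loading-spinner")).toBeHidden();
  await expect(
    page.getByRole("row").filter({ hasText: "2024-01" }).first()
  ).toBeVisible();
});
```

Note that every assertion here is about rendered state. A spinner that never disappears or a picker that fails to open makes this test fail even when the filter function itself is correct.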

These are the bugs that ship to production when you only test at the unit level.

How It Fits the Pipeline

The Doppelganger runs as part of the validation stage — after implementation, after unit tests. It’s not a replacement for unit tests; it’s a different kind of confidence.

The execution flow:

  1. Implementation stage generates code and unit tests
  2. Unit tests run and pass (code correctness)
  3. Doppelganger reads acceptance criteria from the story document
  4. Doppelganger generates and runs Playwright scenarios (user experience correctness)
  5. Results feed into the quality gate evaluation
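The flow above can be sketched as a small orchestration function with each stage stubbed out. All names here (`GateResult`, `runUnitTests`, and so on) are hypothetical — the post describes the flow, not this interface.

```typescript
type GateResult = { pass: boolean; failures: string[] };

// Stage 2 stub: the code-correctness gate.
function runUnitTests(): GateResult {
  return { pass: true, failures: [] };
}

// Stages 3-4 stub: pretend one user-facing scenario failed.
function runDoppelganger(criteria: string[]): GateResult {
  const failures = criteria.filter((c) => c.includes("table updates"));
  return { pass: failures.length === 0, failures };
}

// Stage 5: results feed the quality gate. Unit failures short-circuit;
// Doppelganger failures become specific revision targets for the
// implementation agent.
function evaluateQualityGate(criteria: string[]): GateResult {
  const unit = runUnitTests();
  if (!unit.pass) return unit;
  return runDoppelganger(criteria);
}

const gate = evaluateQualityGate([
  "user can open the date picker",
  "table updates to show only matching records",
]);
console.log(gate.pass, gate.failures);
```

In this sketch the unit gate passes but the user-experience gate does not — the situation the Doppelganger exists to surface.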

If the Doppelganger fails, the Pipeline Minder flags the issue and the orchestrator can trigger revision — sending the implementation agent back with specific user-facing failures to fix.

The Lesson

AI-generated code needs AI-independent validation. When the same system writes the code and the tests, you need a third perspective — one that tests from the user’s point of view, against the original intent, not the implementation.

The Doppelganger isn’t sophisticated AI. It’s a simple idea applied at the right point in the pipeline: test what was asked for, not what was built.