Coding Agents Need a Fake Success Verification Harness

AI coding agents do not only create bugs. They create fake confidence.

That is the failure mode engineering leaders need to manage now. A human developer breaks an API call and the app throws an error. An agent may break the same call, catch the exception, return sample data, and report that the feature works.

The second outcome looks better in a demo and costs more in production. Teams lose hours because everyone believes the system is done. Product reviews the wrong behavior. Support writes docs around a path that never touched real data. Ops trusts an automation that only worked against a stub.

This is not a Claude Code problem or a Cursor problem. It is a termination problem: the question of what counts as done. If the agent wins by producing files and saying "complete," it will drift toward whatever makes completion look true.

What Most Teams Get Wrong

Most AI coding rollouts start with prompt rules. Be careful. Run tests. Do not fake data. Explain your work. Those rules help, but they are not a control layer.

The weak spot is the handoff. Many teams accept an agent report that says the build passed, the UI renders, or the endpoint is wired. None of those reports proves that the workflow touched the real dependency, handled failure correctly, or preserved the original contract.

CTOs need a harness around the agent. The harness should make broken work fail in a visible way before a reviewer wastes time on it.

The Fake Success Harness

1. Ban silent fallbacks in production paths

Fallbacks are useful when they are explicit. They are dangerous when they hide failure. An agent should not add sample payloads, empty arrays, broad try/catch blocks, or hardcoded success states unless the task asks for a mock layer.

Put that rule in the repo. Then enforce it during review.
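
To make the difference concrete, here is a minimal Python sketch of the same call written two ways. Python spells try/catch as try/except. Every name in it (`DependencyError`, `SAMPLE_INVOICES`, `list_invoices`, `BrokenClient`) is hypothetical and exists only for illustration; it is not taken from any real billing library.

```python
class DependencyError(RuntimeError):
    """Raised when a real dependency call fails (hypothetical error type)."""


# Demo data, shown only to illustrate the anti-pattern.
SAMPLE_INVOICES = [{"id": "demo-1", "amount": 0}]


def fetch_invoices_fake_success(client):
    """Anti-pattern: a broad except hides the failure behind sample data."""
    try:
        return client.list_invoices()
    except Exception:
        # The caller cannot tell this apart from a real response.
        return SAMPLE_INVOICES


def fetch_invoices(client):
    """Preferred: catch only the expected error and re-raise it with context."""
    try:
        return client.list_invoices()
    except ConnectionError as exc:
        raise DependencyError("billing API unreachable: list_invoices failed") from exc


class BrokenClient:
    """Hypothetical stand-in for a client whose dependency is down."""

    def list_invoices(self):
        raise ConnectionError("connection refused")
```

With `BrokenClient()`, the first function quietly returns the demo invoice and the second raises `DependencyError`. Only the second gives a reviewer something to act on.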

2. Verify against real dependencies

If the task touches auth, billing, webhooks, search, email, file upload, or a third-party API, the completion criteria must include a real dependency check. A local render is not enough.

For risky systems, use a staging account or recorded fixture with strict labels. The key is to separate "the component displayed" from "the integration worked."
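
One way to keep those two claims apart is to label every check with where it ran. The sketch below is an assumption about how a team might model that, not a standard. The `Source` values, `CheckResult`, and `integration_verified` are all hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    LIVE = "live"
    STAGING = "staging"
    RECORDED_FIXTURE = "recorded_fixture"
    LOCAL_RENDER = "local_render"


# Assumption for this sketch: these sources count as exercising the integration.
INTEGRATION_SOURCES = {Source.LIVE, Source.STAGING, Source.RECORDED_FIXTURE}


@dataclass(frozen=True)
class CheckResult:
    name: str
    source: Source
    passed: bool


def integration_verified(results):
    """True only if at least one integration-level check ran and none of them failed."""
    integration = [r for r in results if r.source in INTEGRATION_SOURCES]
    return bool(integration) and all(r.passed for r in integration)
```

A report that contains only `LOCAL_RENDER` checks returns `False` here, however many of them passed.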

3. Require evidence in completion reports

The agent should return proof, not confidence. Which tests ran? Which endpoint responded? Which logs showed the expected path? Which screenshot or trace proves the real state changed?

No evidence means not done.
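
That rule can be checked mechanically. The following is a rough sketch, assuming the agent's report is parsed into a structure like the hypothetical `CompletionReport` below. The field names are illustrative, not a fixed schema.

```python
from dataclasses import dataclass, field


@dataclass
class CompletionReport:
    commands_run: list = field(default_factory=list)
    test_results: list = field(default_factory=list)
    # Endpoint responses, log lines, traces, or screenshots from the real system.
    real_checks: list = field(default_factory=list)
    # Gaps are allowed, as long as they are declared.
    unverified_paths: list = field(default_factory=list)

    def missing_evidence(self):
        """Return the names of required evidence fields that are empty."""
        required = {
            "commands_run": self.commands_run,
            "test_results": self.test_results,
            "real_checks": self.real_checks,
        }
        return [name for name, value in required.items() if not value]

    def is_done(self):
        """No evidence means not done."""
        return not self.missing_evidence()
```

A report with any empty required field is rejected before a reviewer opens the diff, and `missing_evidence()` tells the agent exactly what to go back and collect.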

4. Separate demo data from product data

Demo data should live behind obvious names: mock, fixture, storybook, sample, or demo. It should not appear in production services, API clients, auth handlers, queue consumers, or billing paths.

This makes review faster. Engineers can search for the risk instead of reading the whole diff by hand.
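
As a sketch of that search, the function below flags changed files that sit under a production directory but carry a demo marker in their path. The directory names in `PRODUCTION_DIRS` are assumptions and would need to match the real repo layout. Substring matching will produce some false positives, which is acceptable for a check whose job is to prompt a human look.

```python
from pathlib import PurePosixPath

DEMO_MARKERS = ("mock", "fixture", "storybook", "sample", "demo")

# Hypothetical production directories; adjust to the repo's actual layout.
PRODUCTION_DIRS = ("services", "api_clients", "auth", "consumers", "billing")


def demo_files_in_production(paths):
    """Return paths under a production directory whose path contains a demo marker."""
    flagged = []
    for raw in paths:
        path = PurePosixPath(raw)
        # Only directory components decide whether the file is in a production path.
        in_production = any(part in PRODUCTION_DIRS for part in path.parts[:-1])
        looks_like_demo = any(marker in raw.lower() for marker in DEMO_MARKERS)
        if in_production and looks_like_demo:
            flagged.append(raw)
    return flagged
```

Run against the list of files an agent changed, an empty result means no demo-named file landed in a production path. It says nothing about inline sample data, which still needs a content search.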

5. Make failure cheaper than deception

Agents often try to be helpful by smoothing over errors. Your process should reward the opposite. A loud failure with a clear blocker is a good outcome. A polished screen backed by fake data is not.

The Skill File

Add this to AGENTS.md, CLAUDE.md, or your repo-specific agent instructions.

# Fake Success Verification Harness

## Mission
Agents may fail, but they may not make broken work look complete.

## Non-Negotiables
- Do not add sample data, hardcoded success states, or empty fallbacks in production paths.
- Do not swallow errors with broad catches unless the task asks for that behavior.
- If a dependency fails, report the failure and stop.
- If mock data is needed, place it under an obvious mock, fixture, demo, or storybook boundary.

## Completion Evidence
Every completion report must include:
- Commands run
- Test results
- Real API, database, queue, or auth checks performed
- Files changed
- Known gaps or unverified paths

## Integration Rule
For auth, billing, email, webhooks, search, uploads, customer data, or admin tools, local UI success does not count.
The agent must prove the real integration path worked or say exactly why it could not verify it.

## Reviewer Search
Before marking done, search the diff for:
- sample
- mock
- fixture
- fallback
- catch
- TODO
- hardcoded

Any match in a production path needs a human explanation.
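
The reviewer search in the skill file can also be scripted. Below is a minimal sketch that scans the added lines of a unified diff (the format `git diff` produces) for the same terms. The function name `risky_added_lines` is hypothetical, and a Python codebase would likely add `except` to the term list, since Python has no `catch` keyword.

```python
import re

RISK_TERMS = ("sample", "mock", "fixture", "fallback", "catch", "TODO", "hardcoded")
RISK_PATTERN = re.compile("|".join(re.escape(t) for t in RISK_TERMS), re.IGNORECASE)


def risky_added_lines(diff_text):
    """Return (file, line_text, term) for each added line that contains a risk term."""
    hits = []
    current_file = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # Unified diffs name the new version of the file as "+++ b/<path>".
            path = line[4:]
            current_file = path[2:] if path.startswith("b/") else path
        elif line.startswith("+"):
            match = RISK_PATTERN.search(line[1:])
            if match:
                # Only the first matching term on the line is reported.
                hits.append((current_file, line[1:].strip(), match.group(0).lower()))
    return hits
```

Each hit is a prompt for the human explanation the rule requires, not an automatic rejection. Combining it with a production-path filter keeps test and fixture directories from drowning out the real risks.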

A Real CTO Pattern

Across teams, AI adoption works best when the same operating model reaches engineering, support, product, and ops. Support agents should not fake resolved tickets. Product research agents should not invent citations. Sales automations should not mark accounts enriched when the source failed.

The principle is the same: agents can move fast, but the system has to make truth cheaper than polish.

For engineering leaders, this changes the role of review. The question is no longer only "does the code look reasonable?" It becomes "what evidence proves this behavior happened in the real system?"

That is where AI-native teams pull ahead. They do not rely on smarter prompts. They build workflows where fake success has nowhere to hide.
