I've been spending a fair bit of time building with Claude Code in the past few months, and like most people doing the same, I've run into a failure mode where the code slowly, and then not-so-slowly, diverges from my original intent. I ended up feeling like I was wrestling with the agent to clarify precisely what was being implemented, and why - especially as context got compacted, re-compacted and eventually lost.
This post describes a lightweight pattern I built to address it.
The agent writes coherent, well-structured, passing code — it just drifts, incrementally, from your original intent. Each session the model reads the current codebase as its primary context. Over enough iterations, your implementation becomes the spec. The rate of code accumulation is so rapid that it quickly becomes the most important signal about what you're trying to do, especially when the description of your original intent is a few dozen tokens at the start of a session (or the session before).
And your tests (written by the same agent, in the same sessions) end up testing the implementation rather than the requirements. At that point you have a system that does something confidently, and a test suite that confirms it's doing that something consistently.
When using a coding agent, intent is the key thing the human developer brings, and relying on the codebase to express that intent didn't feel like a strong enough signal. I wanted to focus on describing intent - currently using fairly typical product artefacts (feature shaping documents/marketing docs/detailed requirements docs) - and to feel confident that the code sticks to that intent, with any divergence flagged when it happens.
I found a bunch of tooling aimed at the documentation drift problem, specifically in docs that describe what the code does or how it works. driftcheck runs as a pre-push hook and flags documentation that contradicts code changes. agent-guard maintains a self-healing documentation layer that keeps markdown docs current with your source. Agent OS injects coding standards forward into the agent's context window. These are all useful. They're also all code-first — they treat the implementation as authoritative and ask whether the documentation is keeping up.
This is kinda the opposite of what I wanted to do - I want a tool that tells me my code is wrong when it diverges from the product-level documentation that describes what I set out to do, rather than updating user/developer-facing docs when they diverge from the code.
The core design principle is that product documentation is the ground truth. Everything else — code, tests, configuration — is subordinate to declared intent. The key control signal is then the test suite - each line item in the product requirement documentation is represented by a test - with the overall implementation approach in Claude Code focussing on TDD, so that failing tests for each requirement are written first.
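As a sketch of what that mapping looks like in practice - the requirement ID, function, and assertions here are hypothetical examples of my own, not the post's actual specs - each PRD line item gets a test that names it, written to fail before the implementation exists:

```python
# Hypothetical PRD line item (the ID scheme is an illustrative assumption):
REQUIREMENT = "PRD-014: exported CSV files must include a header row"

def export_csv(rows, columns):
    """Toy implementation, written only after the test below existed and failed."""
    lines = [",".join(columns)]  # PRD-014: header row comes first
    for row in rows:
        lines.append(",".join(str(row[c]) for c in columns))
    return "\n".join(lines)

def test_prd_014_csv_export_includes_header():
    # Covers: PRD-014 — the traceable link the reconciliation agent can check
    out = export_csv([{"a": 1, "b": 2}], ["a", "b"])
    assert out.splitlines()[0] == "a,b"
```

Because each test declares which requirement it covers, "test exists but maps to nothing in the docs" becomes a detectable condition rather than a silent one.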
I keep the full documentation corpus in the repo under docs/, with MkDocs rendering it into something human-navigable. The corpus spans the full stack: market research and user personas at the top, through shaping documents, product requirements (PRDs) and functional specs, down to API contracts and data models. Everything that constitutes intent lives there, versioned alongside the code.
The reconciliation agent is implemented as a simple Claude Code agent. It reads the docs/ corpus, compares it against the current working tree, and produces a structured divergence report covering two things: capabilities in the code that have no coverage in the spec, and tests whose assertions don't trace back to a documented requirement.
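To make "structured divergence report" concrete, here's a minimal sketch of what one finding might look like - the field names, categories, and severity levels are my illustrative assumptions, not the agent's exact output format:

```python
from dataclasses import dataclass, asdict

@dataclass
class Finding:
    """One entry in a hypothetical divergence report (shape is assumed)."""
    severity: str    # e.g. "critical" | "high" | "medium" | "low"
    category: str    # "uncovered-capability" | "untraced-test"
    source_doc: str  # the spec file the finding is judged against
    location: str    # file/symbol in the working tree
    summary: str

finding = Finding(
    severity="high",
    category="untraced-test",
    source_doc="docs/prd/exports.md",
    location="tests/test_export.py::test_zip_output",
    summary="Test asserts ZIP output, but the PRD only specifies CSV export.",
)
report = [asdict(finding)]  # serialisable for downstream tooling (e.g. a hook)
```

The useful property is that every finding points in both directions: at a spec document and at a location in the tree, so neither side is implicitly authoritative.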
The second category is the one that matters most. It's also the one that's hardest to catch by eye, because a well-written test looks identical whether it's testing the right thing or not.
Static analysis and coverage tools reason about code-to-code relationships: is this function called, is this branch reached. They have no purchase on whether the assertion itself corresponds to a documented requirement.
I've been running this ad-hoc after long periods of iteratively working with Claude on chunky problems - it often highlights areas where I've diverged from my original intent, or surfaces insufficiencies and gaps in the product specs themselves. It also runs as a pre-commit hook to force a review before anything is committed.
The hook exits with a non-zero code if it finds critical or high-severity findings, and the commit is blocked. You either fix the divergence or explicitly override it with --no-verify.
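A minimal version of the hook's gating logic can be sketched like this, assuming the agent has written its findings to a JSON file - the report filename, JSON shape, and severity labels are all my assumptions, not the actual implementation:

```python
"""Sketch of a pre-commit gate: block the commit when the reconciliation
report contains critical or high-severity findings."""
import json
import sys

BLOCKING = {"critical", "high"}  # assumed severity labels

def should_block(findings):
    """True if any finding is severe enough to stop the commit."""
    return any(f.get("severity") in BLOCKING for f in findings)

def run_hook(report_path="divergence-report.json"):
    """Hook entry point: returns the process exit code for git."""
    with open(report_path) as fh:
        findings = json.load(fh)
    blocking = [f for f in findings if f.get("severity") in BLOCKING]
    for f in blocking:
        print(f"[{f['severity']}] {f.get('summary', '')}", file=sys.stderr)
    return 1 if blocking else 0  # non-zero exit makes git abort the commit
```

Wired in as `.git/hooks/pre-commit` (or via a hook-manager entry), git aborts the commit on a non-zero exit, which is exactly what gives `--no-verify` its role as the explicit escape hatch.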
I've added a (slightly modified) version of my AGENTS.md file here if you fancy taking a peek.
The immediate effect is catching things — capabilities that crept in without spec coverage, tests that drifted from their acceptance criteria. But the more durable effect is that it enforces spec hygiene. If an agent is going to read your product docs and reason against them, those docs need to be information dense. The reconciliation loop creates a forcing function for specificity.
It also subtly reframes the relationship with agentic coding. Rather than treating the agent as a collaborator that lives in the codebase, it positions the documentation as the senior partner. The code is always provisional, the intent is not.
Of course, this then raises a follow-on problem of keeping your 'product'-level docs relevant - especially challenging when solving a technical problem requires a rethink of what you thought you wanted to do.