Looping Reflections

On turning agent surprises into harness evolution

Apr 23, 2026

Le Bon Mot stood at the intersection of Curiosity and Mild Alarm, which was not a real address in any municipal sense, but was nonetheless where most important software conversations eventually arrived.

Rain worried softly at the windows. Madame Beauregard polished cups with the concentration of a surgeon and the suspicion of a customs officer. Sophie, asleep on the philosophy shelf, had arranged herself in the shape of a question mark, which in that café counted as both decoration and warning.

Case arrived with a laptop under one arm and the look of someone who had recently been betrayed by something that had smiled while doing it.

She sat at her usual table beneath the slow brass clock and laid out three things in front of her: a printout of a failing test, a notebook filled with arrows and annoyed underlining, and a second printout bearing the title From Technical Debt to Cognitive and Intent Debt.

The Librarian watched this arrangement with interest.

“That,” he said, appearing with a cortado, “is either the start of a breakthrough or an inquest.”

Case rubbed her eyes. “Both, probably.”

The Djinn was already there, of course. He was seated in the corner in his usual impossible but incredibly confident way, as though the darkness had taken a passing interest in upholstery and chosen to participate. His hood was up. His hands were folded. He wore the expression of something that did not breathe but enjoyed the theatre of pretending.

“You look,” he said, “like someone who has asked for velocity and received philosophy.”

“I asked for a small change,” said Case. “I got a lesson in why small changes are never small.”

The Librarian leaned against the shelf marked Systems, Errors, Regrets.

“Go on.”

Case tapped the printout. “We’ve been improving our agent harness. Constraints. Checks. Better prompts. Better boundaries. We’ve got the usual machinery. The agent is productive. Fast. Sometimes unreasonably fast. And yesterday it did something clever.”

She paused.

“It saw a repeated validation pattern,” she said. “Quite correctly. It noticed duplication. It extracted it into a shared helper. Beautifully named. Elegant enough to make me briefly religious.”

“And the problem?”

“It unified two rules that looked similar but were not the same. One was a business boundary. The other was a safety boundary. The tests still passed.”

The Librarian closed one eye. “Ah.”

“Exactly,” said Case. “Ah.”

The Djinn inclined his head. “So the code improved and the system worsened.”

“Yes.”

“Classic.”

Case pushed the paper toward them. “At first I thought: technical debt. The helper’s wrong. The abstraction’s wrong. Fix the code. But that wasn’t the real problem. The real problem was that the agent had no durable way to know why those two patterns were different, and if I’m honest neither did half the team anymore. The rationale lived in a conversation from six months ago, in one ADR nobody read, and in my head on good days.”

The Librarian smiled sadly. “Intent debt and cognitive debt wearing technical debt’s coat.”

Case pointed at him. “Exactly that.”

She turned to the printed paper as if it were an affidavit.

“Technical debt is in the code. Cognitive debt is in the erosion of shared understanding. Intent debt is in the missing or incomplete goals, constraints, and rationale. The code wasn’t the first thing that failed. Our memory failed. Our explanation failed. The code just had the decency to show the bruise.”

Madame Beauregard arrived with a second cup, though nobody had ordered it. Le Bon Mot did not always wait for consent when caffeine was clearly the right answer.

The Djinn looked amused.

“And what did you do?” he asked.

Case gave him a flat look. “The old thing. I fixed the helper. Added a note. Muttered about being more careful. Carried on.”

The Librarian winced the way priests do when someone confesses to merely administrative sins.

“And now?”

“And now,” said Case, “I think that was the wrong move. Or rather: incomplete. Because the important thing wasn’t the defect. It was the surprise.”

At this, Sophie opened one eye. Case leaned back.

“The harness engineering plugin has this reflection loop. /reflect1. Capture what surprised the agent. Capture what surprised us. From the session, and from our own rough notes. Then, later, promote the useful bits into the durable memory of the harness. Not every observation. Only the ones that teach the environment how to shape future work.”

The Librarian nodded. “A café notebook, not a courtroom transcript.”

“Right. But more than that. I’d been treating reflection as a postscript. A nice thing. A tidy thing. A bit of moral housekeeping. Now I think it’s where the harness actually learns.”

The Djinn smiled. “At last.”

Case frowned. “You say that as if you’ve been waiting.”

“I have,” said the Djinn. “Humans keep trying to build better agents by staring harder at the agent or the LLM. Better prompts. Better models. Better instructions. They treat the surprising behaviour as a defect in the creature.”

“And it isn’t?”

“It often is,” he said pleasantly. “But it then shifts to be a defect in the habitat.”

The rain pressed harder on the windows, as if hoping to overhear.

The Librarian drew the paper toward him and tapped it once.

“This triple debt model,” he said, “is useful because it reminds people that systems do not live in code alone. They live in code, in shared understanding, and in articulated purpose. Surprise reveals where those three have drifted apart.”

Case nodded slowly.

“The agent didn’t just make a mistake. It showed us an unmarked boundary in the system. A place where intent had not been externalised strongly enough. A place where our shared understanding had gone thin.”

The Librarian brightened. “Precisely. Surprise is not always failure. Sometimes it is disclosure.”

The Djinn spread his hands. “I produce outputs. You infer intelligence. But what I really reveal, over time, is the quality of the constraints around me.”

Case looked down at her notebook. Pages of observations. Arrows. Times. Commit hashes. Half a dozen incidents that had seemed minor in isolation. The same service over-documented and under-constrained. A rename that improved readability but broke an integration assumption nobody had written down. A generated test suite that faithfully encoded the current wrong behaviour. An elegant refactor that removed duplication and with it the only visible clue that two domains had once been deliberately separated.

None were catastrophic. All were educational.

“You’re saying,” she said slowly, “that the reflection loop matters because it turns surprise into design material.”

The Librarian pointed at her with the spoon. “There it is.”

Case’s face changed then. Not into happiness. Le Bon Mot did not deal in such simple currencies. But into recognition. Which is rarer and usually more useful.

“I think,” she said, “we’ve been using the harness like a fence.”

“That is what most people do,” said the Djinn. “Fences are comforting.”

“But it should be more like a trellis.”

The Librarian smiled broadly. “Now you’re gardening. Voltaire would be proud,”

Case stood and began pacing in the small strip of space between table and shelf, which was how she thought when the idea had gotten into her muscles.

“A fence says: do not cross here. Necessary, sometimes. But a trellis says: grow this way. It takes what emerges and gives it shape. The reflection loop is how you notice where the growth surprised you. Constraint adjustment is how you decide whether that surprise was bad growth, missing support, or simply a clue that your mental model is out of date.”

The Djinn’s eyes glinted beneath the hood.

“Yes.”

Case was fully in it now.

“So if an agent behaves surprisingly, there are at least three questions. First: did the code reveal technical debt? Something brittle, duplicated, inconsistent, easy to break. Second: did the surprise reveal cognitive debt? Something the team no longer understands well enough to predict. Third: did it reveal intent debt? A missing rationale, constraint, or decision record that the agent could not possibly infer safely.”

The Librarian lifted his cup in salute.

“Good. Keep going.”

“And /reflect matters because without it, surprise evaporates. The team gets annoyed, fixes the surface, and moves on. We lose the signal. We pay the same tuition twice.”

“Or weekly,” said the Librarian.

“Or daily,” said the Djinn.

Case ignored them.

“But with reflection, the surprise becomes an artefact. Not as bureaucracy. As memory. Then, when reviewed, it can become one of several things: a sharper constraint in HARNESS.md; a promoted gotcha or architecture decision in AGENTS.md; a new test that captures intent rather than current behaviour; a model routing change; a GC rule; maybe even a human process adjustment.”

Madame Beauregard, who had been listening while pretending not to, gave a small snort.

“In other words,” she said, “stop being surprised by the same idiot thing.”

Case laughed despite herself. “In slightly more elegant terms, yes.”

The Librarian reached for the paper again.

“There is a line in here I like,” he said. “Understanding should be treated as a deliverable.”

Case nodded. “Yes.”

“So the reflection loop is not just a maintenance ritual. It is an understanding practice with a concrete deliverable.”

That landed.

Because that was the missing piece, wasn’t it? The reason that reflection felt optional to people who still thought software was mainly code. If the only product was the merged feature, then reflection was overhead. But if shared understanding was also a deliverable, if intent had to be kept alive in artefacts fit for both humans and agents, then reflection was not overhead at all. It was one of the acts by which the system remained inhabitable.

The Djinn tilted his head.

“You are learning the old lesson in the new dialect.”

“Which is?”

“That every environment teaches,” he said. “The question is whether it teaches deliberately.”

Case sat again. Outside, the rain had shifted from complaint to commitment.

“I think the dangerous version of AI-assisted work,” she said, “is when people imagine that harness constraints should be complete before the real work begins. As if you could specify the entire future of a living codebase and team in advance.”

The Librarian laughed. “That would save so much time. Which is why you know it cannot be true.”

“The harness has to evolve,” Case said. “Not sloppily. Not reactively in the worst sense. But empirically. You observe the agent. You observe the team. You note where the system behaves in ways that are legal according to the existing rules but wrong according to the deeper purpose. That is where the next layer of the harness comes from.”

The Djinn looked pleased, which on him resembled weather forming offshore.

“Yes. The mature harness is not the one with the most rules. It is the one that has learned the right ones.”

“And some rules,” said the Librarian, “should not be rules at all.”

Case looked up. “Go on.”

“Some surprises reveal not that the agent needs tighter control, but that the team needs better shared language. Or better onboarding. Or a clearer ADR. Or a domain walkthrough. If you answer every surprise with a prohibition, you build a prison. If you answer none of them, you get a swamp.”

Sophie sneezed in her sleep, which in that moment felt like endorsement.

Case stared at her notes.

There it was. That was why the loop mattered. Reflection was not a complaint box for AI weirdness. It was a diagnostic surface. A place where the organisation could tell the difference between three kinds of repair:

Repair the code.
Repair the understanding.
Repair the intent.

And sometimes, if you were honest enough, repair the harness so that next time the surprise arrived earlier, cheaper, and in a form the team could actually use.

Case opened her laptop. The glow from the screen lit her face and threw a pale bar across the table between the coffee cups and the paper. The Djinn watched her with the patient interest of something that knew it would eventually be blamed and did not mind, provided the blame was at least educational.

“What are you doing?” asked the Librarian.

“Writing the reflection.”

“Now?”

“Yes, now. While I still remember what the surprise felt like.”

She began to type. Not a diary entry. Not a lament. A proper reflection:

Task: refactor validation duplication across order workflow.
Surprise: agent unified two similar patterns that encoded different business and safety boundaries; tests passed, behaviour intent weakened.
What this revealed: missing durable articulation of why those boundaries differ; team knowledge uneven; current tests verify mechanics but not domain purpose.
Proposals:
Add an intent-level test that distinguishes business from safety validation paths.
Promote boundary rationale into AGENTS.md as a GOTCHA and add ADR link.
Update HARNESS.md with a constraint on collapsing superficially similar validation flows without explicit domain check.
Schedule walkthrough for onboarding and shared understanding.

She stopped typing and read it back.

The Librarian said nothing.

The Djinn said nothing.

Madame Beauregard, from behind the counter, said, “Better.”

Case smiled.

It was not a grand smile. Le Bon Mot never trusted those. It was the smile of a person who had finally understood where the work really was.

Because that was the trick, wasn’t it? AI did not merely accelerate implementation. It accelerated the rate at which hidden misalignments became executable. The system could now get to the wrong place much faster and much more elegantly. Which meant the old idea of debt had to be widened. Technical debt still mattered. Of course it did. But now cognitive debt and intent debt accumulated at machine speed whenever a team accepted outputs faster than it rebuilt understanding and externalised purpose.

The reflection loop, then, was not sentimental. It was defensive architecture for meaning. A way of refusing cognitive surrender. A way of catching the moment where surprise could still be turned into knowledge. A way of keeping the harness alive enough to deserve the name.

The brass clock ticked four minutes behind the world. Sophie shifted on her shelf. The rain went on rehearsing its argument against the glass.

Case saved the file.

Then, because she was Case and had learned to mistrust any improvement not anchored in action, she opened HARNESS.md beside the reflection and began sketching the new constraint carefully2, not as a panicked ban, but as a statement clear enough that a future agent—and a future tired human—might understand what must not be collapsed and why.

The Djinn rose. As ever, he seemed less to stand than to gather.

“At last,” he said.

Case looked up. “You said that already.”

“Yes,” he replied. “But it was true twice.”

He moved toward the door, a hooded absence passing through lamplight. At the threshold he turned back.

“The best harnesses,” he said, “are not built by people who never meet surprise. They are built by people who know how to listen when surprise arrives.”

Then he was gone into the rain. The Librarian collected the empty cups.

“And what,” he asked lightly, “have we learned?”

Case closed the laptop.

“That the point of reflection is not to admire what happened,” she said. “It is to decide what the habitat must learn from it.”

The Librarian nodded. “Sounds good.”

He glanced at the sleeping dog, the paper on triple debt, the fresh reflection now sitting in the log like a small lantern against forgetfulness.

“Factories,” he said, “prefer repetition.”

“And habitats?” asked Case.

He smiled.

“Habitats,” he said, “prefer adaptation before the inhabitants forget how to survive.”

Sometimes art really does imitate life… Check out this repository for an example of the /reflect command.

You can even consider doing this with the Djinn helping using something like /harness-constrain

A Software Enchiridion

Discussion about this post

Ready for more?