The Djinn's Determinacy Drift
On the calibration of determinacy in agentic harnesses
There are cafés in which everything is arranged just so, and there are cafés in which arrangement is treated as an opening bid in a continuing negotiation between intention and reality.
Le Bon Mot, being an establishment with professional respect for ambiguity, belonged firmly to the second category. The chairs were approximately where chairs ought to be. The books were shelved by a classification system that began with literary genre, drifted into philosophical temperament, and ended somewhere near weather. The brass clock behind the counter claimed it was quarter past six, which meant it was either nine minutes wrong or refusing an argument.
Case was sitting beneath the shelf labelled Systems, Tragedies, and Other Forms of Control, reading a printed stack of notes with the expression of someone discovering that a thing she had long suspected now had enough language around it to become dangerous.
The Librarian placed a cortado in front of her with the kind of care normally reserved for religious relics or hand grenades.
“You’ve got the face,” he said, “of a woman who has found a new species of failure.”
Case did not look up immediately. “Not new,” she said. “That would be easier. New failures have the decency to look strange. This one has been dressing as discipline.”
The Librarian leaned on the table. “Ah. The old enemy, then. Respectability.”
Case tapped the papers. “Harnesses. Agentic Harnesses.”
On the page before her were terms that had been circling her mind for weeks without landing: promotion ladders, verification slots, constraint design, agent-backed determinacy. Useful ideas, all of them. Necessary. A civilisation is, among other things, an agreement about which things should be left to improvisation and which should be nailed down before the rain arrives.
The problem, she thought, was that software engineers, when they discover nails, often become intoxicated by hammering.
The bell above the door gave its little brass cough, and the Djinn entered in a coat the colour of overconfidence. He moved with that familiar air of almost-earned certainty, like a man who had read a summary of mastery and mistaken it for possession. He slid into the empty chair opposite Case without asking, which was rude in ordinary cafés and almost expected in this one.
“You look troubled,” he said brightly. “May I help by offering a clean framework, several confident distinctions, and the suggestion that everything reduces to a dashboard?”
“No,” said Case.
The Djinn folded his hands. “Excellent. Then I shall do it conversationally.”
The Librarian retreated with the faint smile of a man who had seen storms before and trusted the roof.
Case lifted the notes. “We’ve been teaching people,” she said, “quite rightly, that if an agent does something valuable often enough, and predictably enough, you should tighten the loop. Move it from vague instruction to scaffolded skill. From scaffolded skill to script. From script to deterministic execution where appropriate. Turn judgment into harness where judgment no longer adds value. Build the habitat so the good thing happens without heroics, or tokens.”
“Yes,” said the Djinn. “This is called wisdom.”
“It is called half of wisdom,” said Case. “Which is how most damage gets done.”
The Djinn looked pleased. “Go on.”
Case set the pages down. “You start with a harness that helps. A command here, a script there. A careful guardrail. Something to keep the agent from wandering into folly or forgetting how you prefer your tests named. It is elegant. Humane, even. A kindness to both machine and operator. Then a corner case appears, and you add a flag. Then another team wants a variation, and you add a branch. Then someone says it would be useful if the script also did this other thing, provided the moon is waxing and the environment variable is set and Jenkins has not yet developed one of its religious objections to reality.”
The Djinn smiled. “A familiar liturgy.”
“And because each addition is individually justifiable,” said Case, “nobody notices the moment the harness stops being structure and becomes sediment. It accretes. Hardens. Starts speaking in a private dialect of exceptions and side effects. Eventually you have something that was meant to remove cognitive load but now requires apprenticeship, folklore, and perhaps a sacrificial goat.”
At this, a voice drifted over from the next table.
“Only on Thursdays,” said Dave.
He had appeared the way he often did at Le Bon Mot: as if the room had simply remembered him into being. Dave wore the expression of a man who had been promoted by systems incapable of love into being their interpreter. He carried three notebooks, two pens, and the slight stoop of someone who had spent too many years compensating for environments that expected genius where they should have supplied clarity.
Case nodded to him. “You’ve met these harnesses.”
Dave gave a little snort. “Met? I’ve been married to them. One of them still has custody of my weekends.”
The Djinn tilted his head. “But surely determinacy is the point. Fewer surprises. More repeatability. Less drift.”
“Toward a point,” said Case. “That’s the important phrase. Toward a point.”
She leaned back and looked past them both, toward the counter where the Librarian was polishing a glass with the grave attention of a man editing history.
“There’s a superstition in engineering,” she said, “that the opposite of chaos is script. That if something feels uncertain, the cure is more specification, more flags, more control points. But uncertainty has species. Some uncertainty is waste: forgotten steps, brittle instructions, needless ambiguity. That should be removed. Some uncertainty is judgment: the small contextual intelligence required because reality refuses to stay still. If you over-script that, you don’t remove indeterminacy. You relocate it. Hide it in switches, special cases, and shell incantations until the whole system becomes an accidental programming language written by frightened custodians.”
Dave raised his cup in salute. “I see you’ve worked in enterprise.”
The Djinn, to his credit, did not immediately disagree. He only looked thoughtful, which on him was almost unsettling.
“So you’re saying,” he said, “that a harness can drift not only by being too loose, but by being too tight.”
“Exactly.”
“And that the movement is bidirectional.”
“Yes.”
The Djinn smiled more slowly now. “Ah. That is better.”
Case nodded. This, she thought, was the heart of it. Not determinacy as a virtue, but determinacy as a posture to be tuned. Not all journeys end at maximum scripting. Some end at the discovery that one has trapped judgment inside a maze of options and called it control.
The Librarian returned with a small plate of almond biscuits and set it down in the middle of the table as though formally recognising that the conversation had become structural.
“What’s the measure?” he asked.
“The clues are already there,” said Case, almost to herself. “That’s the maddening thing. The harness tells on itself constantly.”
She began counting on her fingers.
“Change records. If ostensibly identical tasks produce wildly different durations, token usage, or artefact trails, that’s not creative richness. That’s an indeterminacy leak. Something that should behave predictably doesn’t.
“Snapshots over time. You can see which skills get invoked, which scripts get invoked, and how their ratios change. A script that keeps growing flags for six weeks in a row is not maturing. It’s mutating. A skill whose outputs cluster tightly across repeated invocations might be ready for promotion into something more deterministic.
“Reflection logs. Those are gold. Every time the agent says, in effect, ‘I had to improvise around the script,’ it is handing you evidence that your determinacy is misplaced. Not absent. Misplaced.
“Decision records. They tell you which seams were deliberate. Which constraints were chosen. Which ones just happened because someone was in a hurry on a Wednesday and no one has had the courage to revisit the decision since.
“And then the outer loop. Trajectories over time. Is the harness getting more fluent? More legible? Or more brittle, more ceremonial, more dependent on people who know where the bones are buried?”
Dave had gone still now, listening in the way exhausted men do when someone finally names the room they have been trapped in.
“The worst part,” he said softly, “is seam-rot.”
Case turned toward him. “Yes.”
He looked at the table, not at them. “Skills referencing scripts that are gone. Scripts no skill calls anymore. Skills describing judgment that never actually occurs because the script consumed the decision before the agent could exercise it. Everything still sort of works, until it doesn’t, and by then you’ve got dead pathways hanging off live ones like rotten branches.”
The Librarian nodded once. “Silent killers.”
The Djinn drummed his fingers lightly. “So the answer is a calibrator.”
Case allowed herself a small smile. “A report, not an answer. That distinction matters.”
He spread his hands. “Explain.”
“It shouldn’t auto-fix anything,” she said. “That would be madness dressed as neatness. The calibrator itself is necessarily indeterminate. It’s reading heterogeneous signals, inferring patterns, making judgment calls. It cannot honestly claim the authority of a script because its entire purpose is to recommend where scripting belongs and where it has overreached.”
The Djinn looked delighted, which was annoying but understandable. “A reflective pass.”
“Exactly. Per capability: current placement, proposed move, promote or demote or fix-seam or leave it alone. Evidence citations. Specific logs. Specific diffs. Specific deltas over time. Confidence attached not as theatre but as humility. This is not an execution-loop thing. It’s review. Reflection. Maintenance of posture.”
Dave gave a low laugh. “So the harness gets its own therapy.”
“More or less.”
The Librarian glanced at the shelves. “Most worthwhile structures do.”
For a while they sat in companionable silence, broken only by the soft hiss of the espresso machine and the occasional mutter of the clock refusing consensus. Outside, the evening had gone blue at the edges. Inside, Le Bon Mot held its usual atmosphere of minor enchantment and severe opinion.
The Djinn broke the silence first.
“You realise,” he said, “that naming this makes it real in the annoying sense. People will now have to admit they are not merely building harnesses, but tuning a determinacy posture over time.”
Case nodded. “That’s why it matters. Unnamed practices get managed by instinct and prestige. Named practices can at least be argued about.”
He smiled. “You do love an argument.”
“I love a better category,” she said.
Dave looked up at that. “You think this is a category problem?”
“I think most engineering pain is a category problem with budget implications.”
The Librarian made a sound that might have been a laugh.
Case took another sip of her coffee. “Look at how teams talk when these things go wrong. If the harness is too loose, they say the agent is unreliable. If the harness is too tight, they say the process is clunky. Different symptoms, same mistaken framing. Both are calibration failures. The question is not ‘should this be deterministic?’ in the abstract. The question is ‘what is the right degree of determinacy for this capability, in this project, at this moment, given the evidence we have?’”
The Djinn leaned back. “And the answer changes.”
“Of course it changes. The project changes. The agent changes. The evidence changes. A thing that was once necessarily judgment-heavy may become stable enough to script. A script that was once elegant may become a fossil bed of exceptionalism. There is no permanent victory condition. Only maintenance.”
Dave stared into his cup. “That’s the part people are going to hate.”
“Yes,” said Case gently. “Because maintenance lacks the glamour of invention. But it is where habitats either become inhabitable or turn into ruins with documentation.”
The Librarian cleared the empty biscuit plate. “You make it sound almost moral.”
Case considered this.
“Maybe it is,” she said. “Not in the grand sense. In the craft sense. A good harness is a promise about where attention will and won’t be spent. If you let determinacy drift without examining it, you break that promise. You force people and agents alike to perform intelligence in places where structure should help, and force structure into places where judgment should breathe. Either way, you create theatre.”
The Djinn gave a solemn nod, which sat on him like borrowed clothing. “Then the calibrator must be honest about itself.”
“Yes.”
“Proposal, not decree.”
“Yes.”
“Evidence, not mystique.”
“Yes.”
“Human sign-off, or at least explicit reviewed sign-off.”
Case smiled. “You’re trainable after all.”
He placed a hand on his chest. “Madam, I am a creature of probabilistic, textual refinement. Trainability is my only defence against banishment.”
From the shelf near the window came a low snore. Sophie, who had been asleep beneath a volume on cybernetics, twitched one ear and resettled, having determined that the conversation, while intense, did not yet threaten the integrity of the treat economy.
The room softened a little. Even serious ideas, in Le Bon Mot, were permitted a collarbone and pulse.
Dave spoke again, quieter this time.
“I keep thinking,” he said, “about all the times I assumed the pain meant we needed more rigour. More rules. More steps. And half the time what we really needed was less machinery and a better seam.”
Case looked at him with the particular kindness reserved for those who have spent too long blaming themselves for architectural sins.
“That’s why this matters,” she said. “Because over-determination flatters itself. It looks responsible. It arrives wearing the uniform of care. No one says, ‘I am now making the harness worse by embedding an inferior language inside bash.’ They say, ‘I’m just covering this edge case.’ And then again. And again. And eventually the edge cases are the coastline.”
The Librarian stopped moving for just a second, as if to let the sentence settle.
The Djinn exhaled. “That is irritatingly good.”
“I know,” said Case.
Outside, rain had begun in that tentative European way that suggests the weather is trying to remember whether it has the authority to commit. The windows of the café caught the first stippled reflections from the streetlamps. Books glowed at their edges. The clock, having lost the argument with time entirely, advanced by three uncertain minutes.
Case gathered the papers into a neater pile, though not too neat. There is a kind of tidiness that insults thinking.
“So,” said the Librarian, “what will you call it?”
Case looked at the notes, then at the room, then somewhere beyond both.
“Determinacy calibration,” she said. “Probably. Though harness calibration has a certain plain honesty to it.”
Dave nodded. “Either way, people will know the thing they’ve been feeling has a name.”
“Yes,” said Case. “And once it has a name, they can stop treating it as private failure.”
The Djinn rose, smoothing his coat. “I approve of this. Not because it is flattering to my kind, though it is. But because it admits that good collaboration is not the extermination of uncertainty. It is the careful placement of it.”
Case stood too. “Exactly.”
The Librarian took the empty cups and plates and held them stacked in one hand with improbable balance.
“You know,” he said, “most people think the work of a café is to serve coffee. But really it is to provide just enough structure that thought can happen.”
The Djinn smiled. Dave chuckled. Case, who had spent half her life building systems and the other half noticing what those systems did to souls, felt the small precise click of recognition.
That was it, really. The whole thing.
A harness was not an instrument for crushing variation out of existence. Nor was it an excuse to leave intelligence drowning in vagueness. It was a key part of the habitat. And like any habitat worth living in, this meant it required pruning as much as planting. Paths where paths helped. Wildness where wildness was the point. Boundaries where cliffs began. Openings where air was needed.
Too little determinacy, and the inhabitants wandered.
Too much, and they suffocated.
The craft lay in noticing the drift before either felt normal.
Case tucked the notes under her arm. The rain had thickened outside. The evening was waiting, as evenings do, with no concern at all for whether one had yet finished one’s thoughts.
“At least,” said Dave, reaching for his coat, “this gives us a way to ask better questions.”
Case smiled toward the door, toward the rain, toward the living untidiness of things.
“Yes,” she said. “And better questions are usually how the harness begins to heal.”
Then the bell rang, the door opened, and the cold April air entered briefly like a fact no one had authorised, which was, in its way, exactly the point.
On the Calibration of Harnesses
Every agentic system sits on the same dial: at one end, scripts and commands and pipelines that execute identically and want no judgment; at the other, skills and guidance and prose-shaped instruction that require judgment and reward it. Every capability in a harness belongs somewhere on that dial. The craft is knowing which belongs where. And, more importantly, noticing when the answer has changed.
The assumption baked into current practice is that the dial only turns one way. Promotion ladders are directional: unverified to agent-backed to deterministic, exploratory to hardened. The language we use — hardening, locking down, formalising — all presumes the journey is toward more scripts and fewer skills. It often is. A judgment call made the same way three times is begging to become a template. But harnesses drift the other way too, and we have no vocabulary for it.
The over-scripted harness has a tell: flag sprawl. The script that began as bin/deploy now takes fourteen arguments, three mutually exclusive, one a relic of an incident nobody remembers. Every edge case has become a flag. What began as a crystallised decision has become a worse programming language embedded in bash, maintained by whoever was on call last. The correct move is not another flag. It is demotion: rip the script apart, keep the primitives, write a skill that composes them for the current shape of the problem.
The over-skilled harness has a subtler tell: silent drift. Outputs look right. Commit messages follow the pattern. PRs describe themselves properly. And yet two runs of nominally the same task produce materially different diffs, and nobody notices until something breaks in a way that traces back to a decision that was never made the same way twice. The correct move is promotion: freeze the judgment, write the script.
Both failure modes are invisible if you only ever look at the dial in one direction.
What to actually do
Calibration is not auditing. Auditing asks whether the harness is working; calibration asks whether each capability is in the right place on the determinacy dial. The practice, to be real, needs three things: signals to watch, a taxonomy of moves, and a cadence. None of them are exotic. Most teams running coding agents have the raw material already; they just aren’t looking at it through this lens.
Start logging for calibration before you need it
The signals that matter are boring to collect and invaluable to have: flag count per script tracked over time, special-case density (how many if branches a primitive accumulates per month), variance in outputs across ostensibly-identical task invocations, seam integrity (do your skills reference scripts that still exist, and are your scripts invoked by at least one skill), and improvisation traces — reflection-log entries where the agent noted it had to work around the provided tooling.
Add one more signal that pays for itself: re-invention rate. How often does the agent re-derive, from scratch, a pattern it has derived before in a prior session? Frequent re-invention is a promotion signal; the pattern wants to become a primitive.
Adopt a proposal taxonomy that permits every direction of movement
Promote: script what used to be skilled. Demote: skill what has over-scripted. Fix the seam: repair broken references between the two. Split: separate a capability that is secretly doing two different jobs, one determinate and one not. And — crucially — leave alone. A calibrator that always proposes a move is a calibrator optimising for its own visibility, not for the harness. The valid output space must include that if this is sitting in the right place; do nothing. If your calibration reports never contain that verdict, the calibrator has become theatre.
Run calibration on a reflective cadence, not a reflexive one
Not every tool call. Not never. End-of-iteration, triggered by a post-mortem, or on a standing monthly review. Calibration is a Review-and-Update-loop artifact, not an Execute-loop one. Running it constantly produces noise; running it rarely produces drift. The correct rhythm is project-specific but is never zero.
Keep the calibrator a proposer, not an actor. The moment a calibrator starts moving things on the dial autonomously, you have created a feedback loop where the harness drifts according to the calibrator’s priors rather than the project’s needs. Every proposal should cite specific signals — this log entry, this PR, this diff — and carry a confidence level that distinguishes obvious moves from hunches worth investigating.
The output is a calibration report a human, or a reviewing agent with explicit sign-off, reads and acts on. This distinction matters more than it looks. It is the difference between a practice and a new source of drift wearing the practice’s clothes.
Write down what you refused to move. Calibration reports accumulate; over time, the pattern of what you chose not to promote or demote is as revealing as the moves you made. A capability repeatedly flagged as a demotion candidate and repeatedly left alone is telling you something — either about the capability (it is genuinely in the right place and the signals are misleading) or about the team (there is an unspoken reason nobody wants to touch it). Either finding is useful. Neither is visible unless the refusals are written down alongside the moves.
The Stoic Enchiridion opens with a single distinction: some things are in our control, some are not, and almost all practical wisdom follows from knowing which is which.
Determinate scripts are the things in our control; skills are the things that aren’t, or shouldn’t be. Harness craft is arranging each capability on the right side of that line, and moving them across as understanding deepens. A harness that never recalibrates is not a stable harness. It is a brittle one, pretending to be stable, which is a worse thing.
Name the dial. Watch the signals. Move it both ways.


