On Beating Flaky Tests

Rebuilding Cognitive Resources, Feedback Loops, and the Flow of Value

Russ Miles

Dec 13, 2025

This entry in a Software Enchiridion builds on this short story:

The Flake Eater

Russ Miles

December 10, 2025


Most engineering organisations never consciously decide to tolerate flaky tests.

They simply acclimatise to them, the same way a city grows used to the low hum of distant machinery or recurring sirens. Eventually the interruptions dissolve into the background of everyday life. Noise becomes normal. Waste becomes routine. And the creature that feeds on this neglect grows quietly in the walls.

Developers rerun failing tests automatically, scarcely aware they’re performing a ritual of resignation. Red builds are met not with alarm but with suspicion: “Probably flaky—try again.” The pipeline becomes less an instrument of truth than an oracle you negotiate with. And somewhere beneath the dashboards and logs, something feeds on this erosion of trust.

In the mythology of teams who’ve lived this, that creature has many names: “flaky test,” “environmental failure,” “intermittent red,” “just rerun it.” But all these are masks. Behind them sits the same reality:

a feedback loop you no longer believe in.

And when a team stops believing its alarms, it stops believing in its own ability to deliver safely. When confidence decays, flow decays. When flow decays, value decays. When value decays, no amount of “agile” ceremonies, OKRs, or strategy decks can save the day.

Flaky tests are not a minor inconvenience. They are the slow, grinding sabotage of your developer experience, your delivery system, and your cognitive economy.

When flaky tests have become something to live with rather than reduce or remove, you need to rebuild a culture that protects signal clarity, protects flow, and protects human attention as the scarce resource it truly is.

Why We Accept Flaky Test Suites

Or: How We Learned to Ignore the Alarms in 5 easy steps:

Step 1: Normalised Deviance

A flaky test fails once. You rerun. It passes. Fails again…

Rerun. Pass.

People stop asking questions. The team drifts into a tacit agreement: the test suite tells lies sometimes, but we live with it. This is how airplanes once took off with missing bolts; how NASA lost shuttles; how teams lose clarity.

Normalisation is not a decision. It is the absence of one.

Step 2: The Cognitive Load Tax You Can’t See on a Spreadsheet

Flaky tests force developers to hold competing realities:

“I might have broken something.”
“Or maybe the test is lying.”
“Or maybe the environment is inconsistent.”
“Or maybe someone else is already fixing it.”

This uncertainty creates cognitive branching—a mental fork that drains attention, increases load, and breaks flow.

Cognitive load is not an abstraction. It is the raw fuel developers burn to reason, create, simplify, and fix. Flaky tests burn it wastefully.

Step 3: Flow State Cannot Survive Unreliable Signals

Flow is a fragile neurological arrangement requiring:

Continuity
Predictability
Trust

Unreliable tests break all three.

Every unexpected red is a forced context switch—one of the most expensive operations in human cognition. A single flake can cost 10–25 minutes of flow recovery. A suite with dozens compounds into hours.

Over months, a flaky pipeline becomes a flow-destroying culture, not a technical quirk.

Step 4: Misaligned Incentives Reward “Shipping Despite the Tests”

Teams are praised for shipping features, not for improving test reliability, pipeline trustworthiness, or cognitive wellbeing. So we do the rational thing everyone does:

optimise for what is rewarded, not for what is true.

“Just rerun it” is a survival tactic in an environment that prioritises throughput over integrity of feedback. But throughput without integrity is theatre.

Step 5: Tooling That Makes Reruns Easy and Repairs Hard

When your CI offers:

one-click reruns
no flake tracking
no historical failure visibility
no quarantine mechanism

…the path of least resistance is rerun, not repair. Every system shapes its culture. This one shapes neglect.
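Flake tracking does not need much machinery. As a minimal sketch, assuming your CI can export one record per test execution (the `RunRecord` fields and the test names below are hypothetical, not any particular CI tool's schema), a test can be flagged as flaky when it both passed and failed on the same commit:

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class RunRecord(NamedTuple):
    """One test execution, as exported from CI (hypothetical schema)."""
    test_id: str
    commit: str
    passed: bool


def find_flaky_tests(runs: Iterable[RunRecord]) -> dict[str, int]:
    """Return {test_id: number of commits where the test both passed and failed}."""
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for run in runs:
        outcomes[(run.test_id, run.commit)].add(run.passed)

    flaky: dict[str, int] = defaultdict(int)
    for (test_id, _commit), seen in outcomes.items():
        if len(seen) == 2:  # both True and False observed on identical code
            flaky[test_id] += 1
    return dict(flaky)


history = [
    RunRecord("test_checkout", "abc123", False),
    RunRecord("test_checkout", "abc123", True),   # rerun passed, same code
    RunRecord("test_login", "abc123", True),
    RunRecord("test_search", "def456", False),
    RunRecord("test_search", "def456", False),    # consistently red: a real failure
]

print(find_flaky_tests(history))
```

The point of the design is that a rerun stops being a way to make the red go away and becomes a data point: every pass-after-fail on unchanged code is recorded against the test that produced it.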

The Invisible Damage: How Flaky Tests Wreck the Flow of Value in 3 Ways

1) Untrustworthy Feedback = Slower Everything

When developers cannot trust the testing signal:

cycle time increases
batch size grows
deployment frequency drops
rework increases
debugging becomes slower
release confidence erodes

This is not an engineering problem. It is an organisational value-flow bottleneck. No Lean model, no DevOps maturity curve, no flow framework tolerates polluted feedback loops.

2) Flaky Tests Create a Culture of Hesitation and Fear

When a red build doesn’t mean red, developers begin to second-guess themselves:

“Should I merge?”
“Should I wait?”
“Did I break it?”
“Am I the problem?”

Fear slows hands. Hesitation slows companies. Confidence in safe flow of value is a competitive advantage, and flaky tests quietly drain it away.

3) Flakiness Teaches Teams to Ignore Pain

When a pain signal becomes meaningless, humans tune it out.

But tuning out signals in engineering leads to catastrophe:

Security gates lose authority
Quality checks lose meaning
Operational warnings go unread
Observability becomes theatre

A flaky suite is a solid, accidental step on the road to organisational numbness.

Principles to Repair Your Test and Feedback Culture for Flow

Flaky test acceptance is not fixed by sending a “please care more” Slack message. It is fixed by rebuilding the environment around five principles:

Principle 1: Red Must Mean Stop

A culture cannot respect flow if it does not respect signals.

Adopt the rule:

No merging on unexplained red. Ever.

If a test fails:

fix it
quarantine it
or delete it

But never ignore it. This small act reverses years of normalised deviance.
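One way to make this rule mechanical is a merge gate that only accepts failures with an active, owned quarantine entry. The sketch below is an illustration under assumptions: the `Quarantine` record, its fields, and the ticket reference are hypothetical, and the expiry date is an added design choice so that quarantine cannot quietly become a permanent way of ignoring a test.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable


@dataclass(frozen=True)
class Quarantine:
    """A failing test that someone has explicitly taken ownership of (hypothetical record)."""
    test_id: str
    owner: str
    ticket: str      # hypothetical tracker reference
    expires: date    # after this date the test blocks merges again


def unexplained_failures(
    failed: Iterable[str], quarantined: Iterable[Quarantine], today: date
) -> list[str]:
    """Failures that are not covered by an unexpired quarantine entry."""
    active = {q.test_id for q in quarantined if q.expires >= today}
    return sorted(set(failed) - active)


def merge_allowed(
    failed: Iterable[str], quarantined: Iterable[Quarantine], today: date
) -> bool:
    """Red must mean stop: merge only when every failure is explained."""
    return not unexplained_failures(failed, quarantined, today)


quarantine_list = [
    Quarantine("test_checkout", "alice", "FLAKE-1", date(2025, 12, 31)),
]

# A quarantined failure is explained; an extra unknown failure is not.
print(merge_allowed(["test_checkout"], quarantine_list, date(2025, 12, 15)))
print(unexplained_failures(["test_checkout", "test_login"], quarantine_list, date(2025, 12, 15)))
# Once the quarantine expires, the same failure blocks the merge again.
print(merge_allowed(["test_checkout"], quarantine_list, date(2026, 1, 5)))
```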

Principle 2: Protect Cognitive Load Like It’s Capital

Treat developer attention as a budget, not a free commodity. A flaky test is a withdrawal from that budget.

Create trackable metrics:

false red rate
mean diagnostic time
number of quarantined tests
pipeline predictability variance

Make cognitive load visible, and it becomes a measurable, investable engineering concern.
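As a rough sketch of the first two metrics, assume a "false red" is a red build that went green on rerun with no code change, and that someone records how long each red took to diagnose. Both definitions, and the `RedBuild` record itself, are assumptions made for illustration rather than standard terms.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class RedBuild:
    """A failed pipeline run (hypothetical record)."""
    build_id: str
    passed_on_rerun: bool       # green on identical code: counted as a false red
    minutes_to_diagnose: float  # time until someone knew what the red meant


def false_red_rate(builds: Sequence[RedBuild]) -> float:
    """Share of red builds that turned out not to signal a real problem."""
    if not builds:
        return 0.0
    return sum(b.passed_on_rerun for b in builds) / len(builds)


def mean_diagnostic_minutes(builds: Sequence[RedBuild]) -> float:
    """Average attention spent working out what a red build meant."""
    if not builds:
        return 0.0
    return mean(b.minutes_to_diagnose for b in builds)


# Illustrative numbers only, not measurements.
last_week = [
    RedBuild("b1", passed_on_rerun=True, minutes_to_diagnose=10),
    RedBuild("b2", passed_on_rerun=True, minutes_to_diagnose=20),
    RedBuild("b3", passed_on_rerun=False, minutes_to_diagnose=30),
    RedBuild("b4", passed_on_rerun=True, minutes_to_diagnose=20),
]

print(f"false red rate: {false_red_rate(last_week):.0%}")
print(f"mean diagnostic time: {mean_diagnostic_minutes(last_week):.0f} min")
```

Reviewed weekly, these two numbers turn "the pipeline feels unreliable" into a trend that can be prioritised like any other engineering cost.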

Principle 3: Feedback Loops Are the Heartbeat of Software Engineering

Everything flows from the quality of your feedback loops:

speed
safety
confidence
learning
improvement

Fast feedback loops improve value flow. Unreliable feedback loops stall value flow. Flakiness isn’t a testing concern—it’s a value-stream concern.

Principle 4: Invest in Deterministic Test Design Literacy

The antidotes to flakiness are architectural:

isolated state
deterministic time
seeded randomness
contract tests
seams and boundaries
minimal E2E surface area

A well-designed system is a testable system. A testable system is a reliable feedback emitter. A reliable emitter strengthens the organisation’s capacity to think and act quickly.
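Two of these antidotes, deterministic time and seeded randomness, come down to the same move: pass the source of nondeterminism in rather than reaching for it globally. A minimal sketch, where `is_expired` and `pick_sample` are hypothetical functions standing in for your own code:

```python
import random
from datetime import datetime, timedelta, timezone
from typing import Callable


def is_expired(issued_at: datetime, ttl: timedelta, now: Callable[[], datetime]) -> bool:
    """The clock is injected, so a test decides what 'now' is."""
    return now() - issued_at >= ttl


def pick_sample(items: list[str], k: int, rng: random.Random) -> list[str]:
    """The random source is injected, so a test can seed it."""
    return rng.sample(items, k)


# Deterministic time: no sleeping and no dependence on the wall clock.
fixed_now = datetime(2025, 12, 13, 12, 0, tzinfo=timezone.utc)
issued = fixed_now - timedelta(minutes=30)
assert is_expired(issued, timedelta(minutes=15), now=lambda: fixed_now)
assert not is_expired(issued, timedelta(hours=1), now=lambda: fixed_now)

# Seeded randomness: the same seed yields the same choice on every run.
users = ["ana", "ben", "cy", "dee", "eli"]
first = pick_sample(users, 2, random.Random(42))
second = pick_sample(users, 2, random.Random(42))
assert first == second
```

In production the same functions would receive a real clock, such as `lambda: datetime.now(timezone.utc)`, and an unseeded `random.Random()`. The seam is what makes the test a reliable emitter.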

Principle 5: Celebrate Test Reliability as Real Work

What gets celebrated becomes culture.

Call out:

reduced pipeline time
fewer flakes
removal of outdated tests
creation of stable contracts
improvements to flow stability

These improvements are features. They deliver the capacity to deliver.

Repairing Signal, Restoring Flow, Reclaiming Cognitive Clarity

If you tolerate flaky tests, you are tolerating broken thinking. If you repair your feedback loops, you repair your ability to move.

Some practices to consider

Quarantine flaky tests immediately
Track flake incidents like operational outages
Convert nondeterministic tests into deterministic units
Replace E2E chains with contract-based seams
Set SLOs for pipeline health and feedback reliability
Create a “test gardener” rotation
Measure and review cognitive load indicators
Protect flow by minimising context-switch triggers
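To illustrate what a contract-based seam can look like in place of an end-to-end chain, here is a deliberately small sketch. The contract, field names, and payloads are all hypothetical, and real teams often reach for a dedicated contract-testing tool instead. The idea is that the consumer states the shape it depends on, and the provider's response is checked against that shape without starting the whole system.

```python
def contract_violations(payload: dict, contract: dict[str, type]) -> list[str]:
    """List every way a payload breaks the consumer's stated expectations."""
    problems = []
    for field, expected in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            actual = type(payload[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems


# What the (hypothetical) checkout page needs from the order service.
ORDER_CONTRACT = {"order_id": str, "total_cents": int, "currency": str}

good_response = {"order_id": "A-17", "total_cents": 1999, "currency": "GBP"}
bad_response = {"order_id": "A-17", "total_cents": "19.99"}

print(contract_violations(good_response, ORDER_CONTRACT))
print(contract_violations(bad_response, ORDER_CONTRACT))
```

A check like this runs in milliseconds with no network, browser, or shared environment, which removes most of the places an end-to-end test can flake.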

Some things to avoid

Adding retries instead of fixing contributing factors (call them root causes if you must…)
Allowing leadership to treat flakiness as “lower priority”
Believing that E2E volume = safety
Allowing the CI tool’s defaults to define test culture
Blaming developers instead of the system
Forgetting that developer experience is organisational performance

A test suite is more than a collection of assertions. It is the nervous system of the organisation—the mechanism by which the whole notices, learns, and corrects itself. When parts of that system lie, or flicker, or degrade into noise, the organisation loses sensitivity. It loses clarity. It loses confidence.

And when confidence dies, flow dies.

Restore the truthfulness of your feedback loops, and you restore the team’s ability to think, build, and move with precision. Eliminate flakiness not because it is annoying, but because it is the slow, quiet erosion of your ability to deliver value at all.

