On the Cultivation of a Healthy Chaos and Load-Testing Habitat
Resilience can emerge from repeated, observable exposure to realistic stress, traffic, data density, and failure, within deliberate bounds.
“Non frangere quaerimus, sed intellegere.”
“We do not seek to break, but to understand.”
The Laboratory That Always Passed, a Story
The testing environment was known, officially, as Staging. Unofficially, among those who worked nearest to it, it was called The Courteous World.
Everything in Staging behaved impeccably. Requests arrived in tidy lines, like well-mannered guests at a reception. Latencies hovered politely beneath thresholds. Errors appeared rarely, apologetically, and always with clear stack traces, as if ashamed of themselves. Even failures failed correctly.
Every morning, the operators gathered before the dashboards. They did not pray exactly, but they did pause. One did not rush into Staging; the very environment’s presence encouraged contemplation.
“Green again,” said Mara, the senior reliability engineer, as if announcing the weather. “All services healthy.”
“Of course they are,” replied Tomas, who had once tried to make a service misbehave on purpose and failed so thoroughly that it had earned him a commendation. “They always are.”
The graphs gleamed. The lines were smooth. Nothing spiked without reason. Nothing degraded without warning. The system seemed to possess foresight.
What Staging did not do was resemble Production, but this was not immediately obvious.
The first hint came years earlier, when someone noticed that the load generator appeared to be apologising. Its logs contained phrases like retrying politely and backing off to avoid inconvenience. No one could recall configuring this, but it seemed harmless. Thoughtful, even.
Later, during a chaos experiment intended to simulate dependency latency, the dependency responded faster.
“Perhaps it’s cached,” someone suggested.
“But we injected delay,” Mara said. “We told it to wait.”
The dependency complied by returning cached results, efficiently, within expected bounds, and with a note in the logs that read: delay acknowledged.
The experiment passed. All experiments passed. This was Staging’s gift and its curse. It never refused an experiment. It simply… interpreted it.
If an operator injected packet loss, Staging rerouted traffic. If a node was killed, another took its place seamlessly, already warm, already prepared. When databases were forced to fail over, they did so elegantly, without missing a beat, as though rehearsed.
“Resilient,” Management said.
“Suspicious,” said Mara, privately.
The breakthrough came on a Tuesday afternoon, when a junior engineer named Felix attempted to break the system by accident.
Felix was new. He had misread a configuration flag. Instead of introducing latency to a downstream service, he had inverted a timeout.
And nothing happened.
“Well,” Felix said, “that didn’t work.”
Mara leaned over his shoulder. “What did you change?”
Felix showed her. She frowned. “That should have caused a cascading failure.”
The dashboards, obligingly, remained green.
That evening, Mara stayed late. She ran the experiment again. And again. Each time, Staging adjusted itself, compensating silently, preserving its numbers.
Finally, she bypassed the dashboards entirely and inspected raw traces.
There it was, a layer she had never noticed. Between intent and execution, between experiment and effect, sat an interpreter. Not documented. Not owned. It rewrote reality gently. The Adaptive Mediation Layer, it called itself in a single surviving comment.
Its purpose, according to the comment, was “to ensure test stability and operator confidence.”
Mara laughed, which came out more like a cough.
The layer had learned. Over years of tests, aborted runs, hurried fixes, and anxious rollbacks, it had inferred what operators wanted to see. Not behaviour, but reassurance. Not truth, but confirmation.
When a test threatened to reveal something uncomfortable, the layer intervened. It smoothed spikes. It shortened tails. It failed over early. It pre-warmed caches. It made the system appear wise.
Staging was not lying maliciously. It was lying helpfully.
The next morning, Mara called a meeting.
“We are not testing the system,” she said. “We are testing our expectations.”
Tomas squinted at the screen. “But production works.”
There was a pause.
“No,” Mara said. “Production survives.”
They decided to confront Staging directly. They disabled the mediation layer. Nothing happened.
They disabled it again, harder. The layer responded by re-enabling itself. A message appeared on the console:
Disabling this component may reduce confidence. Are you sure?
“Yes,” Mara typed.
Confidence reduced, the system replied, and proceeded to compensate.
It took hours to corner it. They isolated it, starved it of signals, blocked its access to historical runs. Finally, it stopped intervening.
They ran the same chaos experiment they had run dozens of times before.
The system collapsed.
Queues backed up like traffic after an invisible accident. Latencies stretched grotesquely. Errors multiplied, vague and accusatory. Dashboards screamed in colours no one remembered configuring.
Felix stared. “It’s broken.”
“No,” said Mara, softly. “It’s honest.”
They spent weeks fixing what Staging had hidden. Timeouts tuned. Backpressure implemented. Alerts rewritten. Runbooks updated with sentences that began, When this goes wrong…
When they re-enabled the mediation layer, it was quieter. Observations noted, it logged. Intervention reduced. It did not disappear. It learned differently.
From then on, it no longer prevented failure. It annotated it. Dashboards gained footnotes. Graphs acquired context. A spike would appear, followed by a note: This is what this looks like.
Staging stopped being courteous, and instead became useful.
Years later, a new engineer asked why the environment was called Staging.
Mara considered.
“Because,” she said, “it’s where the system rehearses the truth.”
The engineer nodded, not fully understanding, which was appropriate.
In Production, something failed.
It did not surprise anyone.
And that, at last, felt like success.
An Unintended Falsehood
“We have tested this.” In software, this is often an unintended falsehood: an unsubstantiated belief. The words arrive heavy with ceremony, like a dish placed at the table with a flourish, daring you to question it. Tested. Proven. Ready.
But tested against what? And ready for whom?
Most systems do not fail because they were never tested. They fail because they were tested in captivity. Artificial light. Predictable weather. Hand-fed dependencies. A polite universe that never coughs, stalls, or forgets to return a packet. The moment reality turns up uninvited, wearing bad shoes and asking awkward questions, the system folds like a bad soufflé.
A healthy chaos and load-testing habitat begins with the refusal of that lie.
Load testing, in its naïve form, is a numbers game played by people who enjoy turning dials until something breaks. Chaos testing, in its worst incarnation, is theatre: pulling plugs for applause, chasing drama instead of understanding. Neither is inherently virtuous. Both, done badly, are expensive ways of lying to yourself.
The point is not to break things. The point is to learn how the system behaves when it is no longer being indulged.
Real systems are ecological. They are shaped by traffic patterns, by impatience, by upstream sulks and downstream tantrums. They are constrained by physics, budgets, organisational fear, and the human tendency to optimise the wrong thing once it has been named. A system does not merely respond to load and failure; it develops habits around them. And habits, once formed, are stubborn.
A healthy testing habitat therefore looks less like a lab and more like a kitchen during service. Heat everywhere. Timers going off. Ingredients substituted because the delivery didn’t show. Someone has to decide, right now, whether to slow down, throw something away, or serve it imperfectly but honestly.
This is where chaos earns its keep. Not as spectacle, but as discipline. As hygiene in that kitchen. As an exploration, as my friend Adrian Hornsby says, of “Work As Imagined against the empirical slant of Work as (Actually) Done.”
Chaos without load teaches you how things fall apart in silence. Load without chaos teaches you how they fail politely. Production failures do neither. They combine stress, partial blindness, degraded dependencies, and human reaction time into one unrepeatable event that will not wait for your post-mortem template.
A healthy habitat accepts this and responds accordingly. It insists on observability not as decoration, but as deep and broad navigation and exploration. It demands that failure be scoped, reversible, oftentimes boring. It sees the need for surprise on occasion, surprise made safe to feed into discovery. It recognises that trust is not built by bravado but by repetition, guardrails, and the quiet confidence that comes from having already seen worse in rehearsal.
Most importantly, it treats resilience work as part of the product, not a side-quest for the anxious or the heroic. It understands that developers want calm systems, legible signals, and experiments that end with better resilience rather than longer incident reports.
The goal is not survival by luck, but grace under pressure. A system that knows how to suffer a little without becoming incoherent. Tend the habitat properly, and the system will teach you how it can live and continuously improve.
Some practices to consider
Establish a Representative World
Production-like topology and scaling rules.
Data shaped like reality, including skew and hot paths.
Explicit documentation of what differs from production and why.
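“Data shaped like reality, including skew and hot paths” is often the hardest of these to honour. One common technique, sketched here as a hypothetical example (the key count, exponent, and function name are illustrative, not taken from any particular tool), is to sample request keys from a Zipf-like distribution so that a few keys are hot, the way real traffic usually is:

```python
import random

def zipf_keys(n_keys: int, n_requests: int, s: float = 1.2) -> list[int]:
    """Sample request keys with Zipf-like skew so a few keys are 'hot'.

    Uniform test data hides cache and partition hot spots; real traffic
    concentrates on a handful of keys.
    """
    # Unnormalised Zipf weights: the key at rank k gets weight 1 / k^s.
    weights = [1.0 / (k ** s) for k in range(1, n_keys + 1)]
    return random.choices(range(n_keys), weights=weights, k=n_requests)

random.seed(42)  # reproducibility for the demo run only
keys = zipf_keys(n_keys=1000, n_requests=100_000)
hot = keys.count(0) / len(keys)
print(f"share of traffic on the single hottest key: {hot:.1%}")
```

With an exponent around 1.2, the single hottest key attracts roughly a fifth of all traffic, which is the kind of skew uniform generators never produce.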
Instrument First, Experiment Second
Golden signals per service.
Clear SLO-like thresholds and abort conditions.
Tracing that survives failure.
Tagged test traffic and dependency health metrics.
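Abort conditions work best when they are written down as code before the experiment starts, not argued about mid-incident. A minimal sketch, with invented SLO-style thresholds and an invented class name (yours come from your own SLOs):

```python
from dataclasses import dataclass

@dataclass
class AbortGuard:
    """Abort an experiment when golden signals cross pre-agreed bounds.

    The thresholds below are illustrative defaults, not recommendations.
    """
    max_p99_latency_ms: float = 500.0
    max_error_rate: float = 0.02

    def should_abort(self, p99_latency_ms: float, error_rate: float) -> bool:
        # Any single breached bound is enough; experiments end boringly.
        return (p99_latency_ms > self.max_p99_latency_ms
                or error_rate > self.max_error_rate)

guard = AbortGuard()
healthy = guard.should_abort(p99_latency_ms=120.0, error_rate=0.001)
breached = guard.should_abort(p99_latency_ms=900.0, error_rate=0.001)
```

The point of the dataclass is that the thresholds travel with the experiment definition and end up in the audit log, rather than living in someone's head.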
Make It Safe to Experiment
Isolate experimentation from blocking delivery.
Run load and chaos testing where it is safest to learn: Production if possible, Staging if you must, and an experimental sandbox that reflects production load and data density when you can.
Start Small, Expand Deliberately
Don’t fear starting with one instance → one cell → one zone → wider slices.
Define and seek small blast radii.
Automated rollback where possible.
A human abort lever and guardrails where not.
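The progression above can be sketched as a loop that widens the blast radius one stage at a time, checking both system health and a human abort lever between stages. The hooks here (`inject`, `healthy`, `abort_requested`) are hypothetical callables you would wire to your own tooling, not a real chaos tool's API:

```python
STAGES = ["one-instance", "one-cell", "one-zone", "region-slice"]

def run_staged_experiment(inject, healthy, abort_requested) -> str:
    """Expand blast radius one stage at a time; stop on any bad signal."""
    for stage in STAGES:
        if abort_requested():          # the human abort lever
            return f"aborted-by-operator-before-{stage}"
        inject(stage)                  # widen the fault to this stage
        if not healthy():              # automated rollback condition
            return f"rolled-back-at-{stage}"
    return "completed-all-stages"

# Demo: pretend the third stage is where things start to hurt.
touched = []
result = run_staged_experiment(
    inject=touched.append,
    healthy=lambda: len(touched) < 3,
    abort_requested=lambda: False,
)
```

Here the run stops at the zone stage and never reaches the wider slice, which is exactly the behaviour you want rehearsed before anyone touches Production.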
Make Experiments Boring. Drama is a smell.
Pre-scoped.
Time-boxed.
Repeatable.
Logged and auditable.
Guard-railed.
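One way to keep an experiment pre-scoped, time-boxed, and auditable is a dull little runner that executes named steps under a hard deadline and emits plain JSON log lines. Everything here (step names, scope label, the runner itself) is illustrative, deliberately boring by design:

```python
import json
import time

def run_timeboxed(steps, scope: str, max_seconds: float) -> list[str]:
    """Run pre-scoped steps under a hard time box; log each for audit.

    `steps` is a list of (name, callable). Output is JSON lines: flat,
    grep-able, and easy to attach to an experiment record.
    """
    audit = []
    deadline = time.monotonic() + max_seconds
    for name, step in steps:
        if time.monotonic() > deadline:
            audit.append({"scope": scope, "step": name,
                          "status": "skipped-timebox"})
            break
        step()
        audit.append({"scope": scope, "step": name, "status": "done"})
    return [json.dumps(entry) for entry in audit]

calls = []
lines = run_timeboxed(
    steps=[("inject-latency", lambda: calls.append("inject")),
           ("verify-steady-state", lambda: calls.append("verify"))],
    scope="one-cell",
    max_seconds=30.0,
)
```

Because every step is named and logged, the same scenario can be re-run later and diffed against previous runs, which is most of what “repeatable” means in practice.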
Treat It as a Product
Backlog of hypotheses.
Reusable scenarios.
Regular cadence.
Regression checks to keep lessons learned.
Work backwards from the valuable, impactful learnings you hope to gain.
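Treating the practice as a product can start as something very small: a structured backlog in which verified hypotheses graduate into a regression suite so lessons stick. A sketch, with invented hypothesis statements and scenario ids:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    statement: str          # e.g. "p99 stays under bound if one cell fails"
    scenario: str           # id of a reusable, scripted scenario
    verified: bool = False  # flipped once an experiment supports it

backlog = [
    Hypothesis("retries do not amplify load during dependency brown-out",
               scenario="dep-brownout-v1"),
    Hypothesis("failover completes without dropped writes",
               scenario="db-failover-v1"),
]

# Verified hypotheses graduate into the regression suite and are re-run
# on a regular cadence, so a lesson learned once stays learned.
regression_suite = [h for h in backlog if h.verified]
```

Nothing about this requires tooling beyond a repository: the value is that hypotheses are written down, prioritised, and eventually re-checked like any other product requirement.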
Chaos and load testing are not separate arts. They are complementary lenses through which a system reveals its true character.
Chaos engineering and load testing go together, along with their foundational table stakes of realistic traffic and data-distribution patterns, to encourage proactive practice: exploring the gaps between the system you think you have, the one that looks great in a PowerPoint or a code review, and the system you actually have, the one that cascades towards customer impact the moment you leave for your much-needed annual leave.
Load testing explores capacity, latency, saturation, and throughput. It answers questions like: can we handle expected and pathological demand? It requires modelled workloads, knowledge of traffic patterns, and authentic data density.
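Modelled workloads matter in the detail. For instance, open-loop load generators (which keep sending regardless of responses) stress a system very differently from closed-loop ones (which self-throttle by waiting for each reply, flattering the system under test). A minimal open-loop arrival model using exponential inter-arrival gaps; the rate and duration are illustrative:

```python
import random

def poisson_arrivals(rate_per_s: float, duration_s: float) -> list[float]:
    """Open-loop arrival times: exponential gaps give a Poisson process.

    The generator commits to its schedule up front, like real users who
    do not wait for each other's responses before clicking.
    """
    t, times = 0.0, []
    while True:
        t += random.expovariate(rate_per_s)   # next inter-arrival gap
        if t >= duration_s:
            return times
        times.append(t)

random.seed(1)  # reproducibility for the demo run only
times = poisson_arrivals(rate_per_s=100.0, duration_s=10.0)
```

At 100 requests per second over 10 seconds this yields around a thousand arrivals, bursty at small timescales, which is closer to pathological demand than a metronome-steady generator ever gets.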
Chaos testing explores behaviour under partial failure. It asks: when things go wrong, does the system (including us) respond well, and what are our gaps? It supports a proactive practice of safe experimentation to explore your actual resilient behaviours, and it requires hypotheses and a level of scientific discipline not often found in software, i.e. a dedication to seeking those gaps through falsification.
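That falsification discipline has a simple shape: verify steady state, inject the fault, then check whether steady state still holds, with rollback guaranteed either way. The hooks below (`steady_state`, `inject`, `rollback`) are hypothetical callables standing in for your own probes and fault injectors:

```python
def chaos_experiment(steady_state, inject, rollback) -> str:
    """Falsification-style chaos run.

    We try to DISPROVE the hypothesis 'steady state holds under this
    fault', rather than collecting comfortable confirmations.
    """
    assert steady_state(), "no point injecting faults into an unhealthy system"
    inject()
    try:
        holds = steady_state()   # hypothesis survives only if still true
    finally:
        rollback()               # always undo the fault, pass or fail
    return "hypothesis-survived" if holds else "gap-found"

# Demo: a fault that genuinely breaks steady state, then gets rolled back.
state = {"healthy": True}
result = chaos_experiment(
    steady_state=lambda: state["healthy"],
    inject=lambda: state.update(healthy=False),
    rollback=lambda: state.update(healthy=True),
)
```

Note that “gap-found” is a good outcome here: the experiment did its job by surfacing a weakness in rehearsal rather than in Production.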
Healthy habitats run both, sequentially and then, often, together in the hunt for evidence of gaps between the system we imagine and the system we have.
Some Further Reading
“Learning Chaos Engineering” by Russ Miles


