You've probably been there. A ticket comes in, or an alert fires, or worse...an end user calls. Something broke. And when you start pulling the thread, you find out it wasn't one thing. It was never one thing. It was a Tuesday afternoon change, a config that's been quietly wrong for eight months, a monitoring gap someone meant to fix.
That's not bad luck. That's a safety event. There's a framework for understanding exactly why it happened and, more importantly, how to keep it from happening again.
I've been having 1:1s with my director, and we've already had a couple of conversations about this topic. I want to share the thinking behind it because I've felt it in my bones throughout my career.
Meet James Reason and His Cheese
Before I was born (no judgement), in 1990, a British psychologist named James Reason published a model of how accidents happen in complex systems. It came to be known as the Swiss Cheese Model, and more than thirty years later the name still sticks because it's annoyingly perfect.
The idea is that in any complex system (a nuclear plant, a hospital, an enterprise network), you don't rely on a single safeguard to prevent bad outcomes. You build multiple layers of defense. Policies. Monitoring. Change control. Access controls. Peer review. Each layer is a slice of Swiss cheese. And each slice has holes.
The holes are weaknesses. Some of them are active failures: someone clicked the wrong thing, pushed an untested config, skipped a step. Those are visible, in the moment, and easy to blame. But most holes are latent conditions: weaknesses already baked into the system and waiting quietly. Think of a firewall rule that's technically wrong but hasn't caused a problem yet, or an alert threshold set so high it never fires.
None of those latent conditions is an accident by itself. The cheese is still mostly solid. But when the holes in multiple slices line up, every layer of defense has a gap in the same spot. Three slices of cheese. Three holes. All lined up. That is when a hazard travels all the way through. That's your incident.
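To make the "holes line up" idea concrete, here is a minimal sketch in Python. It assumes each layer can be described by the set of hazards it currently fails to stop. The layer names and hazard labels are made up for illustration and are not part of Reason's model.

```python
# Hypothetical model: each defense layer is a slice of cheese, and its
# "holes" are the hazards that layer currently fails to stop.
layers = {
    "change control": {"untested config", "emergency change"},
    "configuration": {"untested config", "stale firewall rule"},
    "monitoring": {"untested config", "slow disk"},
}


def hazards_that_get_through(layers):
    """Return the hazards for which every layer has a hole."""
    hole_sets = list(layers.values())
    if not hole_sets:
        # Simplification for this sketch: with no layers defined,
        # report nothing rather than "everything gets through".
        return set()
    return set.intersection(*hole_sets)


aligned = hazards_that_get_through(layers)
print(aligned)  # {'untested config'}

# Closing the hole in any single layer breaks the alignment.
layers["monitoring"].discard("untested config")
print(hazards_that_get_through(layers))  # set()
```

Each layer on its own still lets several things through, yet only one hazard passes all three. Patching that hole in any one slice is enough to stop it.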
The Blame Trap
One thing James Reason was very deliberate about is that the Swiss Cheese Model is explicitly not a blame framework. When a safety event happens, the human who committed the active failure (the engineer who pushed the change, the analyst who missed the alert) is almost never the root cause. They're the last domino. The system set them up.
This matters a lot in a post-incident review. If your RCA lands on "engineer error" and stops there, you've identified a symptom and called it a root cause. The latent conditions are still there. The next engineer will walk into the same holes.
The useful question isn't "who made the mistake." It's "what conditions made this mistake easy to make and hard to catch."
What This Actually Looks Like in Practice
The Swiss Cheese Model isn't just a diagram for a presentation. It's a mental model you can apply in real time.
When you're doing a change review, you're not just checking if the change is technically correct. You're asking whether this change, in this environment, with these monitoring gaps and this change history, has a reasonable chance of contributing to a future hole-alignment.
When you're writing a post-incident review, you're mapping the layers. What was the hazard? What defenses existed? Where were the holes? Were they active failures or latent conditions? What's the remediation for each layer, not just the one that was most visible?
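Those review questions map naturally onto a small data structure. The sketch below is one hypothetical way to record them. The `Hole` class, its field names, and the remediation text are assumptions for illustration, and the three entries reuse the example from the opening of this post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hole:
    layer: str          # which defense had the gap
    description: str    # what the gap was
    kind: str           # "active" or "latent"
    remediation: Optional[str] = None  # None means nothing planned yet


# Hypothetical review of the incident described at the top of this post.
review = [
    Hole("change control", "Tuesday afternoon change", "active",
         remediation="add a peer review step"),  # made-up remediation
    Hole("configuration", "config quietly wrong for eight months", "latent"),
    Hole("monitoring", "monitoring gap someone meant to fix", "latent"),
]


def unremediated(holes):
    """Return the holes that still have no remediation recorded."""
    return [h for h in holes if not h.remediation]


for hole in unremediated(review):
    print(f"{hole.layer}: {hole.description} ({hole.kind})")
# configuration: config quietly wrong for eight months (latent)
# monitoring: monitoring gap someone meant to fix (latent)
```

In this made-up review, the only remediation covers the visible active failure. Both latent conditions are still open, which is exactly the pattern the review is meant to surface.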
When you're doing proactive work (rule reviews, config audits, detection testing), you're doing hole patrol (that's weird to say). You're inspecting the cheese for gaps before something has the chance to travel through them.
The Takeaway
Complex systems fail in complex ways. The safety event you're investigating right now didn't start this week. The conditions for it were probably in place for months. That's not a comfortable thought, but it's a useful one, because it means the work you do today (the rule you clean up, the alert you validate, the runbook you update) is directly reducing the probability of a future alignment.
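A rough back-of-the-envelope sketch shows why small cleanups matter. It assumes each layer's hole is independent of the others, which real systems often violate because latent conditions tend to be correlated. The probabilities are invented purely for illustration.

```python
import math


def alignment_probability(hole_probabilities):
    """Chance that a hazard passes every layer, assuming independent layers."""
    return math.prod(hole_probabilities)


# Made-up odds that each of three layers misses a given hazard.
before = alignment_probability([0.10, 0.20, 0.30])

# Same system after validating one alert: the last layer's miss rate
# drops from 0.30 to 0.03 (again, an illustrative number).
after = alignment_probability([0.10, 0.20, 0.03])

print(f"before: {before:.4f}")  # before: 0.0060
print(f"after:  {after:.4f}")   # after:  0.0006
```

Under these assumptions, tightening one slice by a factor of ten cuts the chance of a full alignment by the same factor, even though the other two layers are untouched.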
The cheese will always have holes. Our job is to make sure they never all line up at once.