Circuit Breaker Pattern: Stop Cascading Failures Cold
February 27, 2026
The circuit breaker pattern exists to answer a specific question: when a downstream service is failing, what should your service do about it?
The naive answer is "keep trying." That answer kills you. Every retry adds load to a service that is already struggling. Your threads pile up waiting on slow calls. The thread pool fills. Now your service is unresponsive too, and the failure has spread upstream. This is the cascading failure pattern, and it is how one slow database brings down an entire microservice graph in fifteen minutes.
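A back-of-the-envelope sketch of why the threads pile up, using Little's Law (average requests in flight = arrival rate x time per request). The traffic and latency figures are invented for illustration, not measurements from any real system.

```python
def threads_in_flight(requests_per_second: float, seconds_per_call: float) -> float:
    """Little's Law: average concurrency = arrival rate * time spent per request."""
    return requests_per_second * seconds_per_call


# Hypothetical numbers, chosen only to show the shape of the problem.
healthy = threads_in_flight(200, 0.05)   # downstream answers in 50 ms
degraded = threads_in_flight(200, 5.0)   # downstream hangs until a 5 s timeout

print(f"healthy: ~{healthy:.0f} busy threads, degraded: ~{degraded:.0f} busy threads")
```

With these assumed numbers, the same traffic goes from needing about 10 threads to needing about 1000. A pool sized for the healthy case is exhausted almost immediately, and retries only raise the arrival rate further.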
The circuit breaker says: stop retrying once the failure rate crosses a threshold. Reject calls immediately. Give the downstream service room to recover.
It is a state machine with three states.
Closed. Normal operation. Requests flow through. The breaker tracks the success and failure rate over a sliding window. As long as the failure rate stays below the threshold, nothing changes.
Open. The failure rate exceeded the threshold. The breaker trips. Now every incoming call is rejected immediately with a fast error, without even attempting the downstream call. Your service stops adding load. Your threads stop piling up. Your callers get a clear failure signal they can handle with a fallback.
Half-Open. After a configurable timeout, the breaker lets a small number of probe requests through. If those succeed, the downstream service is back, and the breaker closes again. If they fail, it returns to Open and starts the timeout over.
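The three states can be sketched as a small Python class. This is a minimal illustration under stated assumptions, not a production implementation: the class and parameter names (CircuitBreaker, min_calls, open_timeout, probe_count) are invented here, the window is count-based, the code is not thread-safe, and it does not cap how many probes run concurrently in Half-Open.

```python
import time
from collections import deque
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(Exception):
    """Raised when a call is rejected without touching the downstream."""


class CircuitBreaker:
    # All defaults are illustrative assumptions, not recommendations.
    def __init__(self, failure_threshold=0.5, window_size=100, min_calls=10,
                 open_timeout=30.0, probe_count=3, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.min_calls = min_calls
        self.open_timeout = open_timeout
        self.probe_count = probe_count
        self.clock = clock  # injectable so tests can control time
        self.window = deque(maxlen=window_size)  # True marks a failed call
        self.state = State.CLOSED
        self.opened_at = 0.0
        self.probe_successes = 0

    def call(self, fn, *args, **kwargs):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.open_timeout:
                # Open: fail fast, do not attempt the downstream call.
                raise CircuitOpenError("circuit open, call rejected")
            # Timeout elapsed: move to Half-Open and let probes through.
            self.state = State.HALF_OPEN
            self.probe_successes = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        if self.state is State.HALF_OPEN:
            self.probe_successes += 1
            if self.probe_successes >= self.probe_count:
                # Enough probes succeeded: close and start a fresh window.
                self.window.clear()
                self.state = State.CLOSED
        else:
            self.window.append(False)

    def _on_failure(self):
        if self.state is State.HALF_OPEN:
            # A failed probe sends the breaker straight back to Open.
            self._trip()
            return
        self.window.append(True)
        enough_samples = len(self.window) >= self.min_calls
        if enough_samples and sum(self.window) / len(self.window) > self.failure_threshold:
            self._trip()

    def _trip(self):
        self.state = State.OPEN
        self.opened_at = self.clock()  # restarts the Open timeout
```

A caller would wrap each downstream call in breaker.call(...) and catch CircuitOpenError to serve a fallback, such as a cached or default value. Passing in the clock is a design choice made here so the timeout can be tested without sleeping.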
The values that matter:
- The failure threshold (often a rate, like "more than 50% of the last 100 calls failed").
- The minimum sample size, so a single bad request does not trip the breaker.
- The Open timeout, which controls how long you stay open before probing again.
- The probe count in Half-Open, which controls how confident you need to be that the downstream is healthy.
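The four values above could be grouped into one validated configuration object. This is a sketch with made-up field names. The 50% threshold and 100-call window echo the example in the list, while the other defaults are placeholders and not tuning advice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BreakerConfig:
    """Hypothetical settings object; field names are illustrative only."""
    failure_rate_threshold: float = 0.5   # trip when more than this fraction fails
    window_size: int = 100                # number of recent calls considered
    min_calls: int = 20                   # minimum sample size before tripping
    open_timeout_seconds: float = 30.0    # time spent Open before probing
    half_open_probes: int = 5             # successful probes needed to close

    def __post_init__(self) -> None:
        if not 0.0 < self.failure_rate_threshold < 1.0:
            raise ValueError("failure_rate_threshold must be between 0 and 1")
        if not 1 <= self.min_calls <= self.window_size:
            # A minimum sample larger than the window could never be reached.
            raise ValueError("min_calls must be between 1 and window_size")
        if self.open_timeout_seconds <= 0:
            raise ValueError("open_timeout_seconds must be positive")
        if self.half_open_probes < 1:
            raise ValueError("half_open_probes must be at least 1")


config = BreakerConfig(open_timeout_seconds=10.0)
print(config)
```

Validating at construction time catches settings that silently disable the breaker, such as a minimum sample size that the window can never satisfy.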
A common mistake is to set the threshold too low. The breaker trips during a transient spike, locks itself open for thirty seconds, and now your service is degraded by a problem that would have resolved on its own in two seconds. Another common mistake is to forget the minimum sample size, so the breaker trips on a single failed health check before any real traffic has flowed.
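The second mistake is easy to see in a small trip-decision function. This is a sketch. The function name and arguments are assumptions made for the example, not any library's API.

```python
def should_trip(failures: int, total: int, threshold: float, min_calls: int) -> bool:
    """Return True when the observed failure rate justifies opening the breaker."""
    if total == 0 or total < min_calls:
        # Not enough evidence yet: stay closed regardless of the rate.
        return False
    return failures / total > threshold


# One failed call and nothing else: a 100% failure rate over a sample of one.
print(should_trip(failures=1, total=1, threshold=0.5, min_calls=1))    # True
print(should_trip(failures=1, total=1, threshold=0.5, min_calls=20))   # False

# With a real sample, the rate decides.
print(should_trip(failures=11, total=20, threshold=0.5, min_calls=20))  # True
```

With the minimum sample size effectively disabled, a lone failure trips the breaker. With a minimum of 20 calls, the same failure is ignored until there is enough traffic to judge the rate.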
The mental model is a household electrical breaker. It is not there to make every load work. It is there to fail loud and fast when something is wrong, so the rest of the house keeps power. Your circuit breaker is doing the same job for your service graph.
If you have ever seen a single bad database take down a microservice fleet, this is the pattern that would have stopped it.
The circuit breaker turns slow cascading failures into fast, predictable rejections. Closed monitors, Open rejects, Half-Open probes for recovery.
Originally posted on LinkedIn.