Circuit Breaker Pattern Deep Dive: The Half-Open State Is Where the Bugs Live

April 7, 2026


A circuit breaker is not a healer. It is a containment device. The dependency on the other side is going to recover on its own schedule, or it is not going to recover at all. The breaker's job is to make sure that while you wait, your service does not exhaust its own threads and connections trying to reach a thing that is not answering.

Three states are worth knowing well. Closed is normal: traffic flows, failures are counted in a sliding window. Open means the breaker has tripped and every call short-circuits to a fallback or a fast error. Half-open is the recovery probe, where the breaker lets a small number of requests through to see if the dependency is back. Most engineers internalize closed and open. Half-open is where the bugs live, because if you let ten requests through and they all hit a still-broken service, you have just paid the full failure cost again.
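The three states can be written down as a small transition table. This is a minimal sketch, not any particular library's API: the event names are invented for illustration, and the cooldown timer that moves the breaker from open to half-open is an assumption, since the text above only says that half-open is the recovery probe.

```python
from enum import Enum


class State(Enum):
    CLOSED = "closed"        # normal: traffic flows, failures are counted
    OPEN = "open"            # tripped: calls short-circuit to a fallback
    HALF_OPEN = "half_open"  # recovery probe: a small number of calls allowed


class Event(Enum):
    # Hypothetical event names, chosen for this sketch only.
    WINDOW_TRIPPED = "window_tripped"      # failure window crossed its threshold
    COOLDOWN_ELAPSED = "cooldown_elapsed"  # assumed wait before probing
    PROBE_SUCCEEDED = "probe_succeeded"
    PROBE_FAILED = "probe_failed"


TRANSITIONS = {
    (State.CLOSED, Event.WINDOW_TRIPPED): State.OPEN,
    (State.OPEN, Event.COOLDOWN_ELAPSED): State.HALF_OPEN,
    (State.HALF_OPEN, Event.PROBE_SUCCEEDED): State.CLOSED,
    (State.HALF_OPEN, Event.PROBE_FAILED): State.OPEN,
}


def next_state(state: State, event: Event) -> State:
    """Return the state after `event`; unlisted pairs leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Only two of the four transitions leave half-open, and both depend on a probe result. That is why the number of probes admitted matters so much.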

When to trip matters more than people admit. A count of consecutive failures is a brittle signal, because one short burst of errors can open the breaker even when the dependency is healthy overall. Error rate over a window is better. Latency budget is often the best signal of all: if p95 latency to a dependency crosses a threshold, the breaker opens before timeouts start cascading.
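To make the two better signals concrete, here is a rough sketch of a trip check that looks at error rate and p95 latency over a sliding window. The class name, the window size, the minimum sample count, and both thresholds are assumptions picked for illustration; real values depend on the dependency. The p95 uses the simple nearest-rank method.

```python
import math
from collections import deque


class TripSignal:
    """Sliding window of recent calls; all defaults are illustrative guesses."""

    def __init__(self, window=100, min_samples=20,
                 max_error_rate=0.5, p95_budget_s=0.25):
        self.samples = deque(maxlen=window)  # (ok, latency_s), oldest dropped first
        self.min_samples = min_samples
        self.max_error_rate = max_error_rate
        self.p95_budget_s = p95_budget_s

    def record(self, ok: bool, latency_s: float) -> None:
        self.samples.append((ok, latency_s))

    def error_rate(self) -> float:
        if not self.samples:
            return 0.0
        failures = sum(1 for ok, _ in self.samples if not ok)
        return failures / len(self.samples)

    def p95(self) -> float:
        """Nearest-rank 95th percentile of latencies in the window."""
        if not self.samples:
            return 0.0
        latencies = sorted(latency for _, latency in self.samples)
        rank = math.ceil(0.95 * len(latencies))
        return latencies[rank - 1]

    def should_trip(self) -> bool:
        # Too few samples says nothing reliable, so do not trip yet.
        if len(self.samples) < self.min_samples:
            return False
        return (self.error_rate() >= self.max_error_rate
                or self.p95() > self.p95_budget_s)
```

Note that the latency check can fire while every call is still succeeding. That is the point of tripping on p95: slow successes are the early warning before the timeouts arrive.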

Put one breaker on each dependency, not one breaker around the whole calling service. A service that talks to a primary database, two caches, and eight microservices needs eleven breakers. Retries and breakers are complementary, not redundant: the retry policy decides how many attempts each call gets, the breaker decides whether the call even happens.
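A sketch of how that division of labor might look: one breaker per dependency in a registry, and a retry loop that asks the breaker before every attempt. Everything here is hypothetical, including the dependency names and the stand-in `SimpleBreaker`, which trips on a plain failure count and never recovers, purely to keep the example short. Backoff between attempts is also left out.

```python
class CircuitOpenError(Exception):
    """Raised when a call is skipped because the breaker is open."""


class SimpleBreaker:
    """Stand-in breaker for this sketch: failure count only, no half-open."""

    def __init__(self, max_failures=5):
        self.max_failures = max_failures
        self.failures = 0

    def allow(self) -> bool:
        return self.failures < self.max_failures

    def record(self, ok: bool) -> None:
        self.failures = 0 if ok else self.failures + 1


# Hypothetical names: one database, two caches, eight microservices.
DEPENDENCIES = (["primary-db", "cache-a", "cache-b"]
                + [f"svc-{i}" for i in range(1, 9)])
breakers = {name: SimpleBreaker() for name in DEPENDENCIES}


def call_with_retry(name, fn, attempts=3):
    """Retry policy sets the attempt budget; the breaker gates each attempt."""
    breaker = breakers[name]
    last_error = None
    for _ in range(attempts):
        if not breaker.allow():
            # The breaker decides whether the call even happens.
            raise CircuitOpenError(name)
        try:
            result = fn()
        except Exception as exc:
            breaker.record(False)
            last_error = exc
        else:
            breaker.record(True)
            return result
    raise last_error
```

Because each dependency has its own entry, a tripped breaker on one microservice leaves calls to the database and caches untouched.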

A production failure mode worth memorizing. One team had a clean breaker on their primary database but nothing on their eight downstream microservices. When one of those services started timing out, the calling service retried three times with no breaker in the way. Every caller across the fleet was doing the same. The retry storm DDoSed the already struggling service into a hard outage, and the entire dependency chain tipped over within four minutes.
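The arithmetic behind that storm is worth a quick look. The sketch below assumes "retried three times" means one original attempt plus up to three retries, and that each attempt fails independently with the same probability. The 1,000 requests per second figure in the usage note is made up for illustration and does not come from the incident.

```python
def attempts_per_call(failure_rate: float, max_retries: int) -> float:
    """Expected attempts per logical call when each failure triggers a retry.

    Attempt k (counting from 0) only happens if the k attempts before it
    all failed, which has probability failure_rate ** k.
    """
    return sum(failure_rate ** k for k in range(max_retries + 1))


def offered_load(base_rps: float, failure_rate: float, max_retries: int) -> float:
    """Requests per second that actually reach the dependency."""
    return base_rps * attempts_per_call(failure_rate, max_retries)
```

With a healthy dependency, `offered_load(1000, 0.0, 3)` stays at 1,000. Once every attempt times out, `offered_load(1000, 1.0, 3)` is 4,000: the struggling service gets four times its normal traffic at exactly the moment it can handle the least.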

The fix was small. Breakers on every downstream call, tripped on rising p95, and a half-open phase that admitted exactly one probe request, not ten. The next time that service hiccuped, callers shed load quietly and the dependency recovered without help.
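A half-open phase that admits exactly one probe could look roughly like this. It is a sketch under stated assumptions, not the team's actual code: the class and method names, the 30 second cooldown, and the injectable clock are all invented for illustration, and the decision to trip is left to the caller through `trip()`.

```python
import threading
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected without touching the dependency."""


class SingleProbeBreaker:
    def __init__(self, cooldown_s=30.0, clock=time.monotonic):
        self._cooldown_s = cooldown_s  # assumed wait before the first probe
        self._clock = clock            # injectable so tests can fake time
        self._lock = threading.Lock()
        self._state = "closed"
        self._opened_at = 0.0
        self._probe_in_flight = False

    @property
    def state(self) -> str:
        return self._state

    def trip(self) -> None:
        """Open the breaker; the trip signal itself lives elsewhere."""
        with self._lock:
            self._open()

    def _open(self) -> None:
        self._state = "open"
        self._opened_at = self._clock()
        self._probe_in_flight = False

    def _admit(self) -> bool:
        """Return True if this call is the probe, False for closed traffic."""
        with self._lock:
            if self._state == "closed":
                return False
            if self._state == "open":
                if self._clock() - self._opened_at < self._cooldown_s:
                    raise CircuitOpenError("breaker open, cooling down")
                self._state = "half_open"
            # Half-open: exactly one probe may be outstanding.
            if self._probe_in_flight:
                raise CircuitOpenError("probe already in flight")
            self._probe_in_flight = True
            return True

    def call(self, fn):
        is_probe = self._admit()
        try:
            result = fn()
        except Exception:
            if is_probe:
                with self._lock:
                    self._open()  # failed probe: reopen, restart the cooldown
            raise
        if is_probe:
            with self._lock:
                self._state = "closed"
                self._probe_in_flight = False
        return result
```

The lock is held only while deciding admission, never while `fn` runs, so concurrent callers during a probe are rejected immediately instead of queuing behind it. A failed probe costs one request, not ten.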

A breaker does not make the dependency healthy. It keeps the rest of your system honest.

Key takeaway

A circuit breaker does not fix a broken dependency; it stops one slow neighbor from exhausting your threads and connections. Put a breaker on every downstream call, trip on the right signal, and let exactly one probe through in the half-open state.

Originally posted on LinkedIn.

