Backpressure: How Healthy Systems Push Back Before Their Queues Eat Them
April 5, 2026
Systems almost never fail because they are busy. They fail because they keep saying yes to work they cannot finish.
That is the failure mode backpressure exists to prevent. Imagine requests arriving at a service faster than the worker pool can drain them. At first, nothing looks wrong. The queue absorbs the excess. Then the queue grows. Memory creeps up. p99 latency drifts past the timeout budget of the upstream caller. The upstream caller retries, which can double the inbound rate. Workers are pinned at 100% CPU but make no useful progress, because every task they finish is already past its deadline and the result is discarded.
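To make that spiral concrete, here is a minimal simulation sketch. The function name, the tick-based model, and the numbers are all illustrative assumptions, not measurements from a real service, and it leaves out retries, which would only make the picture worse.

```python
from collections import deque


def simulate_unbounded(arrivals_per_tick, drained_per_tick, deadline_ticks, ticks):
    """Simulate a FIFO queue with no admission control.

    Returns (final_queue_depth, useful_results, wasted_results). A result is
    counted as wasted when the task waited longer than deadline_ticks, because
    the caller has already given up by the time it completes.
    """
    queue = deque()  # each entry is the tick at which the task arrived
    useful = 0
    wasted = 0
    for now in range(ticks):
        # Say yes to everything that arrives.
        queue.extend([now] * arrivals_per_tick)
        # Workers drain as much as they can this tick.
        for _ in range(min(drained_per_tick, len(queue))):
            arrived = queue.popleft()
            if now - arrived > deadline_ticks:
                wasted += 1
            else:
                useful += 1
    return len(queue), useful, wasted


if __name__ == "__main__":
    depth, useful, wasted = simulate_unbounded(12, 10, 2, 100)
    print(f"depth={depth} useful={useful} wasted={wasted}")
```

With arrivals only 20% above drain capacity, the backlog grows every tick, and after a short warm-up almost every completed task is one whose caller stopped waiting.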
The danger is not the spike. The danger is unbounded work in flight.
Backpressure is the system's ability to tell the producer to slow down before the queue eats it. It can take several shapes:
- Slow the producer by advertising less room to receive (TCP flow control does this by shrinking the receive window).
- Stop reading from the upstream socket temporarily.
- Shed load by dropping low-priority work.
- Return 429s or 503s early so callers fail fast and route elsewhere.
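The last two shapes fit in a few lines. This is a rough sketch, assuming a single-process service; `BoundedIngress`, `shed_above`, and the choice of 202 as the accept status are hypothetical names and choices for illustration, not a real framework's API.

```python
import queue

HTTP_ACCEPTED = 202
HTTP_TOO_MANY_REQUESTS = 429


class BoundedIngress:
    """Bounded queue that rejects early instead of growing without limit."""

    def __init__(self, max_depth, shed_above):
        self._queue = queue.Queue(maxsize=max_depth)
        # Depth at which low-priority work starts being shed (assumed policy).
        self._shed_above = shed_above

    def submit(self, item, low_priority=False):
        # Shed low-priority work first once the queue is filling up.
        # qsize() is only approximate under concurrency, which is fine for
        # a shedding heuristic.
        if low_priority and self._queue.qsize() >= self._shed_above:
            return HTTP_TOO_MANY_REQUESTS
        try:
            # put_nowait never blocks; a full queue raises queue.Full.
            self._queue.put_nowait(item)
        except queue.Full:
            return HTTP_TOO_MANY_REQUESTS
        return HTTP_ACCEPTED

    def take(self):
        """Called by a worker to pull the next item (raises queue.Empty)."""
        return self._queue.get_nowait()
```

The point of the sketch is that the rejection happens at submit time, in constant time, before any real work is spent on the request.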
All of these feel adversarial. None of them are. They are the difference between a service that degrades gracefully and one that drags every dependency down with it.
The mental model I use: every queue has a finite useful depth, and that depth is much smaller than its memory limit. A queue deeper than what your worker pool can drain within your timeout budget is already too long, because anything beyond that point will time out before it is processed. Accepting that work is worse than rejecting it: you spend the CPU and produce no result.
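The arithmetic is simple enough to write down. A back-of-the-envelope sketch, assuming identical workers and a roughly constant per-item service time; the function name and the example numbers are made up for illustration.

```python
def useful_queue_depth(workers, service_time_ms, timeout_ms):
    """Number of items the pool can drain within the caller's timeout budget.

    Anything queued beyond this depth will, on average, be finished only
    after the caller has given up. Integer milliseconds keep the math exact.
    """
    if workers <= 0 or service_time_ms <= 0 or timeout_ms < 0:
        raise ValueError("workers and service_time_ms must be positive")
    return workers * timeout_ms // service_time_ms


if __name__ == "__main__":
    # 8 workers, 50 ms per item, 2 s timeout budget.
    print(useful_queue_depth(8, 50, 2000))   # 320
    # Same pool when a slow dependency triples the service time.
    print(useful_queue_depth(8, 150, 2000))  # 106
```

Note how quickly the useful depth shrinks when the service time grows. A bound that was sensible on a good day is three times too generous on a bad one, so it is worth sizing the limit for the degraded case.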
Concrete failure I watched. A payment ingestion service had unbounded internal queues and no admission control. Stripe had a regional issue and our processing time tripled. Instead of returning 429s at the edge, we kept accepting webhooks. The queue grew to two million entries. JVM heap went red. Garbage collection pauses crossed thirty seconds. Health checks failed. Kubernetes restarted the pod. Two million in-memory entries vanished. We lost three hours of payment events because the system tried to be polite.
The fix was a bounded queue and a token bucket at the ingress. The next outage looked like a clean rejection spike on the dashboard. Customers retried. Nothing got lost. The service stayed up.
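For readers who have not built one, a token bucket is small. This is a generic sketch of the technique, not the code we shipped; the class name, parameters, and injectable clock are assumptions made for illustration, and it is not thread-safe as written.

```python
import time


class TokenBucket:
    """Allow `rate_per_s` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate_per_s, burst, clock=time.monotonic):
        self._rate = rate_per_s
        self._capacity = burst
        self._tokens = float(burst)  # start full so an initial burst is allowed
        self._clock = clock          # injectable so the bucket can be tested
        self._last = clock()

    def try_acquire(self):
        """Take one token if available. Never blocks; False means reject now."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, capped at the burst size.
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

At the ingress, the handler would call `try_acquire()` before enqueueing and return a 429 when it comes back `False`. The bucket caps the admitted rate, and the bounded queue behind it caps the work in flight.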
Without backpressure, the queue becomes your outage. The goal is not to avoid rejection. It is better to reject some requests quickly than to let every request fail slowly. Backpressure is how a system tells the truth about its capacity before the truth becomes an outage.
Originally posted on LinkedIn.