Backpressure and Overload Control: Scale on Leading Signals, Not Lagging Ones

February 28, 2026

A lot of teams ship retries, dead letter queues (DLQs), and idempotency, and still get paged at 2am during traffic spikes. The missing piece is backpressure. Retries and DLQs handle correctness, the property that no message is lost. Backpressure handles survival, the property that the system does not amplify its own load while it is already failing.

Without backpressure, the shape of an overload is always the same. Traffic spikes or a downstream slows. Queue depth grows, latency grows, callers time out, callers retry. Retries multiply the load on a system that is already past its capacity, and the only way out is to drop traffic, either deliberately or by crashing.
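To put a rough number on "retries multiply the load", here is a minimal sketch. It assumes every attempt fails independently with the same probability and that callers retry up to a fixed count; both are simplifications, and the function names are made up for illustration. In a real overload the failure rate climbs as load climbs, so this understates the feedback loop.

```python
def expected_attempts(failure_rate: float, max_retries: int) -> float:
    """Expected attempts per request if each attempt fails independently
    with probability failure_rate and the caller retries up to max_retries times.

    Attempt k+1 only happens when the first k attempts all failed,
    so the expectation is 1 + p + p^2 + ... + p^max_retries.
    """
    return sum(failure_rate ** k for k in range(max_retries + 1))


def offered_load(base_rps: float, failure_rate: float, max_retries: int) -> float:
    """Requests per second the server actually sees once retries are included."""
    return base_rps * expected_attempts(failure_rate, max_retries)


if __name__ == "__main__":
    # Illustrative numbers only: 1000 rps of real demand, 3 retries per call.
    for p in (0.0, 0.1, 0.5, 0.9):
        print(f"failure rate {p:.0%}: offered load {offered_load(1000, p, 3):.0f} rps")
```

Under these assumptions the multiplier approaches max_retries + 1 as the failure rate approaches 100%, which is exactly when the system can least afford it.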

The mechanisms that actually work in production are not exotic. Bounded queues plus load shedding give you the right to say no. Credit-based flow control, the way HTTP/2 flow-control windows under gRPC and request(n) in RSocket do it, lets the consumer tell the producer how much it is willing to accept. Adaptive concurrency limits, like Netflix's concurrency-limits library, apply TCP-style congestion control to your request pool, raising the limit while latency is healthy and shrinking it when latency rises. Little's Law gives you the budget: average concurrency equals throughput times latency, so if either throughput or latency drifts, in-flight work and queue depth move first, and the queue will tell you before the user does.
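A minimal sketch of how two of these pieces fit together: Little's Law to pick a starting budget, and an additive-increase, multiplicative-decrease rule to adapt it. This is not the API of Netflix's library; the class, its parameters, and the fixed latency target are assumptions for illustration, and a real limiter would usually derive its target from measured latency rather than a constant.

```python
import threading


def littles_law_concurrency(throughput_rps: float, latency_s: float) -> float:
    """Little's Law: average in-flight requests = throughput x latency."""
    return throughput_rps * latency_s


class AIMDLimiter:
    """Hypothetical concurrency limiter using additive increase,
    multiplicative decrease, driven by observed request latency."""

    def __init__(self, initial_limit: float, latency_target_s: float,
                 min_limit: float = 1.0, max_limit: float = 1000.0,
                 backoff: float = 0.5) -> None:
        self._limit = float(initial_limit)
        self._latency_target_s = latency_target_s
        self._min_limit = min_limit
        self._max_limit = max_limit
        self._backoff = backoff
        self._in_flight = 0
        self._lock = threading.Lock()

    @property
    def limit(self) -> int:
        return int(self._limit)

    def try_acquire(self) -> bool:
        """Admit the request, or return False so the caller can shed it."""
        with self._lock:
            if self._in_flight >= int(self._limit):
                return False
            self._in_flight += 1
            return True

    def release(self, latency_s: float) -> None:
        """Record a finished request and adjust the limit."""
        with self._lock:
            self._in_flight -= 1
            if latency_s > self._latency_target_s:
                # Latency is rising: cut the limit multiplicatively.
                self._limit = max(self._min_limit, self._limit * self._backoff)
            else:
                # Latency is healthy: probe upward one slot at a time.
                self._limit = min(self._max_limit, self._limit + 1)


if __name__ == "__main__":
    # Illustrative budget: 200 rps at 50 ms is about 10 requests in flight.
    start = round(littles_law_concurrency(200, 0.05))
    limiter = AIMDLimiter(initial_limit=start, latency_target_s=0.1)
    if limiter.try_acquire():
        limiter.release(latency_s=0.04)  # healthy, limit grows by one
    print(limiter.limit)
```

The rejection in try_acquire is the load-shedding path: the caller gets a fast "no" instead of a slot in an unbounded queue.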

One production failure taught me the difference between a leading and a lagging signal. A Kafka consumer group fell behind during a routine deploy, the way every consumer group falls behind during a deploy. The team had wired the autoscaler to Kafka consumer lag. Lag spiked, and the autoscaler obligingly scaled the consumer fleet to ten times its baseline. Every new consumer opened a connection to the downstream database. The database connection pool was exhausted in under a minute. Consumers crashed on connection errors. Kafka lag grew worse. The autoscaler tried to scale further. The team eventually scaled the consumer fleet manually back to one instance and let the database catch up.

The autoscaler had no backpressure signal from the database. It was scaling on a lagging indicator, consumer lag, while the leading indicator, database p95 latency, was screaming. The fix was a composite scaling policy: scale up only when lag is rising and database latency is healthy. If database latency is rising, do not scale; shed load instead.
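The composite policy can be sketched as a small decision function. Everything here is an assumption for illustration: the names, the idea of comparing two consecutive samples, the 10% rise tolerance, and the p95 budget are placeholders, not the team's actual configuration.

```python
from enum import Enum


class Action(Enum):
    SCALE_UP = "scale_up"
    HOLD = "hold"
    SHED_LOAD = "shed_load"


def decide(lag_prev: int, lag_now: int,
           db_p95_prev_ms: float, db_p95_now_ms: float,
           db_p95_budget_ms: float,
           rise_tolerance: float = 0.10) -> Action:
    """Hypothetical composite scaling decision.

    The leading signal (database p95 latency) is checked first and can
    veto the lagging signal (consumer lag).
    """
    db_rising = db_p95_now_ms > db_p95_prev_ms * (1 + rise_tolerance)
    db_over_budget = db_p95_now_ms > db_p95_budget_ms

    if db_rising or db_over_budget:
        # The database is already struggling: more consumers would add load.
        return Action.SHED_LOAD
    if lag_now > lag_prev:
        # Lag is rising and the database is healthy: scaling is safe.
        return Action.SCALE_UP
    return Action.HOLD


if __name__ == "__main__":
    # Lag rising, database healthy.
    print(decide(1_000, 5_000, 20.0, 21.0, 50.0))
    # Lag rising, database latency climbing past its budget.
    print(decide(1_000, 5_000, 20.0, 80.0, 50.0))
```

The ordering of the checks is the point: the database gets a veto before lag is even considered, so a lag spike alone can never trigger a scale-up into a saturated dependency.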

Survival is a different optimization problem than throughput. It requires the system to give itself permission to slow down, and that permission has to come from a signal that points forward in time, not backward.

Key takeaway

Retries plus DLQs handle correctness; backpressure handles survival. Under load, autoscaling on a lagging signal like Kafka lag without a backpressure path from the database amplifies failures instead of absorbing them.

Originally posted on LinkedIn.