Tail Latency and Hedged Requests: Why p50 Looks Fine While Users Complain

March 3, 2026


The dashboard says average latency is 80 ms. Support says the app feels randomly slow for some users. Both are true, and the gap between them is almost always the tail.

Here is the shape of the problem. An aggregator service fans out to replicas A, B, and C in parallel and waits for all three before responding. A and B come back in 40 ms. C is in the middle of a young-gen GC pause, or queueing behind a slow neighbor, or running on a node that just got a CPU steal spike from a noisy VM next door. C takes 600 ms. The user-visible latency of this request is 600 ms because the response combiner is blocked on C. Your p50 of the composite is still fine, because most requests are not unlucky. But p99 is the experience of the unlucky users, and they are real.

This is fan-out amplification. If each replica has a p99 of 200 ms and you fan out to three, the probability that at least one leg lands in its slow 1% is close to 3% (1 - 0.99^3, assuming the legs are independent). Fan out to ten and nearly 10% of requests hit a slow leg. Fan out to 100 and it is about 63%. At around 70 legs, the tail of a single dependency becomes the median of the composite call.
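The arithmetic is easy to check yourself. A minimal sketch, assuming every leg is independent and has the same 1% chance of landing in its slow tail; the function name is made up for illustration:

```python
def prob_any_slow(fan_out: int, per_leg_slow: float = 0.01) -> float:
    """Chance that at least one of `fan_out` independent legs is slow.

    Assumes legs are independent and identically distributed, which real
    replicas sharing a host or a database often are not.
    """
    return 1.0 - (1.0 - per_leg_slow) ** fan_out


if __name__ == "__main__":
    for n in (1, 3, 10, 69, 100):
        print(f"fan-out {n:>3}: {prob_any_slow(n):.1%} of requests wait on a slow leg")
```

If slowness is correlated across legs (same noisy host, same degraded database), the real number can be better or worse than this, so treat it as a first-order estimate.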

Hedged requests are the standard mitigation. After a short delay, often around the single-replica p95, send a duplicate to a different replica and take whichever returns first. Cancel the loser. For an idempotent read against a system whose tails are caused by per-request bad luck (GC pauses, transient queueing, hot CPU), hedging cuts the composite p99 dramatically because you are routing around one unlucky moment, not around a broken system.
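Here is roughly what that looks like with Python's asyncio. This is a sketch, not a production client: `hedged_call` and `fake_replica` are hypothetical names, the replicas are simulated with sleeps, and it sends at most one hedge. It also does not handle the case where the first reply to arrive is an error.

```python
import asyncio
from typing import Awaitable, Callable, Sequence


async def hedged_call(
    replicas: Sequence[Callable[[], Awaitable[str]]],
    hedge_delay_s: float,
) -> str:
    """Call replicas[0]; if it is still pending after hedge_delay_s,
    also call replicas[1]. Return whichever finishes first and cancel
    the loser. Only safe for idempotent reads."""
    primary = asyncio.create_task(replicas[0]())
    done, _ = await asyncio.wait({primary}, timeout=hedge_delay_s)
    if done:
        # Primary answered before the hedge delay: no duplicate is sent.
        return primary.result()

    hedge = asyncio.create_task(replicas[1]())
    done, pending = await asyncio.wait(
        {primary, hedge}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # cancel the loser so it stops consuming capacity
    return done.pop().result()


async def fake_replica(name: str, latency_s: float) -> str:
    """Stand-in for a real RPC: sleeps, then returns the replica's name."""
    await asyncio.sleep(latency_s)
    return name


async def main() -> str:
    # Replica C is having an unlucky 600 ms moment; replica A is healthy.
    return await hedged_call(
        [lambda: fake_replica("C", 0.6), lambda: fake_replica("A", 0.04)],
        hedge_delay_s=0.05,  # in practice, roughly the single-replica p95
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In the demo the request finishes in about 90 ms (the 50 ms hedge delay plus A's 40 ms) instead of 600 ms. Cancellation only helps if the server actually stops work when the client goes away, which depends on your RPC stack.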

The trap is that hedging can finish off a partial outage. If replica C is slow because the underlying database is degraded, not because of a per-request GC pause, then hedging does not avoid the slow replica. It just piles more load onto the same degraded database that was already browning out. Latency on every replica climbs. The hedge threshold trips on more requests. The hedge ratio approaches 100%. Now the database is taking twice its normal load with one hedge per request, or three times with two, while it was already failing. Brownout becomes outage.
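The load math behind that spiral is simple. A back-of-the-envelope sketch, assuming every hedge reaches the same shared backend and nothing is cancelled in time; the function name and the example ratios are illustrative:

```python
def backend_load_multiplier(hedge_ratio: float, hedges_per_request: int = 1) -> float:
    """Backend load relative to a no-hedging baseline.

    hedge_ratio: fraction of requests that exceed the hedge delay.
    hedges_per_request: duplicates sent for each such request (assumed fixed).
    """
    return 1.0 + hedge_ratio * hedges_per_request


if __name__ == "__main__":
    # Healthy: the hedge delay sits near p95, so about 5% of requests hedge.
    print(f"healthy:            {backend_load_multiplier(0.05):.2f}x")
    # Brownout: nearly every request is slower than the hedge delay.
    print(f"brownout, 1 hedge:  {backend_load_multiplier(1.0):.2f}x")
    print(f"brownout, 2 hedges: {backend_load_multiplier(1.0, 2):.2f}x")
```

The point is the jump: the same policy that costs about 5% extra load on a good day costs 100% or more on the worst day, exactly when the backend can least afford it.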

The safe shape: only hedge when the cluster is healthy. Tie the hedge to a circuit breaker that opens when the dependency's overall error rate or queue depth crosses a threshold. Cap the hedged fraction at something small, often 5%. Hedge only idempotent reads. Combine with timeouts and backpressure so a runaway hedge cannot consume the budget of every other call.
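Those rules fit in a small gate that the client consults before sending a duplicate. A minimal sketch under loud assumptions: `HedgeGate` is a hypothetical class, the 2% error threshold is made up, and the counters are cumulative where a real breaker would use a sliding window and a half-open state.

```python
from dataclasses import dataclass


@dataclass
class HedgeGate:
    max_hedge_fraction: float = 0.05  # cap on hedges as a share of requests
    max_error_rate: float = 0.02      # hypothetical breaker threshold
    requests: int = 0
    hedges: int = 0
    errors: int = 0

    def record_request(self, failed: bool = False) -> None:
        """Count every primary request and whether it failed."""
        self.requests += 1
        if failed:
            self.errors += 1

    def breaker_open(self) -> bool:
        """True when the dependency looks unhealthy, so hedging must stop."""
        return self.requests > 0 and self.errors / self.requests > self.max_error_rate

    def may_hedge(self, idempotent: bool) -> bool:
        """Allow a hedge only for idempotent calls, only while the
        dependency is healthy, and only within the hedge budget."""
        if not idempotent or self.breaker_open():
            return False
        if self.hedges + 1 > self.max_hedge_fraction * self.requests:
            return False  # budget exhausted: wait for the primary instead
        self.hedges += 1
        return True


if __name__ == "__main__":
    gate = HedgeGate()
    for _ in range(100):
        gate.record_request()
    allowed = sum(gate.may_hedge(idempotent=True) for _ in range(20))
    print(f"hedges allowed out of 20 attempts: {allowed}")
```

The budget is what breaks the feedback loop: even if every request crosses the hedge delay, extra load is bounded by the cap instead of climbing toward 100%.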

Engineer the tail, not the average. And do it without amplifying the failure you were trying to hide.

Key takeaway

A fan-out aggregator inherits the slowest replica's latency, so the user feels p99 even when p50 is clean. Hedged requests route around the slow leg, but only safely when the cluster is healthy. During a brownout, hedging is how a degraded system becomes a dead one.

Originally posted on LinkedIn.

