One Page Load, Fifty Backend Calls: The Tail Latency Math of Synchronous Fan-Out

April 5, 2026


This is a different fan-out problem from the asynchronous kind. The transactional outbox decouples a write from its side effects. Analytics pipelines decouple a click from its downstream consumers. Both buy time. Synchronous read fan-out buys nothing. The user is waiting on the result. Every internal RPC happens before you can respond.

A modern product home page is the canonical example. One inbound request becomes auth, user profile, recommendations, inventory, pricing, reviews, ranking, feature flags, and an experiment assignment. Ten to fifty RPCs is normal. All synchronous. All on the user's clock.

The math is the part most engineers underestimate. If a single dependency has a p99 of 100ms, 1% of its calls take at least 100ms. If your request fans out to ten such dependencies in parallel, and their latencies are independent, the probability that none of them hits its tail is 0.99 to the tenth, about 90%. So roughly 10% of your requests now experience that 100ms tail. The dependency's p99 has become the request's p90. At fan-out fifty, roughly 40% of requests hit at least one slow dependency, and at about seventy it is more than half. From there, the request's median is set by the slowest call in the tree.
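The arithmetic is short enough to check yourself. A minimal sketch, assuming every dependency has the same 1% chance of a slow call and that the calls are independent (real systems often violate the second assumption, since slow calls tend to cluster); the function name is mine, not a standard one:

```python
def p_at_least_one_slow(fanout: int, p_slow: float = 0.01) -> float:
    """Chance that at least one of `fanout` independent calls lands in its slow tail."""
    # All calls are fast with probability (1 - p_slow) ** fanout; take the complement.
    return 1.0 - (1.0 - p_slow) ** fanout


if __name__ == "__main__":
    for n in (1, 10, 50, 69):
        share = p_at_least_one_slow(n)
        print(f"fan-out {n:>2}: {share:.1%} of requests see at least one tail-latency call")
```

Fan-out 69 is the point where the share crosses one half under these assumptions.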

Sequential fan-out is worse. Latencies add instead of taking the max. Ten sequential 20ms calls set a 200ms floor before any spike. The first fix is almost always "stop awaiting these one at a time." If two calls do not depend on each other's results, kick them off together and join later.
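Here is what "kick them off together and join later" looks like in Python's asyncio. This is a sketch, not production code: `fake_rpc` is a hypothetical stand-in that sleeps for 20ms instead of touching the network, and the service names are made up.

```python
import asyncio
import time


async def fake_rpc(name: str, latency_s: float = 0.02) -> str:
    """Hypothetical backend call; sleeps instead of doing network I/O."""
    await asyncio.sleep(latency_s)
    return name


async def sequential(names: list[str]) -> list[str]:
    # Each await finishes before the next call starts, so latencies add.
    return [await fake_rpc(n) for n in names]


async def parallel(names: list[str]) -> list[str]:
    # All calls start together; total time is roughly the slowest single call.
    return list(await asyncio.gather(*(fake_rpc(n) for n in names)))


async def timed(fn, names: list[str]):
    start = time.perf_counter()
    result = await fn(names)
    return result, time.perf_counter() - start


async def main() -> None:
    names = [f"svc-{i}" for i in range(10)]
    _, seq_s = await timed(sequential, names)
    _, par_s = await timed(parallel, names)
    print(f"sequential: {seq_s * 1000:.0f}ms, parallel: {par_s * 1000:.0f}ms")


if __name__ == "__main__":
    asyncio.run(main())
```

Both versions return the same results in the same order. Only the wall-clock time changes.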

Once you are parallel, the other tools come into play.

Hedged requests send a duplicate after a delay threshold and take whichever returns first. Google's "The Tail at Scale" paper (Dean and Barroso) is the canonical reference. Hedging works when downstreams have spare capacity. It is dangerous when they do not, because hedging during overload makes overload worse.
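A hedged request can be sketched in a few lines of asyncio. Assumptions worth stating loudly: the call is idempotent, `make_call` is a hypothetical zero-argument function that starts one attempt, and there is no cap on how many hedges you send, which is exactly the guard you would need before using this anywhere near an overloaded downstream.

```python
import asyncio


async def hedged(make_call, hedge_after_s: float):
    """Start one attempt; if it is still running after `hedge_after_s`,
    start a second identical attempt and return whichever finishes first."""
    primary = asyncio.ensure_future(make_call())
    done, _ = await asyncio.wait({primary}, timeout=hedge_after_s)
    if done:
        return primary.result()  # fast path: no duplicate was sent

    backup = asyncio.ensure_future(make_call())
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # stop the loser so it does not keep consuming capacity
    return done.pop().result()


def make_flaky_call():
    """Hypothetical downstream: the first attempt is slow, later attempts are fast."""
    attempts = 0

    async def call() -> str:
        nonlocal attempts
        attempts += 1
        mine = attempts
        await asyncio.sleep(0.5 if mine == 1 else 0.01)
        return f"attempt-{mine}"

    return call


if __name__ == "__main__":
    # The hedge fires after 50ms and wins against the 500ms first attempt.
    print(asyncio.run(hedged(make_flaky_call(), hedge_after_s=0.05)))
```

The delay threshold is the design knob. Set it near the downstream's typical slow-call latency and only a small share of requests ever send a duplicate.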

Deadline propagation passes the request's remaining budget down the call tree. If the inbound request has 50ms left and the recommendation service takes 180ms by default, the caller tells it "you have 50ms or I do not want the answer." Downstreams that respect deadlines stop doing work nobody is waiting for.
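A rough sketch of the idea, assuming everything runs in one process so the deadline can be passed as a plain object. Across real services it would travel in request metadata instead. `Deadline`, `call_with_deadline`, and the `recommendations` stub are all hypothetical names for illustration.

```python
import asyncio
import time


class Deadline:
    """Absolute point in time by which the inbound request must be answered."""

    def __init__(self, budget_s: float):
        self.expires_at = time.monotonic() + budget_s

    def remaining(self) -> float:
        return max(0.0, self.expires_at - time.monotonic())


async def call_with_deadline(rpc, deadline: Deadline, default=None):
    """Give the downstream only the time that is left; fall back if it cannot finish."""
    budget = deadline.remaining()
    if budget <= 0:
        return default  # nobody is waiting any more, so do not even start
    try:
        # The deadline is handed to the callee so it can give up early too.
        return await asyncio.wait_for(rpc(deadline), timeout=budget)
    except asyncio.TimeoutError:
        return default


async def recommendations(deadline: Deadline) -> list[str]:
    """Hypothetical downstream that needs about 180ms to produce an answer."""
    if deadline.remaining() < 0.18:
        raise asyncio.TimeoutError  # refuse work that cannot finish in time
    await asyncio.sleep(0.18)
    return ["item-1", "item-2"]


if __name__ == "__main__":
    # 50ms left: the callee declines immediately and the caller uses its fallback.
    print(asyncio.run(call_with_deadline(recommendations, Deadline(0.05), default=[])))
    # 1s left: the callee has room to finish.
    print(asyncio.run(call_with_deadline(recommendations, Deadline(1.0), default=[])))
```

The deadline is an absolute time, not a per-hop timeout. Each level of the tree computes what is left, so time spent upstream is automatically subtracted.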

Request coalescing and batching attack the fan-out itself. Do you really need three separate calls to the user service for three IDs, or can it take a batch? Often the fan-out grew by accretion and nobody noticed it could be collapsed.
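The batching half is easy to show. A sketch with a hypothetical `UserService` client that counts round trips instead of making network calls; the `get_users` batch endpoint is an assumption about what the service could offer, not a real API.

```python
import asyncio


class UserService:
    """Hypothetical client; counts round trips instead of doing network I/O."""

    def __init__(self) -> None:
        self.round_trips = 0
        self._users = {1: "ada", 2: "grace", 3: "linus"}

    async def get_user(self, user_id: int) -> str:
        self.round_trips += 1
        await asyncio.sleep(0.01)
        return self._users[user_id]

    async def get_users(self, user_ids: list[int]) -> dict[int, str]:
        # One round trip regardless of how many IDs are requested.
        self.round_trips += 1
        await asyncio.sleep(0.01)
        return {uid: self._users[uid] for uid in user_ids}


async def fetch_one_by_one(svc: UserService, ids: list[int]) -> dict[int, str]:
    # Parallel, but still one RPC per ID: each is a separate chance to hit the tail.
    names = await asyncio.gather(*(svc.get_user(uid) for uid in ids))
    return dict(zip(ids, names))


async def fetch_batched(svc: UserService, ids: list[int]) -> dict[int, str]:
    return await svc.get_users(ids)


if __name__ == "__main__":
    svc = UserService()
    asyncio.run(fetch_one_by_one(svc, [1, 2, 3]))
    print("one by one:", svc.round_trips, "round trips")

    svc = UserService()
    asyncio.run(fetch_batched(svc, [1, 2, 3]))
    print("batched:", svc.round_trips, "round trip")
```

The results are identical. What changes is the fan-out, and with it the number of chances to draw a slow call.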

The failure I have watched live: a recommendation service that called the feature store sixty times per request, once per candidate item. p99 of the feature store was a healthy 40ms. p99 of the recommendation service was over 2 seconds. A single batch endpoint dropped it to 80ms.

The user sees one click. The system sees a tree. Keep the tree shallow, wide, and on a budget.

Key takeaway

Synchronous fan-out turns your dependencies' tail into your request's median. The fixes are parallelism, hedging, deadline propagation, and aggressive coalescing of calls that did not need to be separate.

Originally posted on LinkedIn.

