Synchronized TTL Is a Time Bomb You Lit at Deploy Time
January 9, 2026
There are several failure modes that get grouped under the label "cache stampede," and they require different fixes. Single-flight protects you when many requests miss the same key at the same instant. CDN fan-out protects you when a viral asset thunders past a single layer. This post is about a third one that is sneakier than either: fleet-wide synchronized expiry.
The setup is innocuous. Your service warms its cache with a thousand entries at startup. Every entry gets a TTL of one hour. An hour later, those thousand entries do not expire gradually. They expire at the same wall-clock second, because the cache was populated in a loop that finished in under a second. The clients that depend on those keys all miss together. The origin sees a wall of traffic, not a curve.
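A minimal sketch of that warmup loop makes the cliff visible. It uses a simulated clock rather than a real cache, and the key names, the 0.5 ms per-write cost, and the helper functions are all assumptions made for illustration.

```python
from collections import Counter

def warm_cache(keys, ttl_seconds, start_time, seconds_per_write):
    """Return {key: expiry_time} for a warmup loop that writes keys back to back."""
    expiries = {}
    now = start_time
    for key in keys:
        expiries[key] = now + ttl_seconds  # every entry gets the same fixed TTL
        now += seconds_per_write           # the simulated clock barely moves per write
    return expiries

def expiries_per_second(expiries):
    """Count how many entries expire in each whole second."""
    return Counter(int(t) for t in expiries.values())

# Hypothetical workload: 1000 keys written at 0.5 ms each, so the loop takes 0.5 s.
keys = [f"page:{i}" for i in range(1000)]
expiries = warm_cache(keys, ttl_seconds=3600, start_time=0.0, seconds_per_write=0.0005)
histogram = expiries_per_second(expiries)

# Number of distinct expiry seconds, and the size of the largest one.
print(len(histogram), max(histogram.values()))  # prints "1 1000"
```

Under these assumptions the whole cache lands in a single expiry second. A slower loop would widen the window slightly, but not enough to change the shape.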
This is structurally different from a hot key incident. A hot key concentrates load on a single shard. A synchronized expiry concentrates load on a single instant. Adding replicas does nothing. Adding shards does nothing. The bug is in the time domain.
The fixes are cheap and they compose. TTL jitter, plus or minus 20 percent applied at write time, smears the expiry cliff into a curve. Refresh-ahead workers rebuild popular keys in the background just before they expire, so the read path never misses on them. A two-layer cache, with a warm replica feeding a primary, gives the origin a buffer when the primary misses, provided the two layers do not expire on the same schedule. None of these is exotic. They are just the difference between a service that survives its own deploys and one that does not.
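The first two fixes can be sketched in a few lines. This is one possible shape, not a reference implementation: the function names, the uniform distribution, and the 0.9 refresh threshold are assumptions chosen for illustration.

```python
import random

def jittered_ttl(base_ttl, jitter_fraction=0.20, rng=random):
    """Return base_ttl scaled by a uniform random factor in [1 - f, 1 + f]."""
    factor = rng.uniform(1.0 - jitter_fraction, 1.0 + jitter_fraction)
    return max(1, round(base_ttl * factor))  # never return a zero or negative TTL

def should_refresh_ahead(now, written_at, ttl, refresh_fraction=0.9):
    """True once an entry has used up refresh_fraction of its TTL but is still live."""
    age = now - written_at
    return refresh_fraction * ttl <= age < ttl

# Write path: pick a different TTL for every entry (seeded here for repeatability).
rng = random.Random(0)
ttls = [jittered_ttl(3600, 0.20, rng) for _ in range(1000)]
print(min(ttls), max(ttls))  # both fall between 2880 and 4320

# Background worker: an entry written at t=0 with a 3600 s TTL is due at t=3300.
print(should_refresh_ahead(now=3300, written_at=0, ttl=3600))  # True
```

The jitter belongs on the write path so that every writer, including the warmup loop, gets it for free. The refresh check is meant for a background worker that scans popular keys; it returns False for expired entries because those are already the read path's problem.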
The production failure I keep coming back to involved a CMS that warmed its Redis cache at deploy time. Pages had a TTL of exactly 3600 seconds. The team deployed every Friday at 2 pm. At 3 pm on Friday, every cached page expired in a five-second window, and the origin servers, which were sized for steady-state traffic, tipped over. The on-call would scale the origin pool, the cache would refill, and the world would look fine again by 3:15. This happened for six weeks before someone overlaid the outage graph on the deploy log and noticed that each outage started exactly one hour after a deploy. The fix was a single line: jitter the TTL by plus or minus 720 seconds at write time. The outages stopped immediately.
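The arithmetic behind that one-line fix is worth spelling out. The sketch below uses the numbers from the incident (3600 second TTL, 720 second jitter, five-second expiry window, 2 pm deploy); the calendar date is a placeholder, and an even spread of expiries across the jitter range is an assumption.

```python
from datetime import datetime, timedelta

BASE_TTL = 3600   # seconds, from the incident
JITTER = 720      # seconds, from the fix
WINDOW = 5        # seconds, the observed expiry window before the fix

earliest = BASE_TTL - JITTER   # shortest possible TTL after the fix
latest = BASE_TTL + JITTER     # longest possible TTL after the fix
spread = latest - earliest     # width of the new expiry window
buckets = spread // WINDOW     # how many five-second windows the load is spread over

# Placeholder date; only the 2 pm deploy time matters.
deploy = datetime(2026, 1, 9, 14, 0)
first_expiry = deploy + timedelta(seconds=earliest)
last_expiry = deploy + timedelta(seconds=latest)

print(earliest, latest, spread, buckets)  # prints "2880 4320 1440 288"
print(first_expiry.strftime("%H:%M"), last_expiry.strftime("%H:%M"))  # prints "14:48 15:12"
```

Instead of every page expiring at 3:00 pm, expiries are spread between 2:48 pm and 3:12 pm. If they are spread evenly, the load that used to land in one five-second window is divided across 288 of them.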
The lesson is that any cache built by a synchronous warmup loop is a fleet-wide alarm clock that goes off in unison. Jitter is not an optimization. It is the price of admission for any TTL-driven cache that handles real load.
Spread the expirations or the origin will spread your weekend.
Cache stampedes from a single hot key are one problem. Fleet-wide synchronized expiry is another, and it is the one that correlates with your deploy schedule.
Originally posted on LinkedIn.