Cache TTL Tuning: The Knob That Sits Between Hit Rate and Staleness
March 7, 2026
TTL looks like a number you guess at. It is actually a choice about which kind of failure you would rather absorb. Too short, and entries expire before they are reused: a request that arrives a second after expiry misses the cache and falls through your fastest layer into your slowest one. Too long, and users see prices, profiles, and feature flags that were current three deploys ago. No single default value is correct for every key in your service. The honest answer is to choose it per resource.
Start with the data, not the cache. A list of country codes changes once a year, so a TTL measured in hours costs you nothing. A personalized feed changes with every user action, so a TTL of even one minute can serve confusing stale state to the user who just took the action. A stock price might not be cacheable at all. Treat TTL as a property of the resource, not a global service setting.
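One way to make TTL a property of the resource is a small policy table keyed by resource class. This is a minimal Python sketch. The class names and durations are hypothetical, picked only to mirror the examples above, and None stands for "do not cache".

```python
from datetime import timedelta
from typing import Dict, Optional

# Hypothetical resource classes and illustrative TTLs; real values should come
# from how often each resource actually changes.
TTL_POLICY: Dict[str, Optional[timedelta]] = {
    "country_codes": timedelta(hours=6),        # changes about once a year
    "personalized_feed": timedelta(seconds=5),  # changes on every user action; keep short, pair with invalidation
    "stock_price": None,                        # None: do not cache at all
}


def ttl_for(resource_class: str) -> Optional[timedelta]:
    """Return the TTL for a resource class, or None if it must not be cached.

    Unknown classes are not cached, so a new resource has to opt in explicitly.
    """
    return TTL_POLICY.get(resource_class)


print(ttl_for("country_codes"))  # 6:00:00
print(ttl_for("stock_price"))    # None
```

Defaulting unknown classes to no caching is a deliberate choice in this sketch. A new resource has to opt in to a TTL instead of silently inheriting a global one.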
The second knob is jitter. If every key in a class expires at exactly the same moment, you get a synchronized miss storm against the origin, sometimes called a cache cliff. The fix is randomization. Instead of a flat 60-second TTL, give each key a TTL of 60 seconds plus or minus 10 percent, anywhere from 54 to 66 seconds. Expirations then spread across a 12-second window, and the origin sees a smooth refresh rate instead of a sawtooth. The change is one line of code and prevents an entire class of incidents.
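The jitter rule fits in a single helper. This is a sketch that assumes a uniform distribution over the plus-or-minus window; the function name and the optional rng parameter (useful for reproducible tests) are illustrative, not taken from any particular cache library.

```python
import random
from typing import Optional


def jittered_ttl(base_seconds: float, jitter_fraction: float = 0.10,
                 rng: Optional[random.Random] = None) -> float:
    """Return a TTL drawn uniformly from base_seconds +/- jitter_fraction * base_seconds.

    With the defaults and base_seconds=60 the result lies in [54, 66], so keys
    written in the same second expire across a 12-second window.
    """
    if not 0.0 <= jitter_fraction < 1.0:
        raise ValueError("jitter_fraction must be in [0, 1)")
    uniform = rng.uniform if rng is not None else random.uniform
    spread = base_seconds * jitter_fraction
    return uniform(base_seconds - spread, base_seconds + spread)


# Each key gets its own TTL at write time instead of a flat 60 seconds.
ttls = [jittered_ttl(60) for _ in range(5)]
print([round(t, 1) for t in ttls])
```

At the call site it really is one line: pass jittered_ttl(60) wherever your cache client's write call takes a TTL, instead of the constant 60.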
The third knob is the invalidation policy. When the cost of staleness is high, do not rely on TTL alone. Write-through invalidation deletes or rewrites the cache entry whenever the source of truth changes, and the TTL becomes a backstop in case the invalidation message is lost. Together they are stronger than either alone.
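Here is a minimal in-process sketch of that combination, assuming a single-threaded Python service. Writes to the source of truth delete the cached entry, and the TTL only matters when that delete never happens. InvalidatingCache, update_price, and the dict standing in for the database are hypothetical names for illustration.

```python
import time
from typing import Any, Callable, Dict, Tuple


class InvalidatingCache:
    """Entries expire after a TTL, and writes to the source of truth delete the
    matching entry. The TTL is only a backstop for a lost invalidation."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str, load: Callable[[str], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                        # fresh hit
        value = load(key)                          # miss or expired: go to the origin
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key: str) -> None:
        self._entries.pop(key, None)


# Hypothetical source of truth and write path.
database: Dict[str, Any] = {"price:42": 100}
cache = InvalidatingCache(ttl_seconds=300)


def update_price(key: str, value: Any) -> None:
    database[key] = value   # 1. write the source of truth
    cache.invalidate(key)   # 2. then delete the cached copy


print(cache.get("price:42", database.__getitem__))  # 100
update_price("price:42", 120)
print(cache.get("price:42", database.__getitem__))  # 120
```

This sketch uses the delete variant rather than rewriting the entry, and it ignores concurrency. In a shared cache, a reader that loaded the old value before the write can put it back after the delete, which is one more reason to keep the TTL as a backstop.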
The production failure I keep seeing involves the top of the minute. A team set a flat 60-second TTL on a high-traffic product API. User-correlated cache entries, written when users logged in around the same time, all expired in the same second of every minute. The origin database saw a 10,000 QPS spike at 0 seconds past the minute, idled at 200 QPS for the next 58 seconds, then spiked again. The autoscaler chased the sawtooth and never settled. The fix was a one-line change: a 60-second TTL with plus or minus 10 percent randomization per key.
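To see why a flat TTL keeps the sawtooth alive while jitter flattens it, here is a toy simulation, not a reconstruction of that incident. It assumes 10,000 hypothetical keys all written at the same instant and requested so often that each one is reloaded from the origin the moment it expires, and it counts origin loads per one-second bin over ten minutes.

```python
import random
from collections import Counter
from typing import Callable


def origin_loads_per_second(num_keys: int, horizon_s: int,
                            ttl_fn: Callable[[], float]) -> Counter:
    """Toy model: every key is written at t=0 and requested so often that it is
    reloaded from the origin the instant it expires. Returns a Counter mapping
    each whole second to the number of origin loads in that second."""
    loads: Counter = Counter()
    for _ in range(num_keys):
        t = 0.0
        while t < horizon_s:
            loads[int(t)] += 1
            t += ttl_fn()  # the reload writes a fresh entry with a new TTL
    return loads


def peak_after_first_write(loads: Counter) -> int:
    """Highest per-second load, skipping the t=0 write burst both runs share."""
    return max(count for second, count in loads.items() if second > 0)


rng = random.Random(7)
flat = origin_loads_per_second(10_000, 600, lambda: 60.0)
jittered = origin_loads_per_second(10_000, 600, lambda: rng.uniform(54.0, 66.0))

print("flat peak:    ", peak_after_first_write(flat))  # 10000
print("jittered peak:", peak_after_first_write(jittered))
```

In the flat run every key reloads in the same second, so the peak is exactly 10,000 loads every 60 seconds. In the jittered run the first refresh wave is spread across 12 one-second bins, so the peak after the initial burst drops to roughly a twelfth of that, and later cycles spread even further because the random offsets accumulate.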
A cache is not just where data lives. It is when data leaves.
TTL is a tradeoff knob between hit rate and staleness. Pick it per resource, jitter it to break synchronized expiry, and treat it as a safety net when write-through invalidation is available.
Originally posted on LinkedIn.