Hot Keys: When Consistent Hashing Stops Being Fair
March 27, 2026
Distributed caches sell you a comforting story: hash the key, pick a shard, and traffic spreads evenly. The math is correct under one assumption that production rarely honors: that requests are spread uniformly across keys. Real traffic is not uniform. It follows a power law. Ninety percent of reads can target a single-digit percentage of keys, and on a bad day, a single key.
When that happens, consistent hashing keeps doing its job perfectly. The hot key lands on exactly one shard. Every request for it routes there. The other 31 shards in your 32-node cluster sit at single-digit CPU while one shard burns. You cannot solve this by adding nodes, because a single key still maps to exactly one node, so the extra capacity absorbs none of the hot traffic.
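To make the imbalance concrete, here is a minimal simulation sketch. The HashRing class, the shard names, the key names, and the request counts are all made up for illustration, and the ring is a toy version rather than any particular client library's implementation.

```python
import bisect
import hashlib
from collections import Counter


def _hash(value: str) -> int:
    # Stable 64-bit hash so results do not change between runs.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in ring]
        self._owners = [node for _, node in ring]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owners[idx]


def shard_load(ring, traffic):
    """traffic maps key -> request count; returns total requests per shard."""
    load = Counter()
    for key, requests in traffic.items():
        load[ring.node_for(key)] += requests
    return load


ring = HashRing([f"shard-{i}" for i in range(32)])

# Hypothetical workload: 10,000 cold keys at 10 requests each, one very hot key.
traffic = {f"user:{i}": 10 for i in range(10_000)}
traffic["leaderboard:finals"] = 1_000_000

load = shard_load(ring, traffic)
hot_shard = ring.node_for("leaderboard:finals")
hot_share = load[hot_shard] / sum(load.values())
print(f"{hot_shard} serves {hot_share:.0%} of all requests")
```

The keys themselves are spread across the ring reasonably well. The requests are not, because the hot key's entire volume follows it to whichever shard owns it.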
There are four mitigations worth knowing, and they compose.
The first is a local L1 cache in front of Redis. Every application process keeps a small in-process cache with a short TTL, maybe one second. With, say, 200 application processes, a hot key now lives in 200 process memories simultaneously, and the shard sees roughly one read per process per second for it, no matter how many requests arrive. Cold keys still go to Redis as before.
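A minimal sketch of such an in-process cache follows. LocalTTLCache and its loader callback are hypothetical names; the loader stands in for whatever function actually reads from Redis. The sketch is not thread-safe and uses a deliberately crude eviction rule.

```python
import time


class LocalTTLCache:
    """Small in-process cache with a short TTL (illustrative sketch)."""

    def __init__(self, loader, ttl_seconds=1.0, max_entries=1024,
                 clock=time.monotonic):
        self._loader = loader      # assumed: callable that reads from Redis
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock        # injectable so tests can control time
        self._entries = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # fresh local hit, no remote read

        # Miss or expired: go to the backing store once and remember the result.
        value = self._loader(key)
        if key not in self._entries and len(self._entries) >= self._max:
            # Crude eviction: drop the entry that was inserted first.
            self._entries.pop(next(iter(self._entries)))
        self._entries[key] = (now + self._ttl, value)
        return value
```

The TTL is the staleness budget: with one second, a reader can see a value up to one second old, which is usually acceptable for something like a leaderboard and not for something like a balance.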
The second is key splitting. Take the hot key, append a small suffix from a fixed set of N, and write the same value under each variant. Readers pick a suffix at random. One logical key becomes N physical keys spread across up to N shards, since two variants can still hash to the same shard. You trade an N-fold write amplification on the few hot keys for a much flatter read load.
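Key splitting can be sketched in a few lines. The function names, the "#" separator, and the suffix count of 8 are assumptions for illustration, and a plain dict stands in for the cache client so the example runs on its own.

```python
import random

SUFFIX_COUNT = 8  # assumed size of the fixed suffix set


def physical_keys(logical_key, n=SUFFIX_COUNT):
    """All physical variants of one logical key."""
    return [f"{logical_key}#{i}" for i in range(n)]


def write_split(store, logical_key, value, n=SUFFIX_COUNT):
    # Write amplification: one logical write becomes n physical writes.
    for key in physical_keys(logical_key, n):
        store[key] = value  # with a real client this would be a SET per key


def read_split(store, logical_key, n=SUFFIX_COUNT, rng=random):
    # Each reader picks one variant at random, spreading reads across them.
    return store.get(f"{logical_key}#{rng.randrange(n)}")


store = {}
write_split(store, "leaderboard:finals", "top-10-snapshot")
value = read_split(store, "leaderboard:finals")
```

Two caveats apply to this sketch. The writes are not atomic across variants, so readers can briefly see a mix of old and new values. And on Redis Cluster, avoid wrapping the logical key in braces: a hash tag such as {leaderboard}#3 makes only the part inside the braces count for slot assignment, which would put every variant back on the same shard.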
The third is request coalescing inside each pod. If a hundred coroutines all want the same key at the same instant, only one of them goes to the cache. The others wait and share the result. This is the same single-flight idea that defeats stampedes, applied to the read path.
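Here is one way coalescing might look with asyncio. SingleFlight, do, and the fetch callback are hypothetical names; fetch stands in for the real cache read.

```python
import asyncio


class SingleFlight:
    """Coalesce concurrent lookups of the same key into one fetch (sketch)."""

    def __init__(self):
        self._in_flight = {}  # key -> asyncio.Task for the fetch in progress

    async def do(self, key, fetch):
        task = self._in_flight.get(key)
        if task is None:
            # First caller for this key starts the fetch; later callers join it.
            task = asyncio.create_task(fetch(key))
            self._in_flight[key] = task
            task.add_done_callback(lambda _: self._in_flight.pop(key, None))
        # shield: cancelling one waiter must not cancel the shared fetch.
        return await asyncio.shield(task)


async def demo():
    calls = 0

    async def fetch(key):
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # pretend this is a network round trip
        return f"value-for-{key}"

    flight = SingleFlight()
    results = await asyncio.gather(
        *(flight.do("leaderboard", fetch) for _ in range(100))
    )
    return calls, results


if __name__ == "__main__":
    calls, results = asyncio.run(demo())
    print(f"{len(results)} callers, {calls} fetch")
```

If the shared fetch raises, every waiter sees the same exception, and the entry is cleared so the next caller retries. Coalescing pairs naturally with the local cache: the cache handles reads within the TTL, and coalescing handles the burst at the moment the entry expires.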
The fourth is hot key replication. The system actively detects keys above a read threshold and copies them to a small set of replica shards. The client library knows about the replicas and picks one at random per read. Consistency stays acceptable because cache values are best effort anyway.
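The detection and read-routing half of this can be sketched as below. HotKeyDetector, pick_read_key, the "#replica-" naming, the threshold, and the fixed one-second window are all assumptions for illustration. Copying the value to the replicas is out of scope here, and a real client would need to populate the replicas first and fall back to the primary key on a miss.

```python
import random
import time
from collections import Counter


class HotKeyDetector:
    """Fixed-window read counter that flags keys above a threshold (sketch)."""

    def __init__(self, threshold, window_seconds=1.0, clock=time.monotonic):
        self._threshold = threshold
        self._window = window_seconds
        self._clock = clock            # injectable so tests can control time
        self._window_start = clock()
        self._counts = Counter()

    def record_read(self, key) -> bool:
        """Count one read; return True if the key is hot in this window."""
        now = self._clock()
        if now - self._window_start >= self._window:
            # New window: forget the previous counts.
            self._counts.clear()
            self._window_start = now
        self._counts[key] += 1
        return self._counts[key] >= self._threshold


def pick_read_key(key, detector, replica_count=3, rng=random):
    """Return the physical key to read: a random replica if the key is hot."""
    if detector.record_read(key):
        return f"{key}#replica-{rng.randrange(replica_count)}"
    return key
```

A fixed window resets abruptly, so a key can flip between hot and cold at window boundaries. Whether that matters depends on how expensive it is to create and retire replicas in your system.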
The production failure I keep seeing involves an esports tournament leaderboard. The leaderboard key lives on a single Redis shard, which is fine under normal traffic. A finals match starts, three million spectators hit refresh every two seconds, and that shard pegs at 100 percent CPU. The damage is not just the leaderboard. Session lookups for half the platform happen to be sharded onto the same node. Login latency spikes for users who have no idea a tournament exists.
The lesson is to design as if your traffic is not what your hash function assumes it is.
Consistent hashing distributes keys, not traffic. The mitigation work happens at a different layer: local caches, key splitting, request coalescing, and replication of the few keys that misbehave.
Originally posted on LinkedIn.