Traffic Spikes Are Not a Capacity Problem
December 23, 2025
The reflex when someone says "we are getting hit with traffic" is to add more servers. It is the wrong reflex for spikes, because the bottleneck is almost never the number of application instances. The bottleneck is something shared and finite that you cannot autoscale in 30 seconds: the database connection pool, the primary's write throughput, a third-party API quota, an authentication service, a single hot Redis key.
Consider a flash sale. Traffic goes from 2,000 requests per second to 80,000 in 90 seconds. Your autoscaler kicks in. New nodes spend three minutes booting, pulling images, warming JIT caches, and opening connection pools. By the time they are healthy, the spike is already over. Meanwhile, each existing node tries to open more database connections to handle the surge. The database hits its max_connections ceiling, queries queue, threads block on connection acquisition, and the load balancer returns 503 because every backend's /healthz is timing out on its own database check. More servers would have made this worse, because each one adds more pressure to the same finite database.
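The arithmetic behind that last sentence can be sketched in a few lines. The pool size, node counts, and max_connections value below are made-up illustrative numbers, not measurements from a real system.

```python
# Illustrative numbers only: pool size, node counts and the database
# ceiling are assumptions chosen to make the arithmetic visible.

def connection_demand(nodes: int, pool_size_per_node: int) -> int:
    """Worst-case number of connections the fleet will try to hold open."""
    return nodes * pool_size_per_node


def over_limit(nodes: int, pool_size_per_node: int, max_connections: int) -> int:
    """Connections requested beyond the database ceiling (0 if under it)."""
    demand = connection_demand(nodes, pool_size_per_node)
    return max(0, demand - max_connections)


MAX_CONNECTIONS = 500  # hypothetical database ceiling
POOL_SIZE = 20         # hypothetical per-node pool size

before = over_limit(nodes=20, pool_size_per_node=POOL_SIZE, max_connections=MAX_CONNECTIONS)
after = over_limit(nodes=60, pool_size_per_node=POOL_SIZE, max_connections=MAX_CONNECTIONS)
print(before, after)  # prints: 0 700
```

With these assumed numbers, the fleet fits under the ceiling before scaling out and overshoots it by 700 connections afterwards. The autoscaler did its job and the database paid for it.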
The right shape is layered. Here are six patterns I keep reaching for.
Front-door rate limiting. Cap the arrival rate at the edge per user, per IP, per API key. The goal is survival, not fairness. The system needs an envelope it can stay inside while autoscaling catches up. Token buckets at the gateway work fine.
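A token bucket is small enough to sketch in full. This is a minimal, single-process version for illustration; the class names, the rate and capacity values, and the injectable clock are my own assumptions, and a real gateway would usually keep the counters in shared storage rather than in a local dict.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity       # start full so normal traffic is unaffected
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill for the time that passed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                 # caller should answer with HTTP 429


class PerKeyLimiter:
    """One bucket per user, IP, or API key (not thread-safe; sketch only)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, key: str) -> bool:
        bucket = self.buckets.get(key)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[key] = bucket
        return bucket.allow()


limiter = PerKeyLimiter(rate=5.0, capacity=10.0)  # hypothetical limits
decisions = [limiter.allow("api-key-123") for _ in range(12)]
```

The capacity sets how large a burst one key may send, and the rate sets what it can sustain. Rejected requests cost almost nothing to serve, which is the point.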
Circuit breakers. When a downstream dependency starts failing, stop calling it. Fail fast, return a cached or degraded response, and let it recover. Without breakers, a slow downstream turns into thread-pool exhaustion at every caller, and the failure propagates until the whole system is frozen on waits.
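Here is a minimal breaker sketch, assuming a consecutive-failure threshold and a fixed reset timeout (both values are illustrative). It is single-threaded for clarity; the names are hypothetical and do not come from any particular library.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Closed -> open after N consecutive failures -> one trial call after a timeout."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the breaker is closed

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()    # open: fail fast, do not touch the dependency
            # Timeout elapsed: fall through and let this one call act as a trial.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return fallback()
        # Success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller wraps the risky call and supplies the degraded answer, for example breaker.call(fetch_recommendations, lambda: []) where fetch_recommendations is whatever downstream call you are protecting. Pair it with a short timeout on the call itself, or slow responses will still hold threads before the breaker ever opens.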
Async paths. Anything that does not need to happen inside the request (email, notifications, analytics, image processing, reward credits, search index updates) goes onto a queue. The request returns in 30 milliseconds, the work happens in 30 seconds, and the user does not care.
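A rough sketch of the split, using Python's in-process queue as a stand-in for a real message broker. The handler and function names are hypothetical, and an in-memory queue loses its jobs if the process dies, so treat this as the shape of the idea rather than something to deploy.

```python
import queue
import threading

jobs = queue.Queue()  # stand-in for a real broker
sent_emails = []      # records side effects so the example is observable


def send_confirmation_email(order_id: str) -> None:
    # Placeholder for slow, optional work (SMTP call, push notification, ...).
    sent_emails.append(order_id)


def worker() -> None:
    while True:
        job = jobs.get()
        try:
            if job is None:          # sentinel: shut the worker down
                return
            send_confirmation_email(job["order_id"])
        finally:
            jobs.task_done()


def handle_checkout(order_id: str) -> dict:
    # Only the work the user is waiting for belongs here (charge, reserve stock).
    jobs.put({"kind": "confirmation_email", "order_id": order_id})
    return {"status": "ok", "order_id": order_id}  # returns without sending mail


worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()

response = handle_checkout("order-1001")
jobs.join()                    # demo only: wait for the background work
jobs.put(None)
worker_thread.join(timeout=2)
```

The request handler never calls the email code. If the mail provider is slow during the spike, the queue gets longer and checkout latency stays flat.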
Distributed cache. On read-heavy paths a cache can cut database load by a factor of 10 to 100. Sessions, product details, configuration, anything with a high read-to-write ratio. The cache absorbs the spike the database cannot.
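The usual way to wire this in is cache-aside: read the cache first, fall back to the database on a miss, and store the result with a TTL. The sketch below uses a local dict as a stand-in for a distributed cache, and the loader function, key names, and TTL are assumptions for illustration.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Read-through helper: serve from cache, load from the source on a miss."""

    def __init__(self, load: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.load = load             # the expensive call, e.g. a database query
        self.ttl = ttl_seconds
        self.clock = clock
        self.store: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Any:
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # hit: the database is not touched
        value = self.load(key)       # miss or expired: one read to the source
        self.store[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        # Call after a write so readers do not see stale data until the TTL ends.
        self.store.pop(key, None)


db_reads = []


def load_product(sku: str) -> dict:
    db_reads.append(sku)             # counts how often the "database" is hit
    return {"sku": sku, "name": "example"}


products = CacheAside(load_product, ttl_seconds=60.0)
for _ in range(1000):
    products.get("sku-42")
print(len(db_reads))  # prints: 1
```

One caveat this sketch ignores: when a hot key expires during a spike, many callers can miss at once and all hit the database together. Production setups usually add request coalescing or staggered TTLs for that.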
Sharding. Once a single database is the bottleneck, partition. This is the heavyweight fix and takes weeks to do well, but it is the only thing that scales writes past one machine.
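The routing half of sharding is the easy part and fits in a few lines. This sketch assumes hash-based partitioning on a customer ID; the shard names are hypothetical, and the hard parts (migrating data, cross-shard queries, rebalancing) are exactly what takes the weeks.

```python
import hashlib

# Hypothetical shard names; in practice these would be connection strings.
SHARDS = ["orders-db-0", "orders-db-1", "orders-db-2", "orders-db-3"]


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index with a hash that is stable across processes."""
    # Python's built-in hash() is randomized per process for strings,
    # so a cryptographic digest is used here purely for its stability.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def database_for(customer_id: str) -> str:
    return SHARDS[shard_for(customer_id, len(SHARDS))]


# Every write for the same customer lands on the same shard.
first = database_for("customer-981")
second = database_for("customer-981")
```

Plain modulo routing moves most keys when the shard count changes, which is one reason people reach for consistent hashing or a lookup directory instead. Pick the partition key carefully too, because a single hot key still lands on a single shard.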
Peak shaving with a queue. When the burst is unavoidable, do not serve it in real time. Accept the request, write to a durable queue, return a confirmation, and let consumers process at steady-state rate. The user sees "your order is being processed" instead of an error.
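A small simulation shows what the queue buys you. The arrival numbers and drain rate are invented for illustration, and the in-memory deque stands in for a durable queue.

```python
from collections import deque
from typing import List, Tuple


def simulate(arrivals_per_tick: List[int], drain_rate: int) -> Tuple[List[int], int]:
    """Accept every arrival, process at most `drain_rate` per tick.

    Returns the backlog depth after each tick and the total processed.
    """
    backlog: deque = deque()
    depth_per_tick: List[int] = []
    processed = 0
    next_order_id = 0

    for arrivals in arrivals_per_tick:
        # Accept phase: every request is enqueued and acknowledged immediately.
        for _ in range(arrivals):
            backlog.append(next_order_id)
            next_order_id += 1
        # Consume phase: the backend only ever sees its steady-state rate.
        for _ in range(min(drain_rate, len(backlog))):
            backlog.popleft()
            processed += 1
        depth_per_tick.append(len(backlog))

    return depth_per_tick, processed


# A burst of 500 orders in one tick against a backend that handles 100 per tick.
depths, total = simulate([500, 0, 0, 0, 0], drain_rate=100)
print(depths, total)  # prints: [400, 300, 200, 100, 0] 500
```

The backend never sees more than 100 orders per tick, and every order is eventually processed. The cost is latency: the last order waits five ticks, so the confirmation message has to be honest about that.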
The pattern is: cap the inflow, absorb the burst, defer the optional work, and protect the shared resources from the surge. None of those is "more servers."
Adding servers does not protect you from a spike, because the spike usually breaks a shared resource the new servers also depend on. Front-door rate limits cap the arrival rate, async queues absorb the burst, circuit breakers prevent cascades, and peak shaving lets a slow backend stay alive through a fast surge.
Originally posted on LinkedIn.