Redis Connection Pooling Is a Correctness Problem, Not a Performance One

February 22, 2026

Most Redis incidents I get paged for are not Redis being slow. They are the client doing something self-destructive: opening a connection per request, sizing the pool against QPS instead of concurrency, or leaving connections idle long enough for something on the path to silently drop them. Connection pooling is not an optimization for Redis. It is a correctness property of any service that talks to it.

Redis executes commands on a single thread per shard. A single TCP connection can carry thousands of commands per second, and far more when pipelined, because the server processes them in order without coordination overhead. The expensive part is opening a connection: each new one pays a TCP handshake, a TLS handshake if you have it, and an AUTH command before the first useful byte. That is why two connections doing 500 commands per second each cost more than one connection doing 1,000, and why a connection per request turns every read into a multi-millisecond setup tax.
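To make pipelining concrete, here is a minimal Python sketch of what goes on the wire, assuming the RESP protocol Redis clients speak: each command is an array of bulk strings, and a pipeline is simply several encoded commands concatenated into one write, sent without waiting for replies in between. The names encode_command, encode_pipeline, and send_pipeline are hypothetical, and a real client would also parse the replies.

```python
from __future__ import annotations

import socket
from typing import Iterable, Sequence


def encode_command(*args: str | bytes) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8") if isinstance(arg, str) else arg
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def encode_pipeline(commands: Iterable[Sequence[str | bytes]]) -> bytes:
    """Concatenate several commands so they leave in a single write."""
    return b"".join(encode_command(*cmd) for cmd in commands)


def send_pipeline(sock: socket.socket, commands: Iterable[Sequence[str | bytes]]) -> None:
    # One sendall for the whole batch. The setup cost (TCP, TLS, AUTH) was
    # paid once when sock was opened; every command after that rides on it.
    # Replies come back in the same order and must be read afterwards.
    sock.sendall(encode_pipeline(commands))
```

For example, encode_pipeline([("GET", "user:1"), ("GET", "user:2")]) produces one buffer carrying both reads (the keys are made up). On a pooled connection the AUTH was already sent once at connect time, so none of that setup repeats per request.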

Pool size should track peak concurrency, not peak request rate. By Little's Law, the number of calls in flight is the arrival rate times how long each call takes: if your service handles 10,000 QPS and each Redis call returns in 1 millisecond, your peak in-flight count is around 10. A pool of 20 covers that with headroom. A pool of 5,000, sized intuitively against the request rate, just buys you 4,980 idle connections, each an open file descriptor that the OS, the Redis server, and your load balancer all have to track.
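The arithmetic fits in a few lines. This is a minimal Python sketch, assuming latency is measured in milliseconds and using the 2x headroom from the example above; the function name and the headroom default are illustrative, and plugging in a tail latency such as p99 rather than the mean is a deliberately conservative choice.

```python
import math


def pool_size(peak_qps: float, latency_ms: float, headroom: float = 2.0) -> int:
    """Size a pool from concurrency (Little's Law), not from request rate."""
    # L = lambda * W: requests per second times seconds per request.
    in_flight = peak_qps * latency_ms / 1000.0
    # Apply a safety factor and never return less than one connection.
    return max(1, math.ceil(in_flight * headroom))


print(pool_size(10_000, 1.0))  # 10 in flight, 2x headroom -> 20
```

Note that the request rate alone never determines the answer: at the same 10,000 QPS, a latency of 5 ms means about 50 calls in flight, so latency regressions, not traffic, are what should prompt a resize.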

Per-pod pools are correct for almost every architecture. A shared pool across processes adds a coordination layer that defeats the purpose of pooling in the first place. Each pod owns its own pool, sized to its own peak concurrency, with explicit acquire timeouts so that when Redis calls stall and the pool runs dry, callers fail fast instead of queuing indefinitely and taking down the rest of the service.
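A per-pod pool with an acquire timeout can be sketched with nothing but the Python standard library. This is a minimal illustration, not any particular client's pool: BoundedPool, PoolTimeout, and the factory callable (which would open and authenticate a real connection) are hypothetical names. Connections are created lazily up to max_size, and a caller that finds the pool exhausted waits at most acquire_timeout_s before failing.

```python
from __future__ import annotations

import queue
import threading
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, TypeVar

T = TypeVar("T")


class PoolTimeout(Exception):
    """No connection became available within the acquire timeout."""


class BoundedPool(Generic[T]):
    def __init__(self, factory: Callable[[], T], max_size: int,
                 acquire_timeout_s: float) -> None:
        self._factory = factory          # opens (and authenticates) one connection
        self._max_size = max_size        # sized to peak concurrency, not QPS
        self._timeout = acquire_timeout_s
        self._idle: queue.LifoQueue[T] = queue.LifoQueue()
        self._created = 0
        self._lock = threading.Lock()

    def _acquire(self) -> T:
        # Reuse an idle connection if one is ready.
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            pass
        # Otherwise open a new one, but only while under the cap.
        with self._lock:
            may_create = self._created < self._max_size
            if may_create:
                self._created += 1
        if may_create:
            try:
                return self._factory()
            except BaseException:
                with self._lock:
                    self._created -= 1   # the reserved slot was never filled
                raise
        # At the cap: wait a bounded time for another caller to return one.
        try:
            return self._idle.get(timeout=self._timeout)
        except queue.Empty:
            raise PoolTimeout(
                f"no connection available within {self._timeout}s") from None

    @contextmanager
    def connection(self) -> Iterator[T]:
        conn = self._acquire()
        try:
            yield conn
        finally:
            # Return it for reuse; a production pool would instead discard
            # a connection that failed mid-command.
            self._idle.put(conn)
```

Handing idle connections out last-in, first-out keeps the hot ones busy and lets the surplus age at the bottom of the stack, which pairs naturally with idle eviction.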

The production failure I keep seeing is the idle eviction trap. A team ran a global Redis client with a 5,000-connection pool to "absorb deploy spikes." Most of those connections sat idle for 30 minutes between bursts. Redis itself closes idle clients only when its timeout directive is set (the default, 0, disables it), and its tcp-keepalive setting probes for dead peers rather than closing healthy idle ones. The faster killer sits on the network path: intermediate firewalls drop idle TCP state, usually without notifying either end. The pool kept handing out connections it thought were healthy. The first command on each stale connection failed with a broken pipe. The application retried, sometimes succeeding, sometimes timing out, while the dashboard showed Redis at 5 percent CPU and 1 percent of the connection limit.

The fix had two parts. First, shrink the pool to peak concurrency, around 50. Second, set the pool's idle timeout shorter than the shortest idle timeout on the path, meaning the server's timeout directive and any firewall in between, and enable testOnBorrow so each connection is validated before it is handed out. Errors went to zero. Redis CPU did not move. The bug was never on the server.
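The two-part fix can be sketched the same way. In this minimal, single-threaded Python illustration (no size cap or locking, to keep the idea visible), each idle connection remembers when it was returned. On borrow, anything idle longer than max_idle_s is closed rather than reused, and the survivor is validated before it is handed out, in the spirit of testOnBorrow. IdleAwarePool, validate (for instance a PING round trip), close, and the injectable clock are hypothetical names; max_idle_s is assumed to be set below both the server's timeout and any firewall idle timeout.

```python
from __future__ import annotations

import time
from collections import deque
from typing import Callable, Deque, Generic, Tuple, TypeVar

T = TypeVar("T")


class IdleAwarePool(Generic[T]):
    def __init__(self, factory: Callable[[], T], validate: Callable[[T], bool],
                 close: Callable[[T], None], max_idle_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._factory = factory
        self._validate = validate      # e.g. send PING, expect PONG
        self._close = close
        self._max_idle_s = max_idle_s  # keep below every idle timeout on the path
        self._clock = clock
        # (connection, time it was returned); oldest entries on the left
        self._idle: Deque[Tuple[T, float]] = deque()

    def borrow(self) -> T:
        now = self._clock()
        # Idle eviction: drop anything the server or a firewall may have cut.
        while self._idle and now - self._idle[0][1] > self._max_idle_s:
            stale, _ = self._idle.popleft()
            self._close(stale)
        # Test on borrow: validate the most recently returned survivor first.
        while self._idle:
            conn, _ = self._idle.pop()
            if self._validate(conn):
                return conn
            self._close(conn)
        # Nothing usable is idle, so open a fresh connection.
        return self._factory()

    def give_back(self, conn: T) -> None:
        self._idle.append((conn, self._clock()))
```

Evicting by age before validating means a connection that is almost certainly dead never costs a round trip; validation then catches the ones that died for other reasons inside the window. A real close callback should also swallow socket errors, since the peer may already be gone.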

Pool sizing is the cheapest reliability investment your service has.

Key takeaway

Redis pool sizing follows concurrency, not QPS. Small pools with explicit acquire timeouts, plus idle eviction shorter than any idle timeout between client and server, are what keep Redis from becoming the source of your incidents.

Originally posted on LinkedIn.