TLS Handshake and Session Resumption: Why Your P99 Climbs After Every Deploy

March 6, 2026

The TLS handshake is the part of your request where math runs before any application bytes move. In TLS 1.2 a full handshake takes two round trips. ClientHello proposes cipher suites and a client random. The server answers with ServerHello, which picks a suite and returns its own random, followed by the certificate chain and its key exchange parameters. The client verifies the chain, sends its own key exchange message, signals ChangeCipherSpec, and sends an encrypted Finished. The server answers with its own ChangeCipherSpec and Finished, and only then can the actual HTTP request go out. Two RTTs of latency before a single useful byte.

TLS 1.3 collapses this to a single RTT. The client guesses a key exchange group up front and sends its share inside ClientHello. The server returns its share with ServerHello, and the client can start sending application data as soon as it has processed the server's first flight. For returning clients, 0-RTT mode skips even that, sending the first request as early data alongside the ClientHello, protected by a key derived from the earlier session. The tradeoff with 0-RTT is replay vulnerability: an attacker can capture and resend early data, which is why you only use it for idempotent reads.
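The round-trip counts above can be turned into a small cost model. This is a minimal sketch, not a measurement tool: the table and the function name handshake_delay_ms are illustrative, TCP connection setup and CPU time are ignored, and the one-RTT figure for a resumed TLS 1.2 handshake (the abbreviated handshake) is added here rather than taken from the text above.

```python
# Round trips spent on the TLS handshake before the first application byte.
# TCP setup and certificate/crypto CPU time are deliberately left out.
HANDSHAKE_RTTS = {
    ("1.2", "full"): 2,
    ("1.2", "resumed"): 1,  # abbreviated handshake (assumption added here)
    ("1.3", "full"): 1,
    ("1.3", "resumed"): 1,  # resumption without early data
    ("1.3", "0rtt"): 0,     # first request rides along with ClientHello
}


def handshake_delay_ms(version: str, mode: str, rtt_ms: float) -> float:
    """Network delay attributable to the TLS handshake alone."""
    return HANDSHAKE_RTTS[(version, mode)] * rtt_ms


if __name__ == "__main__":
    # Compare every mode at an illustrative 100 ms round-trip time.
    for (version, mode) in HANDSHAKE_RTTS:
        delay = handshake_delay_ms(version, mode, 100.0)
        print(f"TLS {version} {mode}: {delay:.0f} ms")
```

The useful number is the difference between rows: on a long-haul path, each round trip you fail to save shows up directly in tail latency.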

Resumption itself has two flavors. Session IDs require the server to keep handshake state in memory keyed by ID, which does not scale well across a fleet. Session tickets push that state to the client as an encrypted blob, sealed with a key only the server knows. Any server holding that key can resume. Most modern stacks default to tickets for exactly this reason. Forward secrecy comes from the ephemeral key exchange (ECDHE) on each full handshake, but it is not fully independent of resumption: a TLS 1.2 ticket carries the session's master secret sealed under the ticket key, so anyone who later obtains that key can decrypt recorded sessions that used those tickets. That is the reason ticket keys have to be rotated.
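The "any server holding that key can resume" property can be illustrated with a toy model. This is a sketch under loud assumptions: TicketKey, TicketServer, issue and redeem are hypothetical names, and the ticket here is only authenticated with an HMAC tag as a stand-in. A real TLS ticket also encrypts the session state, and nothing below should be used as actual ticket protection.

```python
import hashlib
import hmac
import json
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TicketKey:
    name: bytes    # public identifier, travels in the clear inside the ticket
    secret: bytes  # known only to the servers that share the key


def new_ticket_key() -> TicketKey:
    return TicketKey(name=os.urandom(16), secret=os.urandom(32))


class TicketServer:
    """Hypothetical TLS terminator that issues and redeems toy tickets."""

    def __init__(self, key: TicketKey) -> None:
        self.key = key

    def issue(self, session_state: dict) -> bytes:
        # Toy layout: 16-byte key name, 32-byte HMAC tag, then the state.
        # The state is NOT encrypted here; a real ticket encrypts it.
        body = json.dumps(session_state, sort_keys=True).encode()
        tag = hmac.new(self.key.secret, self.key.name + body,
                       hashlib.sha256).digest()
        return self.key.name + tag + body

    def redeem(self, ticket: bytes) -> Optional[dict]:
        name, tag, body = ticket[:16], ticket[16:48], ticket[48:]
        if name != self.key.name:
            return None  # sealed under a key this server does not hold
        expected = hmac.new(self.key.secret, name + body,
                            hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return None  # tampered or forged ticket
        return json.loads(body)
```

Two TicketServer instances built from the same TicketKey redeem each other's tickets. An instance built from a different key returns None for the same ticket, which in a real stack means a silent fallback to the full handshake.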

The production failure worth knowing: a global API runs TLS 1.2 with session tickets enabled, behind a horizontally scaled LB tier. Each LB instance generates its own ticket encryption key at boot and never shares it. From the client's perspective the resumption ticket looks valid, but the LB it lands on cannot decrypt a ticket sealed by a sibling instance. The handshake silently falls back to full. P99 latency climbs by around 200ms after every deploy, because rolling restarts redistribute clients across fresh LB instances with fresh ticket keys.
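A back-of-the-envelope model makes the failure easier to reason about. The sketch below is illustrative arithmetic, not a reconstruction of the incident: the function names are made up, p_same_instance is the assumed chance that a returning client lands on the LB that issued its ticket, tickets are assumed not to have expired, and a miss is assumed to cost one extra round trip (a full TLS 1.2 handshake instead of an abbreviated one).

```python
def resumption_hit_rate(p_same_instance: float,
                        fraction_restarted: float,
                        shared_key: bool) -> float:
    """Chance that a returning client's ticket is accepted.

    p_same_instance: probability of landing on the issuing LB
        (roughly 1/N under uniform random balancing, near 1 if sticky).
    fraction_restarted: share of LB instances restarted, and therefore
        re-keyed, since the ticket was issued.
    """
    if shared_key:
        # Every instance loads the same key, and it survives restarts.
        return 1.0
    # Per-instance keys: the client must hit the same instance,
    # and that instance must not have regenerated its key.
    return p_same_instance * (1.0 - fraction_restarted)


def expected_extra_delay_ms(hit_rate: float, rtt_ms: float,
                            extra_rtts_on_miss: int = 1) -> float:
    """Average handshake delay added by failed resumptions."""
    return (1.0 - hit_rate) * extra_rtts_on_miss * rtt_ms


if __name__ == "__main__":
    # Sticky routing, then a rolling restart that replaces every instance.
    before = resumption_hit_rate(1.0, 0.0, shared_key=False)
    after = resumption_hit_rate(1.0, 1.0, shared_key=False)
    print(before, after, expected_extra_delay_ms(after, rtt_ms=200.0))
```

With per-instance keys, a full rolling restart drives the hit rate to zero regardless of routing. With a shared key, the restart does not change it at all.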

The fix is a shared ticket key, rotated on a fixed schedule (commonly daily, with a short overlap window) and distributed via a secret store to every LB. Envoy, NGINX, and HAProxy all support loading ticket keys from external files for this exact reason. Upgrading the tier to TLS 1.3 is the better long-term move: a full handshake costs one RTT instead of two, and with 0-RTT enabled for safe verbs a resumed connection costs zero handshake RTTs on the fast path. TLS 1.3 resumption still depends on tickets, so the shared key is needed either way.
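Rotation with an overlap window amounts to a small key ring: the newest key seals new tickets, and one or more older keys are kept only to open tickets issued before the rotation. The sketch below is an assumption-laden illustration. TicketKeyRing and its methods are hypothetical names, and in a real deployment the ring would be built from keys fetched from the secret store so that every LB holds the same set.

```python
import os
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TicketKey:
    name: bytes    # public identifier carried inside each ticket
    secret: bytes  # key material shared by every LB


def new_ticket_key() -> TicketKey:
    return TicketKey(name=os.urandom(16), secret=os.urandom(32))


class TicketKeyRing:
    """Newest key issues tickets; older keys are accepted during overlap."""

    def __init__(self, keep_previous: int = 1) -> None:
        # Newest key sits at index 0. Once the deque is full, appendleft
        # discards the oldest key from the right-hand end.
        self._keys: deque = deque(maxlen=keep_previous + 1)

    def rotate(self, key: TicketKey) -> None:
        self._keys.appendleft(key)

    @property
    def issuing_key(self) -> TicketKey:
        return self._keys[0]

    def find(self, name: bytes) -> Optional[TicketKey]:
        """Return the key that sealed a ticket, or None if it has aged out."""
        for key in self._keys:
            if key.name == name:
                return key
        return None
```

Keeping exactly one previous key gives an overlap of one rotation period: tickets sealed yesterday still resume today, and anything older falls back to a full handshake.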

If you cannot resume, you are paying the full handshake on every reconnect. Most of the time you do not notice. Until a deploy.

Key takeaway

Session resumption only works if the server that receives the ticket can decrypt it. Horizontally scaled LB tiers with unsynchronized ticket keys silently force a full handshake on every cross-instance reconnect.

Originally posted on LinkedIn.