
The TCP Handshake and Teardown: Where Your Sockets Actually Go to Die

March 30, 2026

A TCP connection is not a session. It is a state machine that two kernels are running in parallel, and the handshake is the part where they agree on initial conditions.

Setup is three packets. The client sends SYN with its initial sequence number x. The server replies with SYN-ACK, acknowledging x+1 and proposing its own initial sequence number y. The client answers with an ACK acknowledging y+1. That third packet finalizes the agreement. From here on, every byte is numbered, so out-of-order or duplicate data can be detected.
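To make the arithmetic concrete, here is a minimal Python model of the three setup segments. It is a sketch, not a kernel API: `Segment` and `three_way_handshake` are hypothetical names, and the fixed ISNs are an assumption for readability (real stacks pick them unpredictably). The one rule it encodes is that a SYN consumes a sequence number, which is why each side acknowledges the other's ISN plus one, modulo 2^32.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

SEQ_SPACE = 2**32  # TCP sequence numbers are 32-bit and wrap around


@dataclass(frozen=True)
class Segment:
    flags: str           # "SYN", "SYN-ACK" or "ACK" (hypothetical labels)
    seq: int             # sequence number carried by this segment
    ack: Optional[int]   # acknowledgment number, None when ACK is not set


def three_way_handshake(client_isn: int, server_isn: int) -> list[Segment]:
    """Return the three setup segments for the given initial sequence numbers."""
    x, y = client_isn % SEQ_SPACE, server_isn % SEQ_SPACE
    return [
        Segment("SYN", seq=x, ack=None),
        # The SYN consumed sequence number x, so the server expects x + 1 next.
        Segment("SYN-ACK", seq=y, ack=(x + 1) % SEQ_SPACE),
        # Likewise the server's SYN consumed y; the client's next byte is x + 1.
        Segment("ACK", seq=(x + 1) % SEQ_SPACE, ack=(y + 1) % SEQ_SPACE),
    ]


if __name__ == "__main__":
    for segment in three_way_handshake(client_isn=1000, server_isn=5000):
        print(segment)
```

With x = 1000 and y = 5000, the SYN-ACK acknowledges 1001 and the final ACK carries seq 1001 and ack 5001.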

The handshake exists to synchronize sequence numbers, not just to confirm reachability. That is the part most people forget. Without SYN carrying an initial sequence number, retransmits and reordering would be ambiguous.

Teardown takes four packets because each direction closes independently. The active closer sends FIN. The peer ACKs it and enters CLOSE_WAIT, where it stays while its application finishes sending whatever it still has and until it calls close. The peer then sends its own FIN, and the original closer sends a final ACK. The original closer then sits in TIME_WAIT for 2 * MSL before releasing the socket; Linux hardcodes this at 60 seconds.
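The independent directions are easy to see with real sockets. The sketch below, using only Python's standard `socket` module over loopback, has the client send a request and then `shutdown(SHUT_WR)`, which sends its FIN while leaving the other direction open. The server sees that FIN as `recv()` returning `b""`, sits in CLOSE_WAIT, and still sends its response before closing. The function name `half_close_exchange` is hypothetical, and the state names in the comments describe what the kernel does underneath; the code itself does not inspect them.

```python
import socket


def half_close_exchange(request: bytes, response: bytes) -> tuple:
    """One request/response over loopback TCP where each direction closes separately.

    Returns (bytes the server received, bytes the client received).
    """
    with socket.create_server(("127.0.0.1", 0)) as listener:
        client = socket.create_connection(listener.getsockname())
        server_side, _ = listener.accept()
        with client, server_side:
            client.sendall(request)
            # Close only the client -> server direction: the kernel sends FIN and the
            # client socket moves to FIN_WAIT_1, then FIN_WAIT_2 once it is ACKed.
            client.shutdown(socket.SHUT_WR)

            # The server drains until recv() returns b"", which is how the peer's FIN
            # surfaces to the application. Its socket is now in CLOSE_WAIT.
            received = b""
            while chunk := server_side.recv(4096):
                received += chunk

            # CLOSE_WAIT still allows sending: the server -> client direction is open.
            server_side.sendall(response)
            server_side.close()  # server's own FIN; its socket goes to LAST_ACK

            # The client reads until the server's FIN arrives. Having sent the first
            # FIN, the client's socket is the one that ends up in TIME_WAIT.
            reply = b""
            while chunk := client.recv(4096):
                reply += chunk
    return received, reply


if __name__ == "__main__":
    print(half_close_exchange(b"ping", b"pong"))
```

Calling `half_close_exchange(b"ping", b"pong")` returns `(b'ping', b'pong')`: the server's reply travels after the client has already sent its FIN.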

TIME_WAIT is not paranoia. It exists so that delayed packets from this connection cannot be misinterpreted as belonging to a new connection that happens to reuse the same four-tuple. The kernel is protecting you from a real bug.

The production failure that catches people: a Java service running a load test against a downstream API, opening a fresh connection per request, SO_REUSEADDR off. Every connection the client closes first leaves a TIME_WAIT entry that pins its source port toward that destination. The ephemeral port range on Linux is typically 32768 to 60999, which is 28,232 ports. At a few thousand requests per second, you exhaust the range in under a minute. New connections start failing with EADDRNOTAVAIL. The dashboard shows the downstream API as down. It is not. Your own kernel is refusing to give you a source port.
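The back-of-the-envelope math fits in a few lines. This is a sketch under stated assumptions: a single destination IP and port, the client closing first so it holds every TIME_WAIT entry, Linux's fixed 60-second TIME_WAIT, and no help from `tcp_tw_reuse`. The helper names are hypothetical; the only real interface touched is the `/proc/sys/net/ipv4/ip_local_port_range` file, read in the `__main__` block on Linux.

```python
from __future__ import annotations

import math


def parse_port_range(text: str) -> tuple[int, int]:
    """Parse /proc/sys/net/ipv4/ip_local_port_range contents ("low<TAB>high")."""
    low, high = (int(field) for field in text.split())
    return low, high


def port_budget(low: int, high: int) -> int:
    """Ephemeral ports available toward one destination (the range is inclusive)."""
    return high - low + 1


def seconds_to_exhaustion(ports: int, conns_per_sec: float,
                          time_wait_sec: float = 60.0) -> float:
    """Seconds until every port is pinned in TIME_WAIT, or math.inf if never.

    Each new connection pins a port for time_wait_sec. If the rate is low enough
    that entries expire before the range fills, the client never runs out.
    """
    if conns_per_sec * time_wait_sec <= ports:
        return math.inf
    return ports / conns_per_sec


def max_sustainable_rate(ports: int, time_wait_sec: float = 60.0) -> float:
    """Highest steady new-connection rate one client can hold toward one destination."""
    return ports / time_wait_sec


if __name__ == "__main__":
    try:
        with open("/proc/sys/net/ipv4/ip_local_port_range") as f:
            low, high = parse_port_range(f.read())
    except OSError:  # not Linux: fall back to the common default
        low, high = 32768, 60999
    ports = port_budget(low, high)
    print(f"{ports} ports; exhausted after {seconds_to_exhaustion(ports, 2000):.1f}s"
          f" at 2000 conn/s; sustainable up to ~{max_sustainable_rate(ports):.0f} conn/s")
```

With the default range that is 28,232 ports, roughly 14 seconds to exhaustion at 2,000 new connections per second, and a ceiling of about 470 new connections per second before TIME_WAIT entries pile up faster than they expire.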

The fix is rarely "open the port range wider." That only moves the wall a little further out. The real fix is connection reuse: an HTTP client with a proper keep-alive pool, a gRPC channel held for the worker's lifetime, PgBouncer in front of Postgres. Reuse the four-tuple instead of burning a fresh one per request. net.ipv4.tcp_tw_reuse is a reasonable secondary lever, but only after you stop opening fresh connections on the hot path.
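A minimal way to watch reuse happen, using only Python's standard library (`http.client` and `http.server`) as a stand-in for whatever pooled client you actually run. `OkHandler` and `client_ports` are hypothetical names. The function records the client's source port for each request: with one kept-alive connection every request should report the same port, while the fresh-connection mode typically takes a new port each time and leaves the old one in TIME_WAIT.

```python
from __future__ import annotations

import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class OkHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep the connection open between requests

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        # Content-Length lets the client find the end of the body without a close.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def client_ports(requests: int, reuse: bool) -> list[int]:
    """Return the client's source port for each request against a local test server."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), OkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    shared = http.client.HTTPConnection(host, port, timeout=5)  # one long-lived connection
    ports = []
    try:
        for _ in range(requests):
            conn = shared if reuse else http.client.HTTPConnection(host, port, timeout=5)
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body so the socket can carry the next request
            ports.append(conn.sock.getsockname()[1])
            if not reuse:
                conn.close()  # client closes first, so this source port goes to TIME_WAIT
    finally:
        shared.close()
        server.shutdown()
        server.server_close()
    return ports


if __name__ == "__main__":
    print("keep-alive:", client_ports(5, reuse=True))
    print("fresh:     ", client_ports(5, reuse=False))
```

The keep-alive list repeats one port; the fresh list is the per-request pattern that drains the ephemeral range under load.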

The same lesson shows up server-side. A daemon that never calls close on sockets whose peers have already hung up accumulates CLOSE_WAIT entries on its own side. If monitoring sees CLOSE_WAIT climbing while ESTABLISHED stays flat, the app is not finishing its half of the teardown. That is an app bug, not a network one.
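On Linux, the per-state counts that such monitoring watches can be read straight from `/proc/net/tcp`, where the `st` column holds each socket's state as a hex code from the kernel's `include/net/tcp_states.h` (`ss -tan state close-wait` gives the same view from a shell). The sketch below parses that text and applies the climbing-CLOSE_WAIT, flat-ESTABLISHED check described above. `count_tcp_states` and `close_wait_leak_suspected` are hypothetical names, and the heuristic is only the rule of thumb from this post, not an established alerting standard.

```python
from collections import Counter

# State codes as they appear (in hex) in the "st" column of /proc/net/tcp and tcp6.
TCP_STATES = {
    0x01: "ESTABLISHED", 0x02: "SYN_SENT", 0x03: "SYN_RECV",
    0x04: "FIN_WAIT1", 0x05: "FIN_WAIT2", 0x06: "TIME_WAIT",
    0x07: "CLOSE", 0x08: "CLOSE_WAIT", 0x09: "LAST_ACK",
    0x0A: "LISTEN", 0x0B: "CLOSING",
}


def count_tcp_states(proc_net_tcp: str) -> Counter:
    """Count sockets per TCP state in the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        code = int(fields[3], 16)  # "st" is the fourth whitespace-separated column
        counts[TCP_STATES.get(code, f"UNKNOWN({code:#x})")] += 1
    return counts


def close_wait_leak_suspected(before: Counter, after: Counter) -> bool:
    """Heuristic: CLOSE_WAIT grew between two samples while ESTABLISHED did not."""
    return (after["CLOSE_WAIT"] > before["CLOSE_WAIT"]
            and after["ESTABLISHED"] <= before["ESTABLISHED"])


if __name__ == "__main__":
    with open("/proc/net/tcp") as f:  # Linux only; IPv6 sockets are in /proc/net/tcp6
        print(count_tcp_states(f.read()))
```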

Once you have seen this, you stop trusting that a failed connect means the other side is broken. Half the time the broken side is the one you wrote.

Key takeaway

TIME_WAIT is not a bug. It is the kernel protecting you from delayed duplicate packets. But if you keep opening fresh connections, it will eat your ephemeral port range and look exactly like a network outage.

Originally posted on LinkedIn.