System Design Fundamentals
DNS, TLS, HTTP, Connection Pooling
DNS is the Internet's phonebook — it translates a
human-readable name like example.com into the IP address
93.184.216.34 that computers use for routing. But DNS is not
just a simple lookup service. It is a globally distributed,
hierarchical caching system that handles trillions of queries
per day. Understanding how DNS works — and where it fails — is
essential for reasoning about latency, availability, and
failover in system design interviews.

The Resolution Chain
When you type www.example.com into your browser, the request
walks a hierarchy of servers. This chain is designed so that no
single server needs to know the answer for every domain — each
server knows just enough to point you to the next step.
Step 1 — Local caches. The browser checks its own DNS
cache first. Chrome, Firefox, and Safari all maintain separate
in-process DNS caches with short TTLs (typically 60 seconds).
If the browser has no entry, the OS resolver cache is checked
next (managed by systemd-resolved on Linux or the DNS Client
service on Windows). If the OS has no entry either, the query
goes to the network.
Step 2 — Recursive resolver. This is the workhorse of DNS, typically run by your ISP, a corporate network, or a public service like Cloudflare (1.1.1.1) or Google (8.8.8.8). The recursive resolver does all the heavy lifting — if it does not already have the answer cached, it walks the DNS hierarchy on your behalf. A busy recursive resolver can cache answers for thousands of domains, meaning most queries never leave this step.
Step 3 — Root nameservers. If the recursive resolver has
no cached answer, it starts at the top of the hierarchy. There
are 13 root nameserver clusters (named a.root-servers.net
through m.root-servers.net), operated by organizations like
ICANN, Verisign, and the US Army. They are distributed across
hundreds of physical locations via anycast. The root server
does not know the IP for www.example.com, but it knows which
servers are responsible for the .com top-level domain and
returns a referral to those TLD servers.
Step 4 — TLD nameservers. The TLD server for .com
(operated by Verisign) knows every domain registered under
.com. It does not know the IP for www.example.com, but it
knows that example.com's DNS is managed by specific
authoritative nameservers (like ns1.example.com) and returns
a referral to those servers.
Step 5 — Authoritative nameserver. This server has the
final answer. It holds the actual DNS records for
example.com and responds with the IP address. The recursive
resolver caches this answer (respecting the record's TTL),
then returns the IP to the browser.
The full chain — browser, OS, recursive resolver, root, TLD,
authoritative — typically takes 50-200ms on a cold cache. But
once cached, the answer returns in under 1ms from the local
resolver. In practice, the root and TLD steps are almost always
cached at the recursive resolver because every .com lookup
refreshes the TLD referral. The authoritative lookup is the
step most likely to actually hit the network.
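The chain above can be sketched as a toy model. The referral tables below are made-up stand-ins for the real root, TLD, and authoritative servers; an actual resolver speaks the DNS wire protocol over the network.

```python
# Toy model of the resolution chain (steps 2-5). All data is illustrative.

ROOT = {".com": "tld.verisign-grs.com"}               # root: TLD referrals
TLD = {"example.com": "ns1.example.com"}              # TLD: authoritative referrals
AUTHORITATIVE = {"www.example.com": "93.184.216.34"}  # authoritative: A records

def resolve(name: str, cache: dict) -> str:
    """Walk root -> TLD -> authoritative, caching the final answer."""
    if name in cache:                      # steps 1-2: cached, no network
        return cache[name]
    tld = "." + name.rsplit(".", 1)[-1]    # ".com"
    _tld_server = ROOT[tld]                # step 3: root returns a TLD referral
    zone = ".".join(name.split(".")[-2:])  # "example.com"
    _auth_server = TLD[zone]               # step 4: TLD refers to authoritative
    ip = AUTHORITATIVE[name]               # step 5: authoritative has the answer
    cache[name] = ip                       # recursive resolver caches it
    return ip

cache = {}
resolve("www.example.com", cache)   # cold: walks the full chain
resolve("www.example.com", cache)   # warm: served from cache
```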
DNS Record Types
DNS stores more than just IP addresses. Each record type serves a specific purpose in the ecosystem:
- A record — maps a domain to an IPv4 address. example.com → 93.184.216.34. This is the most common record type and what most people think of when they think about DNS.
- AAAA record — maps a domain to an IPv6 address. example.com → 2606:2800:220:1:248:1893:25c8:1946. Identical to an A record but for IPv6. Both can coexist on the same domain for dual-stack connectivity.
- CNAME record — an alias that points one domain name to another. www.example.com → cdn.provider.com. The resolver must then look up the target domain to get the actual IP, so CNAMEs add an extra resolution step, which adds latency. Importantly, a CNAME cannot coexist with other record types at the same name, which is why you cannot use a CNAME at the zone apex (the bare domain, like example.com). Some DNS providers offer proprietary alternatives (ALIAS, ANAME) that work around this limitation.
- MX record — specifies mail servers for a domain. example.com MX 10 mail.example.com. The priority number (10) determines which server to try first when multiple MX records exist; lower priority numbers are preferred.
- NS record — delegates a domain to specific nameservers. example.com NS ns1.example.com. This is how the TLD server knows which authoritative servers to refer you to.
- TXT record — holds arbitrary text. Used for SPF (email sender verification), DKIM (email signing), domain ownership verification (for Google, AWS, etc.), and other metadata.
- SRV record — specifies host and port for a service. _sip._tcp.example.com SRV 10 5 5060 sip.example.com. Used by protocols like SIP and XMPP and for service discovery in Kubernetes. Includes priority, weight, port, and target.
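A small sketch of how a resolver treats these records differently, using a hypothetical in-memory zone. Note how the CNAME forces a second lookup and how MX selection prefers the lowest priority number.

```python
# Hypothetical zone data; a real resolver fetches records over the network.
RECORDS = {
    ("www.example.com", "CNAME"): "cdn.provider.com",
    ("cdn.provider.com", "A"): "203.0.113.7",
    ("example.com", "A"): "93.184.216.34",
    ("example.com", "MX"): [(10, "mail.example.com"), (20, "backup.example.com")],
}

def lookup_a(name: str, max_chain: int = 8) -> str:
    """Resolve a name to an IPv4 address, following CNAME aliases."""
    for _ in range(max_chain):           # cap chain length to avoid loops
        if (name, "A") in RECORDS:
            return RECORDS[(name, "A")]
        name = RECORDS[(name, "CNAME")]  # alias: restart lookup at the target
    raise RuntimeError("CNAME chain too long")

def best_mx(name: str) -> str:
    """Pick the mail server with the lowest (most preferred) priority."""
    return min(RECORDS[(name, "MX")])[1]
```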
TTL and Caching Strategy
Every DNS record has a TTL (Time To Live) value in seconds that controls how long resolvers cache the answer. When the TTL expires, the resolver must query the authoritative server again. TTL is a fundamental tradeoff:
Long TTL (hours to days) — reduces DNS query volume, lowers authoritative server load, and speeds up resolution for repeat visitors. But changes propagate slowly. If you update an A record from IP-old to IP-new with a 24-hour TTL, some users will still resolve to IP-old for up to 24 hours.
Short TTL (seconds to minutes) — changes propagate quickly, enabling fast failover and migration. But every request beyond the TTL triggers a new DNS lookup, increasing latency and load on authoritative servers. A 30-second TTL means your authoritative server handles roughly 2,880 queries per day per caching resolver, versus just 1 query per day with a 24-hour TTL.
The practical approach: Use long TTLs (1-24 hours) for stable records. Before a planned change, lower the TTL to 30-60 seconds a few hours in advance so all caches pick up the short TTL. Make the change. After the new record is stable, raise the TTL back. This is the standard playbook for DNS migrations and failovers.
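The TTL mechanics can be sketched with a minimal cache that respects expiry. The clock is injected here purely to make expiry easy to demonstrate; a real resolver would use a monotonic system clock.

```python
# Minimal TTL-respecting DNS cache (illustrative sketch).

class DnsCache:
    def __init__(self, clock):
        self._clock = clock
        self._store = {}            # name -> (ip, expires_at)

    def put(self, name, ip, ttl):
        self._store[name] = (ip, self._clock() + ttl)

    def get(self, name):
        entry = self._store.get(name)
        if entry is None:
            return None
        ip, expires_at = entry
        if self._clock() >= expires_at:   # TTL expired: must re-query
            del self._store[name]
            return None
        return ip

# Simulated migration: a record cached with a short pre-change TTL expires
# quickly, so resolvers pick up the new IP within seconds.
now = [0]
cache = DnsCache(lambda: now[0])
cache.put("example.com", "93.184.216.34", ttl=30)
now[0] += 31                       # 31 seconds later...
assert cache.get("example.com") is None   # old answer has expired
```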
DNS-Based Load Balancing and Failover
DNS can distribute traffic across multiple servers by returning different IP addresses for the same domain:
Round-robin — the authoritative server returns multiple A records in rotating order. Each resolver gets a different first IP. Simple but has no health checking — if a server dies, DNS keeps returning its IP until the record is manually updated.
Weighted routing — services like AWS Route 53 and Cloudflare allow assigning weights to records. A server with weight 70 gets 70% of queries, weight 30 gets 30%. Useful for gradual traffic shifting during deployments.
Geolocation routing — returns different IPs based on the resolver's geographic location. A user in Tokyo resolves to an Asia-Pacific server, a user in London resolves to a European server. Reduces latency by routing to the nearest data center.
Health checks — managed DNS services can monitor server health and automatically remove failed servers from the response. Route 53 checks endpoints every 10-30 seconds and stops returning IPs for unhealthy servers. This solves the biggest weakness of basic DNS round-robin.
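Weighted routing can be sketched with the smooth weighted round-robin scheme (the approach nginx uses for weighted upstreams); the server names and weights below are illustrative. Unlike naive random selection, it interleaves picks so traffic is spread evenly over time.

```python
# Smooth weighted round-robin selection (illustrative).

def weighted_rr(servers, n):
    """Return n picks; each server gets picks proportional to its weight."""
    current = {name: 0 for name, _ in servers}
    total = sum(w for _, w in servers)
    picks = []
    for _ in range(n):
        for name, weight in servers:
            current[name] += weight            # accumulate each weight
        best = max(current, key=current.get)   # highest running total wins
        current[best] -= total                 # penalize the winner
        picks.append(best)
    return picks

picks = weighted_rr([("a.example.com", 70), ("b.example.com", 30)], 10)
# a.example.com receives 7 of 10 picks, b.example.com receives 3,
# interleaved rather than bunched together
```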
When DNS Goes Wrong
Propagation delays — after changing a DNS record, cached copies of the old record persist until their TTL expires. During this window, some users resolve to the old IP and some to the new IP. There is no way to forcibly invalidate DNS caches across the Internet. This is why TTL management before changes is critical.
Cache poisoning — an attacker injects forged DNS responses into a resolver's cache, redirecting users to a malicious IP. The Kaminsky attack (2008) exploited predictable transaction IDs to poison caches at scale. Mitigations include randomized source ports, randomized query IDs, and DNSSEC.
DNS amplification attacks — attackers send small DNS queries with a spoofed source IP (the victim's IP), and the DNS server sends large responses to the victim. DNS responses can be 50-100x larger than queries, making this an effective DDoS amplification vector.
DNS has no built-in encryption or authentication. Standard DNS queries travel in plaintext over UDP port 53, visible to anyone on the network path. DNSSEC adds authentication (proving responses came from the real authoritative server) but not encryption. DNS-over-HTTPS (DoH) and DNS-over-TLS (DoT) add encryption, preventing eavesdropping on DNS queries. Modern browsers increasingly default to DoH, routing DNS through Cloudflare or Google rather than the ISP resolver.
TLS is the cryptographic protocol that turns HTTP into HTTPS. It provides three guarantees for data in transit: confidentiality (encryption prevents eavesdropping), integrity (tampering is detected), and authentication (the server proves its identity via a certificate). Without TLS, every byte between client and server travels in plaintext, visible to anyone on the network path — ISPs, Wi-Fi hotspot operators, corporate proxies, and attackers performing man-in-the-middle interception.

What TLS Protects Against
Understanding the threat model clarifies why each part of TLS exists:
- Eavesdropping — without encryption, anyone on the network path can read the data. On public Wi-Fi, this means another user with a packet sniffer can see every URL, form submission, and cookie. TLS encryption makes the data unreadable to anyone except the intended recipient.
- Tampering — an attacker could modify data in transit. For example, injecting ads into HTTP pages (common with ISPs), or altering a bank transfer amount. TLS integrity checks (HMAC on every record) detect any modification and abort the connection.
- Impersonation — without authentication, a DNS hijack or BGP hijack could redirect traffic to a fake server. TLS certificates, signed by trusted Certificate Authorities (CAs), prove that the server you connected to is actually the one that controls the domain. If the certificate does not match, the browser shows a warning and blocks the connection.
TLS 1.3 Handshake: Step by Step
TLS 1.3 (finalized in RFC 8446, 2018) completes the handshake in a single round trip (1-RTT), down from two round trips in TLS 1.2. Here is how it works:
1. ClientHello. The client sends: supported cipher
suites (like TLS_AES_256_GCM_SHA384), supported key
exchange groups (like x25519 or secp256r1), and
crucially, a key share — the client's half of a
Diffie-Hellman key exchange, computed speculatively for its
preferred group. In TLS 1.2, the client had to wait for the
server to choose a group before sending its key material.
Sending it upfront is what saves the extra round trip.
2. ServerHello. The server responds with: the chosen cipher suite, its own key share (its half of the Diffie-Hellman exchange), and an encrypted extensions block. At this point, both sides have enough information to derive the shared secret.
3. Server authentication. Still in the same response flight, the server sends its certificate chain and a CertificateVerify message — a digital signature over the handshake transcript, proving it holds the private key for the certificate. The server also sends an encrypted Finished message containing a MAC over the entire handshake, which the client verifies to confirm nothing was tampered with.
4. Client Finished. The client verifies the server's certificate chain (checking the CA signature, expiration, domain name match, and revocation status), verifies the CertificateVerify signature, and verifies the Finished MAC. If everything checks out, the client sends its own encrypted Finished message. The handshake is complete.
5. Application data. Both sides now use the derived session keys for symmetric encryption (AES-GCM or ChaCha20-Poly1305). All subsequent data is encrypted and authenticated.
The total cost is one round trip — the client sends ClientHello, waits for the server's response, then can immediately send encrypted application data. On a 50ms network link, this saves 50ms compared to TLS 1.2's two-round-trip handshake.
Key Exchange: Ephemeral Diffie-Hellman
TLS 1.3 exclusively uses ephemeral Diffie-Hellman (DHE) key exchange. Each connection generates a fresh key pair, and the shared secret is derived from combining the client's and server's ephemeral public keys.
The critical property is forward secrecy: even if the server's long-term private key is compromised later, past sessions cannot be decrypted. Because each session used a unique ephemeral key that was discarded after the session ended, there is no stored key material to decrypt old traffic.
TLS 1.2 allowed static RSA key exchange, where the client encrypted the premaster secret with the server's long-term RSA public key. If that RSA key was later compromised, every past session could be decrypted. TLS 1.3 removed this entirely — forward secrecy is mandatory.
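The ephemeral exchange can be illustrated with toy finite-field Diffie-Hellman. The tiny prime below is for demonstration only and is nowhere near secure; real deployments use x25519 or 2048-bit-plus groups. The point is the shape: each side generates a fresh keypair, exchanges only public shares, and derives the same secret.

```python
import secrets

# Toy Diffie-Hellman group -- NOT secure, for illustration only.
P = 0xFFFFFFFB   # small prime modulus
G = 5            # generator

def ephemeral_keypair():
    priv = secrets.randbelow(P - 2) + 1   # fresh secret per connection
    pub = pow(G, priv, P)                 # public "key share"
    return priv, pub

# Each side sends only its public share (the key share in the handshake).
client_priv, client_pub = ephemeral_keypair()
server_priv, server_pub = ephemeral_keypair()

client_secret = pow(server_pub, client_priv, P)
server_secret = pow(client_pub, server_priv, P)
assert client_secret == server_secret   # both sides derive the same secret

# Forward secrecy: the private keys are discarded after the session,
# so there is nothing stored that could decrypt a recorded transcript.
del client_priv, server_priv
```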
Certificate Chain of Trust
A TLS certificate is a signed document that binds a domain name to a public key. The trust model works as follows:
Root CAs — a small set of trusted Certificate Authorities (like DigiCert, Let's Encrypt, Google Trust Services) whose root certificates are pre-installed in operating systems and browsers. There are roughly 100-150 root CAs trusted by major browsers.
Intermediate CAs — root CAs issue certificates to intermediate CAs, which in turn issue the leaf certificates that websites use. This layered structure protects the root key: if an intermediate is compromised, only its issued certificates need revocation, not every certificate globally.
Leaf certificates — the certificate presented by the server. Contains the domain name(s) in the Subject Alternative Name (SAN) field, the public key, the issuer (intermediate CA), validity period, and a digital signature from the intermediate CA.
Verification — the client walks the chain from leaf to intermediate to root, verifying each signature and checking that the root CA is in its trust store. It also checks the domain name matches, the certificate has not expired, and the certificate has not been revoked (via OCSP or CRL).
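The verification walk can be sketched with a toy model where certificates are dicts and "signatures" are simple issuer checks. Real verification (as done by the ssl module or a library like cryptography) checks cryptographic signatures, validity windows, and revocation; this sketch only shows the chain-walking logic.

```python
# Toy chain-of-trust verification (illustrative, not real X.509).

TRUST_STORE = {"Root CA"}   # pre-installed roots

def verify_chain(chain, hostname, trust_store=TRUST_STORE):
    """Walk leaf -> intermediate(s) -> root, checking each link."""
    leaf = chain[0]
    if hostname not in leaf["san"]:               # domain name must match
        return False
    for cert, issuer in zip(chain, chain[1:]):
        if cert["issuer"] != issuer["subject"]:   # each link must chain
            return False
        if not issuer["signed"](cert):            # issuer vouches for cert
            return False
    return chain[-1]["subject"] in trust_store    # root must be trusted

root = {"subject": "Root CA", "issuer": "Root CA",
        "signed": lambda c: c["issuer"] == "Root CA"}
intermediate = {"subject": "Intermediate CA", "issuer": "Root CA",
                "signed": lambda c: c["issuer"] == "Intermediate CA"}
leaf = {"subject": "example.com", "issuer": "Intermediate CA",
        "san": {"example.com", "www.example.com"}}

assert verify_chain([leaf, intermediate, root], "www.example.com")
```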
What Changed from TLS 1.2 to 1.3
TLS 1.3 was a major cleanup of the protocol:
- Removed insecure algorithms — static RSA key exchange (no forward secrecy), RC4, DES, 3DES, MD5, SHA-1, and arbitrary Diffie-Hellman groups. Only a handful of strong cipher suites remain.
- Encrypted handshake — in TLS 1.2, the server's certificate was sent in plaintext during the handshake, revealing which website the client was connecting to. TLS 1.3 encrypts the certificate, improving privacy.
- Simplified state machine — TLS 1.2 had dozens of possible handshake flows with various extensions and fallbacks. TLS 1.3 has essentially one flow, making implementation simpler and less prone to bugs.
- 1-RTT handshake — as described above, eliminates one round trip by having the client send its key share speculatively in the ClientHello.
0-RTT Resumption
For repeat connections to the same server, TLS 1.3 supports 0-RTT (zero round trip time) resumption. The client uses a pre-shared key (PSK) from a previous session to encrypt application data and sends it in the very first message, before the handshake completes. The server can process this data immediately.
The tradeoff is a replay vulnerability: an attacker who
captures the initial 0-RTT message can resend it. The server
might process the same request twice. This makes 0-RTT
unsafe for non-idempotent operations (payments, database
writes, state changes). Safe uses include GET requests for
static content where replaying has no side effects.
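One server-side mitigation is to remember which early-data tickets have been seen and reject duplicates, while refusing 0-RTT for non-idempotent operations outright. This is a simplified sketch; real deployments also bound the replay window by time and coordinate the seen-set across servers.

```python
# Sketch of server-side 0-RTT replay protection (illustrative).

class EarlyDataGuard:
    def __init__(self):
        self._seen = set()   # nonces already accepted over 0-RTT

    def accept(self, nonce: bytes, idempotent: bool) -> bool:
        if not idempotent:        # never process unsafe ops as early data
            return False
        if nonce in self._seen:   # replayed early data: reject
            return False
        self._seen.add(nonce)
        return True

guard = EarlyDataGuard()
guard.accept(b"ticket-1", idempotent=True)    # first use: accepted
guard.accept(b"ticket-1", idempotent=True)    # replay: rejected
guard.accept(b"ticket-2", idempotent=False)   # POST-like op: rejected
```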
Mutual TLS (mTLS)
Standard TLS is one-sided: only the server presents a certificate. In mutual TLS, the client also presents a certificate, and the server verifies it. Both sides authenticate each other.
mTLS is widely used in microservice architectures and zero-trust networks. Service meshes like Istio and Linkerd automatically manage client certificates for every service, ensuring that only authorized services can communicate. This replaces API keys or tokens for service-to-service authentication with cryptographic identity verification.
TLS 1.3 reduced the handshake from 2-RTT to 1-RTT, but the connection still requires a TCP handshake first (1-RTT). So a new HTTPS connection takes 2 round trips total: one for TCP SYN/SYN-ACK, one for TLS. On a 100ms link, that is 200ms before the first byte of application data. This is why connection pooling and HTTP/2 multiplexing matter — they amortize this setup cost across many requests.
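In Python's standard library, requiring TLS 1.3 is a one-line context setting (needs OpenSSL 1.1.1 or newer). The commented wrap_socket lines show where the 1-RTT handshake would run, on top of the TCP handshake; they are left as comments so the sketch stays offline.

```python
import ssl

# Require TLS 1.3 for outbound connections.
ctx = ssl.create_default_context()            # verifies certs and hostnames
ctx.minimum_version = ssl.TLSVersion.TLSv1_3  # refuse TLS 1.2 and below

# Wrapping a TCP socket with this context performs the handshake:
#   with socket.create_connection(("example.com", 443)) as sock:
#       with ctx.wrap_socket(sock, server_hostname="example.com") as tls:
#           tls.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
```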
HTTP is the application protocol that carries web content. Its evolution from HTTP/1.1 through HTTP/2 to HTTP/3 is a story of eliminating bottlenecks: first application-layer head-of-line blocking, then transport-layer head-of-line blocking. Each version solves a real performance problem that the previous version could not address.

HTTP/1.1: One Request at a Time
HTTP/1.1 (1997) uses a simple request-response model: the client sends a request, waits for the complete response, then sends the next request. This is head-of-line (HOL) blocking at the application layer — a slow response blocks all subsequent requests on that connection.
To work around this, browsers open 6 parallel TCP connections per origin (the browser-imposed limit). A page loading 30 resources uses all 6 connections, with each connection processing requests sequentially. This helps but creates its own problems: 6 TCP handshakes, 6 TLS handshakes, 6 congestion control windows competing for bandwidth, and 6 sets of buffers consuming server memory.
HTTP/1.1 has a feature called pipelining that allows sending multiple requests without waiting for responses, but responses must still arrive in order. If the first request takes 5 seconds and the second takes 50ms, the client waits 5 seconds for both. Pipelining was poorly implemented and most browsers disabled it.
Other HTTP/1.1 limitations:
- Textual headers — headers are sent as ASCII text with no compression. A typical request sends 200-800 bytes of headers. Since many headers (Host, Accept, Cookie, User-Agent) are identical across requests to the same server, this is highly redundant.
- No server push — the server can only respond to requests. It cannot proactively send resources it knows the client will need (like CSS referenced in the HTML). The client must parse the HTML, discover the CSS reference, then send a separate request.
- Domain sharding — to bypass the 6-connection limit, developers split resources across multiple subdomains (img1.example.com, img2.example.com). This adds DNS lookups and TLS handshakes for each shard. HTTP/2 made this antipattern unnecessary.
HTTP/2: Multiplexed Streams
HTTP/2 (2015, RFC 7540) solves application-layer HOL blocking by multiplexing multiple streams over a single TCP connection. Each request-response pair is an independent stream identified by a stream ID. Frames from different streams are interleaved on the wire, so a slow response on stream 3 does not block streams 1 and 2.
Key concepts in HTTP/2:
Streams — each request creates a new stream. Streams are lightweight (just an ID and state) and can be opened and closed independently. A single connection can handle hundreds of concurrent streams.
Frames — the basic unit of communication. Each frame has a stream ID, a type (HEADERS, DATA, PRIORITY, RST_STREAM, etc.), and a payload. Frames from different streams are interleaved on the same connection, enabling true multiplexing.
Binary framing — unlike HTTP/1.1's textual protocol, HTTP/2 uses a binary framing layer. Each frame has a fixed 9-byte header (length, type, flags, stream ID) followed by the payload. Binary parsing is faster and less error-prone than parsing text delimiters.
HPACK header compression — HTTP/2 compresses headers using a dynamic table that both client and server maintain. Headers sent in previous requests are stored in the table. On subsequent requests, a header that matches a table entry is sent as a single index number (1-2 bytes) instead of the full string (50-200 bytes). Since most headers are identical across requests to the same origin, HPACK typically achieves 85-95% compression ratios on headers.
Stream prioritization — clients can assign priorities and dependencies to streams (e.g., CSS stream should complete before image streams). Servers use this information to allocate bandwidth to the most important resources first. Implementation varies across servers; some honor priorities well, others ignore them.
Server push — the server can proactively send resources the client has not yet requested. When serving an HTML page, the server can push the CSS and JavaScript files it knows the page references. The client receives them before it even parses the HTML and discovers the references. In practice, server push has been difficult to use effectively and Chrome removed support for it in 2022, though it remains in the HTTP/2 specification.
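HPACK's core idea, a shared dynamic table so repeated headers travel as small indexes, can be sketched as follows. Real HPACK (RFC 7541) adds a static table, Huffman coding, and table eviction; this toy encoder only shows the indexing mechanism.

```python
# Sketch of HPACK-style dynamic-table header indexing (illustrative).

class HeaderTable:
    def __init__(self):
        self._index = {}     # (name, value) -> table index
        self._entries = []

    def encode(self, headers):
        wire = []
        for pair in headers:
            if pair in self._index:
                wire.append(self._index[pair])   # known header: tiny index
            else:
                self._entries.append(pair)       # new header: send literal,
                self._index[pair] = len(self._entries) - 1
                wire.append(pair)                # both sides add it to table
        return wire

enc = HeaderTable()
first = enc.encode([(":authority", "example.com"), ("user-agent", "demo/1.0")])
second = enc.encode([(":authority", "example.com"), ("user-agent", "demo/1.0")])
# the first request sends full literals; the repeat sends only [0, 1]
```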
HTTP/3 and QUIC: Eliminating TCP HOL Blocking
HTTP/2 solved application-layer HOL blocking but introduced a new problem at the transport layer. Because HTTP/2 multiplexes all streams over a single TCP connection, a single lost TCP packet blocks ALL streams until that packet is retransmitted. TCP guarantees in-order delivery of the byte stream, and it cannot deliver bytes past a gap even if those bytes belong to a completely different HTTP/2 stream.
On a connection carrying 10 streams, a lost packet in stream 5's data stalls streams 1-4 and 6-10 even though their data is already in the receive buffer. On lossy networks (mobile, Wi-Fi), this TCP-layer HOL blocking can negate much of HTTP/2's multiplexing benefit.
HTTP/3 (2022, RFC 9114) replaces TCP entirely with QUIC (RFC 9000), a transport protocol built on UDP:
- Independent streams — QUIC provides multiplexed streams at the transport layer, each with its own independent loss recovery. A lost packet in stream 5 only stalls stream 5. Other streams continue receiving data because QUIC does not enforce cross-stream ordering.
- Integrated TLS 1.3 — QUIC incorporates TLS 1.3 directly into the transport handshake. A new QUIC connection completes its combined transport and crypto handshake in a single round trip, versus two round trips for a TCP handshake followed by a separate TLS 1.3 handshake. Resumed connections can use 0-RTT.
- Connection migration — QUIC connections are identified by a connection ID, not by the (IP, port) tuple. When a mobile device switches from Wi-Fi to cellular, the QUIC connection survives because the connection ID does not change. TCP connections would break and require a new handshake.
- Userspace implementation — QUIC runs in userspace (not the kernel), enabling faster iteration and deployment. New features can be deployed with an application update rather than an OS kernel update.
The evolution HTTP/1.1 to HTTP/2 to HTTP/3 follows a pattern of pushing blocking problems down one layer and then solving them there. HTTP/1.1 had application-layer HOL blocking (one request at a time per connection). HTTP/2 solved that with multiplexed streams, but exposed TCP-layer HOL blocking (one lost packet stalls all streams). HTTP/3 solved that by replacing TCP with QUIC (independent stream loss recovery). Each version is a direct response to the bottleneck the previous version could not address.
Every TCP connection begins with a three-way handshake (SYN, SYN-ACK, ACK) — one round trip. If the connection uses TLS, add another round trip for the TLS handshake. That is 2 round trips and 100-300ms of latency before a single byte of application data is exchanged. For a service making hundreds of database queries or API calls per second, establishing a new connection for every request wastes an enormous amount of time and resources.
Connection pooling solves this by maintaining a pool of pre-established, reusable connections. Instead of opening and closing a connection for each request, the application borrows a connection from the pool, uses it, and returns it when done. The connection stays open for future requests, amortizing the handshake cost across hundreds or thousands of operations.
The Cost of New Connections
To understand why pooling matters, consider the full cost of a new database connection:
- TCP handshake — 1 round trip (50-100ms to a remote database).
- TLS handshake — 1 additional round trip if the database uses TLS (increasingly common, required by many cloud providers).
- Authentication — the database server verifies credentials, checks permissions, and sets up a session. PostgreSQL forks a new OS process for each connection. MySQL creates a new thread.
- Memory allocation — each database connection consumes memory on both client and server. PostgreSQL uses roughly 5-10MB per connection for its process. MySQL uses roughly 1-5MB per thread for buffers and session state.
For a service making 500 requests/second to a PostgreSQL database with 100ms round trip, creating a new connection per request means spending 100-200ms per request just on connection setup. That is 50-100 seconds of cumulative handshake time per second — more time establishing connections than doing actual work.
How a Connection Pool Works
A connection pool manages a set of open connections through their full lifecycle:
Initialization — when the application starts, the pool creates a minimum number of connections (the minimum idle setting). These connections are immediately available for use without any handshake delay.
Checkout — when application code needs a connection, it requests one from the pool. If an idle connection is available, it is returned immediately (sub-millisecond). If all connections are in use and the pool has not reached its maximum size, a new connection is created. If the pool is at maximum capacity, the request waits in a queue.
Use — the application executes queries or API calls on the borrowed connection. The pool tracks which connections are checked out and for how long.
Return — when the application is done, it returns the
connection to the pool. The pool validates the connection is
still healthy (e.g., sends a lightweight query like
SELECT 1) and makes it available for the next checkout.
Eviction — idle connections are closed after a configurable timeout (the idle timeout) to free server resources. Connections are also closed after a maximum lifetime to prevent issues with stale server-side state, rotated credentials, or firewall timeouts that silently drop idle connections.
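The lifecycle above can be sketched as a small checkout/return pool. Here `factory` and `validate` are caller-supplied stand-ins for real connection creation and the SELECT 1 health check; a production pool like HikariCP adds idle eviction, maximum lifetime, and leak detection on top of this core.

```python
import queue
import threading

class ConnectionPool:
    """Minimal checkout/return pool sketch (transport-agnostic)."""

    def __init__(self, factory, validate, min_idle=2, max_size=5):
        self._factory = factory
        self._validate = validate
        self._idle = queue.Queue()
        self._size = 0
        self._lock = threading.Lock()
        self._max = max_size
        for _ in range(min_idle):       # initialization: warm the pool
            self._idle.put(factory())
            self._size += 1

    def checkout(self, timeout=30.0):
        try:
            return self._idle.get_nowait()   # fast path: idle connection
        except queue.Empty:
            pass
        with self._lock:
            if self._size < self._max:       # grow up to maximum pool size
                self._size += 1
                return self._factory()
        return self._idle.get(timeout=timeout)  # at capacity: wait in queue

    def give_back(self, conn):
        if self._validate(conn):        # health-check before reuse
            self._idle.put(conn)
        else:                           # broken connection: drop and shrink
            with self._lock:
                self._size -= 1
```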
Pool Sizing with Little's Law
How many connections does a pool need? Little's Law provides the answer:
Pool size = Request rate x Average response time
If your service makes 200 database requests per second and each query takes 25ms on average:
Pool size = 200 req/s x 0.025 s = 5 connections
Only 5 connections can sustain 200 requests/second because each connection handles 40 requests per second (1000ms / 25ms per query). In practice, add 20-50% headroom for variance in response times and traffic bursts, so a pool of 7-8 connections would be appropriate.
Common mistake: oversizing the pool. Setting the pool to 100 connections "just in case" is counterproductive. Each connection consumes server memory. On a PostgreSQL server with 16 CPU cores, 100 active connections mean 100 processes competing for 16 cores. Context switching overhead degrades throughput. Benchmarks consistently show that a pool of 2-3x the CPU core count outperforms a pool of 100+ connections on the same hardware.
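The sizing arithmetic from the example above, with headroom applied:

```python
import math

# Pool sizing via Little's Law, using the numbers from the text.
request_rate = 200    # database requests per second
avg_latency = 0.025   # seconds per query (25ms)

base = request_rate * avg_latency   # connections busy on average -> 5.0
sized = math.ceil(base * 1.5)       # ~50% headroom for bursts -> 8
```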
Configuration Parameters
Most connection pool libraries expose these settings:
- Maximum pool size — the upper bound on open connections. Prevents the pool from overwhelming the database during traffic spikes. HikariCP defaults to 10.
- Minimum idle — connections kept open even when idle. Ensures the first few requests after a quiet period do not pay the connection setup cost. Set this to your baseline concurrency.
- Connection timeout — how long a checkout request waits in the queue before failing. HikariCP defaults to 30 seconds. If requests regularly hit this timeout, the pool is undersized or queries are too slow.
- Idle timeout — how long an unused connection sits in the pool before being closed. Frees server resources during low traffic. Typically 5-10 minutes.
- Maximum lifetime — the absolute maximum time a connection can exist, regardless of activity. Prevents issues from stale state, credential rotation, or silent firewall drops. HikariCP defaults to 30 minutes.
- Validation query — a lightweight query (like
SELECT 1) executed before returning a connection from the pool to verify it is still alive. Some pools validate on checkout, some on return, some on a periodic schedule.
Database Connection Pools
HikariCP (Java) — the most widely used Java connection pool. Known for extremely fast checkout times (sub-microsecond for cached connections) and a lean codebase. Used by Spring Boot as the default pool. Exposes all standard settings with sensible defaults.
PgBouncer (PostgreSQL) — a lightweight connection multiplexer that sits between application instances and PostgreSQL. Instead of each application instance maintaining its own pool to the database, PgBouncer consolidates connections. Ten application instances with 10-connection pools would normally create 100 database connections. PgBouncer can map those 100 application-side connections to 20 actual database connections, dramatically reducing server load. It supports three pooling modes: session (one database connection per client session), transaction (a connection is assigned only for the duration of a transaction, then returned), and statement (most aggressive — connection returned after each statement).
HTTP Connection Pools
The same pooling concept applies to HTTP clients making
requests to external APIs or internal microservices. HTTP
client libraries (like Python's requests.Session,
Java's HttpClient, or Go's http.Client) maintain pools
of keep-alive connections to each origin.
The key configuration is max connections per host — how many simultaneous connections to maintain to a single origin. Too few and requests queue unnecessarily; too many and you risk overwhelming the target service. A typical default is 20-100 connections per host.
For microservice architectures with service meshes, the sidecar proxy (Envoy in Istio) manages HTTP connection pools automatically, including circuit breaking (stopping new requests when the target is failing) and outlier detection (removing unhealthy instances from the pool).
Connection Pool Anti-Patterns
Pool too large. A 500-connection pool to a PostgreSQL server with 16 cores wastes memory (5-10MB per connection = 2.5-5GB) and degrades throughput via context switching. Use Little's Law to size correctly, then add modest headroom.
No leak detection. Application code that checks out a
connection but never returns it (due to missing close() in
error handling) gradually exhausts the pool. Enable leak
detection (HikariCP's leakDetectionThreshold) to log stack
traces of connections held longer than expected.
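Leak detection amounts to timestamping checkouts and flagging anything held past a threshold, similar in spirit to HikariCP's leakDetectionThreshold. A sketch with an injected clock so the long-held case is easy to simulate:

```python
import time

class LeakDetector:
    """Flag connections held longer than a threshold (illustrative)."""

    def __init__(self, threshold_s=60.0, clock=time.monotonic):
        self._threshold = threshold_s
        self._clock = clock
        self._checked_out = {}   # connection id -> checkout timestamp

    def on_checkout(self, conn_id):
        self._checked_out[conn_id] = self._clock()

    def on_return(self, conn_id):
        self._checked_out.pop(conn_id, None)

    def leaks(self):
        now = self._clock()
        return [cid for cid, t in self._checked_out.items()
                if now - t > self._threshold]

# Simulate a connection held well past the threshold.
now = [0.0]
det = LeakDetector(threshold_s=60.0, clock=lambda: now[0])
det.on_checkout("conn-1")
now[0] = 120.0
assert det.leaks() == ["conn-1"]   # held 120s > 60s threshold: flagged
```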
No maximum lifetime. Without maxLifetime, connections
can persist indefinitely. Firewalls silently drop idle
connections after 30-60 minutes, and the pool does not know
the connection is dead until the next query fails. Credential
rotation (common in cloud environments) also requires
connections to be periodically refreshed.
Same pool for fast and slow queries. If the pool serves both 5ms lookups and 5-second analytics queries, the slow queries can consume all connections, starving the fast ones. Use separate pools for different query workloads, each sized appropriately.