
DNS Resolution in the Browser and Beyond: When TTL is a Suggestion

March 6, 2026

A hostname has to resolve to an IP address before any TCP connection can start, and the resolution path is longer than most application engineers picture. The browser first checks its own in-process cache. Next, the OS consults the system cache, if one is running (nscd, systemd-resolved, dnsmasq). If neither layer has the answer, the stub resolver sends a query to the configured recursive resolver: usually the ISP's, or a public one such as 1.1.1.1 or 8.8.8.8.
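To make the fall-through concrete, here is a toy Python model of that chain. It is a sketch under simplifying assumptions: each layer is a dictionary with absolute expiry times, the names `CacheLayer` and `lookup` are hypothetical, and the recursive resolver is a stub function rather than a real network query. Real browser and OS caches layer their own policies on top of this.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Entry = Tuple[str, float]  # (ip, absolute expiry time in seconds)


@dataclass
class CacheLayer:
    """Toy model of one cache in the lookup path (browser, OS, ...)."""
    name: str
    entries: Dict[str, Entry] = field(default_factory=dict)

    def get(self, host: str, now: float) -> Optional[Entry]:
        entry = self.entries.get(host)
        # An entry is usable only until its expiry time.
        if entry is not None and now < entry[1]:
            return entry
        return None


def lookup(host: str, layers: List[CacheLayer],
           recursive: Callable[[str], Tuple[str, int]],
           now: float) -> Tuple[str, str]:
    """Check each layer in order, falling through to the recursive resolver.

    Returns (ip, name of whoever answered). Layers that missed are back-filled
    with the same absolute expiry, so every layer honors the record's TTL.
    """
    missed: List[CacheLayer] = []
    for layer in layers:
        entry = layer.get(host, now)
        if entry is not None:
            for m in missed:
                m.entries[host] = entry
            return entry[0], layer.name
        missed.append(layer)
    ip, ttl = recursive(host)  # full miss: in real life a query goes upstream
    for m in missed:
        m.entries[host] = (ip, now + ttl)
    return ip, "recursive resolver"
```

With a recursive stub that returns ("192.0.2.10", 300), a first call at t=0 is answered by the recursive resolver, a call at t=10 by the browser layer, and at t=300 every layer has expired and the query goes upstream again.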

The recursive resolver is the one doing the actual work. If the answer is not cached, it walks the hierarchy. It asks a root server, which returns a referral to the appropriate TLD server (.com, .org, or a country-code TLD such as .de). The TLD server returns a referral to the authoritative name server for the domain. The authoritative server returns the actual answer: an A or AAAA record, or a CNAME that points to yet another name to resolve. Each step costs a round trip, and CNAME chains compound the cost.
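The walk can be modeled with a few dictionaries standing in for the root, TLD, and authoritative servers. Everything here is hypothetical (the `SERVERS` table, the `query` and `resolve` helpers, the example.com data); the sketch only counts round trips and does not speak the DNS wire protocol. It also skips the caching a real resolver does, so it shows the cold-cache worst case.

```python
from typing import Dict, Tuple

# Hypothetical zone data: each "server" answers with a referral, a CNAME, or an A record.
SERVERS: Dict[str, Dict[str, Tuple[str, str]]] = {
    "root": {"com.": ("referral", "tld-com")},
    "tld-com": {"example.com.": ("referral", "ns-example")},
    "ns-example": {
        "www.example.com.": ("CNAME", "web.example.com."),
        "web.example.com.": ("A", "192.0.2.10"),
    },
}


def query(server: str, name: str) -> Tuple[str, str]:
    """One round trip: ask `server` about `name`."""
    table = SERVERS[server]
    if name in table:
        return table[name]
    for zone, (kind, target) in table.items():
        # A referral covers every name that ends inside its zone.
        if kind == "referral" and name.endswith("." + zone):
            return kind, target
    raise LookupError(f"{server} knows nothing about {name}")


def resolve(name: str, max_steps: int = 16) -> Tuple[str, int]:
    """Iteratively resolve `name`; return (ip, round trips spent)."""
    server, round_trips = "root", 0
    for _ in range(max_steps):
        kind, value = query(server, name)
        round_trips += 1
        if kind == "referral":
            server = value                   # descend one level
        elif kind == "CNAME":
            name, server = value, "root"     # new name: start the walk over
        else:
            return value, round_trips        # A record: done
    raise RuntimeError("too many steps (CNAME loop?)")
```

Resolving www.example.com. costs six round trips in this model because the CNAME restarts the walk, while web.example.com. alone costs three. In practice the recursive resolver caches the root and TLD referrals, and an authoritative server typically includes the CNAME target in the same response when it lives in the same zone, so warm lookups are much cheaper.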

Caching tames this. Every record carries a TTL, and each layer (browser, OS, recursive resolver) is supposed to honor it. EDNS Client Subnet (ECS) lets the recursive resolver forward a truncated prefix of the client's IP address (a /24, for example) to the authoritative server, so CDNs can return geographically appropriate answers instead of one fixed IP for the whole planet.
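The truncation itself is simple. Here is a minimal sketch using Python's standard `ipaddress` module, assuming the /24 (IPv4) and /56 (IPv6) source prefixes that RFC 7871 recommends as defaults. The helper name `ecs_source_prefix` is hypothetical, and a real resolver encodes the prefix into an EDNS option rather than returning a string.

```python
import ipaddress


def ecs_source_prefix(client_ip: str, v4_bits: int = 24, v6_bits: int = 56) -> str:
    """Truncate a client address to the subnet a resolver might send via ECS."""
    addr = ipaddress.ip_address(client_ip)
    bits = v4_bits if addr.version == 4 else v6_bits
    # strict=False zeroes the host bits instead of raising an error.
    return str(ipaddress.ip_network(f"{addr}/{bits}", strict=False))
```

For a client at 203.0.113.77 the authoritative server sees only 203.0.113.0/24: enough to pick a nearby CDN edge without revealing the full address.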

The production failure that catches teams: a deploy moves a service to a new IP. The team set the record's TTL to 300 seconds the day before, expecting a five-minute cutover. They flip the record. Most users follow within minutes. Customers behind one major ISP keep hitting the old IP long after the five minutes are up. The reason is that the ISP's recursive resolver enforces a local minimum TTL of 600 seconds and rounds shorter TTLs up. The authoritative TTL was a suggestion, not a contract. Worse, some browsers and JVMs (networkaddress.cache.ttl) keep their own caches, and many corporate forwarders extend TTLs to reduce upstream load. Every layer that restarts the clock adds to the delay, which is how a five-minute cutover stretches into hours.
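A back-of-the-envelope model helps size that tail. The sketch below treats each caching layer as a policy that maps the TTL it receives to how long it holds the record. The policy names (`honor`, `clamp`, `fixed`) and the 30-second JVM value in the usage note are illustrative assumptions, not measured behavior of any real resolver, and the model assumes upstream caches hand out the remaining (decremented) TTL.

```python
from typing import Callable, List

# A layer's policy maps the TTL it receives to how long it keeps the record.
Policy = Callable[[int], int]


def honor(ttl: int) -> int:
    """Layer that keeps the record for exactly the TTL it was given."""
    return ttl


def clamp(min_ttl: int) -> Policy:
    """Layer that rounds short TTLs up to a local minimum."""
    return lambda ttl: max(ttl, min_ttl)


def fixed(seconds: int) -> Policy:
    """Layer that ignores DNS TTLs entirely and uses its own lifetime."""
    return lambda _ttl: seconds


def worst_case_staleness(authoritative_ttl: int, layers: List[Policy]) -> int:
    """Seconds after the flip a client may still receive the old IP.

    `layers` runs from the recursive resolver toward the application.
    Worst case: the first layer refreshed just before the flip, and each
    later layer refreshed just before its upstream entry expired, when
    the remaining TTL it was handed was close to zero.
    """
    total, received = 0, authoritative_ttl
    for policy in layers:
        total += policy(received)
        received = 0  # downstream refreshes see an almost-expired record
    return total
```

For the 300-second record behind the ISP's 600-second floor, worst_case_staleness(300, [clamp(600), honor]) gives 600. Add a JVM that caches for a fixed 30 seconds and it becomes 630; a corporate forwarder with its own 600-second floor behind an obedient resolver, [honor, clamp(600)], already reaches 900. Each extra layer that restarts the clock adds its full lifetime.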

The fix is to plan for the disobedient. Drop TTLs to 60 seconds at least 24 hours before a cutover, and never less than one old TTL ahead of it, so every cache holding the record under its previous, longer TTL has expired by the time you flip. Allow more lead time if you have caching DNS in front of a flaky upstream, since some resolvers keep serving stale records when their upstream fails. Keep the old IP serving traffic for the worst-case cache lifetime, not the TTL you set. For anything mission-critical, dual-home: bring the new IP up and have the old IP redirect or proxy to it, so the long tail of stuck resolvers never sees broken responses. If you control the client, use a connection pool that periodically re-resolves the hostname regardless of cache state.
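A pool that re-resolves on a timer might look like the sketch below. `ReResolvingPool`, its parameters, and the injected `resolver`, `connect`, and `clock` callables are hypothetical; in a real client `resolver` would wrap something like `socket.getaddrinfo`, and retired connections would be closed rather than just forgotten. Re-resolving only bypasses the pool's own cache: the answer still comes through whatever OS and recursive caches sit upstream.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class ReResolvingPool:
    """Connection pool that re-resolves its hostname every `interval` seconds."""

    def __init__(self, host: str,
                 resolver: Callable[[str], List[str]],
                 connect: Callable[[str], Any],
                 interval: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.host = host
        self.resolver = resolver      # returns the current IPs for host
        self.connect = connect        # opens a new connection to one IP
        self.interval = interval
        self.clock = clock
        self.ips: List[str] = []
        self.idle: Dict[str, List[Any]] = {}
        self.last_resolved = float("-inf")
        self._turn = 0

    def _maybe_reresolve(self) -> None:
        now = self.clock()
        if now - self.last_resolved < self.interval:
            return
        self.ips = list(self.resolver(self.host))
        self.last_resolved = now
        # Forget idle connections to IPs no longer in the answer.
        for ip in list(self.idle):
            if ip not in self.ips:
                del self.idle[ip]

    def acquire(self) -> Tuple[str, Any]:
        self._maybe_reresolve()
        if not self.ips:
            raise LookupError(f"no addresses for {self.host}")
        ip = self.ips[self._turn % len(self.ips)]  # round-robin across IPs
        self._turn += 1
        idle = self.idle.get(ip)
        conn = idle.pop() if idle else self.connect(ip)
        return ip, conn

    def release(self, ip: str, conn: Any) -> None:
        # Only keep connections to IPs that are still current.
        if ip in self.ips:
            self.idle.setdefault(ip, []).append(conn)
```

The interval is the knob: it bounds how long the pool itself can keep handing out a retired IP, so it should sit well below the cutover window you are planning for.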

DNS is fast when it works and invisible when it does not. Assume someone in the chain is lying about TTL, because someone always is.

Key takeaway

TTL is not a contract. It is a hint that any layer in the resolution chain can override. Plan your IP cutovers around the resolvers that ignore you, not the ones that obey.

Originally posted on LinkedIn.