Content Negotiation, ETags, Conditional Requests, and Caching
Why does a single URL like /api/users/123 need to serve JSON to a mobile app, HTML to a browser, and a gzipped response to both? Because the alternative — separate endpoints per format — explodes your API surface. Content negotiation lets one URL serve multiple representations. The client advertises what it wants; the server picks the best match.

Request Headers — What the Client Sends
Three headers drive negotiation:
Accept declares preferred media types. application/json for APIs, text/html for browsers. The q parameter sets priority — Accept: text/html, application/json;q=0.9 means "prefer HTML, but JSON is fine."
Accept-Language requests a human language. en-US, fr;q=0.8 means "English first, French as fallback." Internationalized APIs use this to select translated content without separate endpoints per locale.
Accept-Encoding lists supported compression algorithms. gzip, br tells the server it can decompress either format. Brotli (br) typically compresses text 15-20% smaller than gzip, but its higher quality levels are more CPU-intensive to encode.
Response Headers — What the Server Returns
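The server picks a representation and labels it: Content-Type names the chosen media type (for example, application/json; charset=utf-8), Content-Language the chosen language, and Content-Encoding the compression applied (for example, gzip or br).
If no compatible format exists, the server can return 406 Not Acceptable. In practice, a request with no Accept header is treated as accepting any media type, so most APIs simply default to JSON; HTTP also permits a server to disregard an Accept header it cannot satisfy and send its default format instead of a 406. A well-designed 406 response should include a list of supported media types in the response body, so the client knows what formats to try next.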
This turns an error into actionable guidance. Without it, the client has no way to discover what the server supports except by trial and error.
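A minimal sketch of such a 406 response, assuming a server that can produce JSON and XML. The function `not_acceptable`, the `SUPPORTED` list, and the body's field names are hypothetical; HTTP does not define a schema for this body.

```python
import json

# Media types this hypothetical server can produce.
SUPPORTED = ["application/json", "application/xml"]


def not_acceptable(supported: list[str]) -> tuple[int, dict[str, str], str]:
    """Build a 406 response whose body tells the client which formats it can ask for."""
    body = json.dumps({
        "error": "not_acceptable",
        "supported": supported,  # hypothetical field name; no standard schema exists
    })
    headers = {"Content-Type": "application/json"}
    return 406, headers, body


status, headers, body = not_acceptable(SUPPORTED)
print(status, body)
```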
Quality Values and Negotiation Priority
The q parameter (quality value) ranges from 0 to 1 and expresses preference strength. The default is 1.0 when omitted. Consider Accept: text/html, application/json;q=0.9, text/plain;q=0.5.
This tells the server: "I strongly prefer HTML (q=1.0), JSON is acceptable (q=0.9), plain text is a last resort (q=0.5)." The server evaluates its capabilities against these preferences and selects the highest-quality match it can produce. If the server supports JSON and plain text but not HTML, it returns JSON (q=0.9 beats q=0.5).
Quality values enable graceful degradation. A client that prefers Protobuf but can fall back to JSON expresses this as Accept: application/protobuf, application/json;q=0.5. The server returns its best available format without breaking the client.
One edge case worth knowing: q=0 explicitly means "I do not want this format." Accept: text/html, application/xml;q=0 means "give me HTML, and never give me XML." Servers must treat q=0 as a hard rejection, not a low preference. If the only format the server supports is XML, it should return 406 Not Acceptable rather than serving a format the client explicitly excluded.
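The selection logic described above can be sketched as follows. This is a simplified illustration, not a full RFC 9110 implementation: it assumes lowercase offered types listed in the server's preference order, matches only exact, type/* and */* ranges, and ignores media-type parameters such as charset. The function names are hypothetical.

```python
from typing import Optional


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept header into (media_range, q) pairs; q defaults to 1.0."""
    ranges = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        media_range, *params = [p.strip() for p in part.split(";")]
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: treat a malformed q as "not acceptable"
        ranges.append((media_range.lower(), q))
    return ranges


def quality_for(offered: str, ranges: list[tuple[str, float]]) -> float:
    """q of the most specific range matching `offered`: exact > type/* > */*."""
    main_type = offered.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in ranges:
        if media_range == offered:
            specificity = 2
        elif media_range == f"{main_type}/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q  # 0.0 when nothing matches, which also covers an explicit q=0


def negotiate(accept: Optional[str], offered: list[str]) -> Optional[str]:
    """Return the offered type with the highest q above 0, or None (respond 406)."""
    if not accept:
        return offered[0]  # no Accept header: any type is acceptable, use the default
    ranges = parse_accept(accept)
    best, best_q = None, 0.0
    for media_type in offered:  # ties keep the earlier (server-preferred) type
        q = quality_for(media_type, ranges)
        if q > best_q:
            best, best_q = media_type, q
    return best
```

Because a q=0 match yields a quality of 0.0 and only values above 0 can win, an explicitly excluded type is never selected; if nothing else matches, `negotiate` returns None and the caller can send 406.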
The Vary Header — Why Caches Need It
Without Vary, a cache stores one response per URL. If client A requests JSON and the cache stores it, client B requesting HTML gets the cached JSON instead. Vary: Accept, Accept-Language tells caches to key on these headers, so the cache stores separate entries for JSON/English, HTML/French, and so on. This is the bridge between content negotiation and caching: Vary prevents format collisions in every cache layer from browser to CDN.
Server-Driven vs. Agent-Driven Negotiation
Everything above describes server-driven (proactive) negotiation: the client advertises preferences in headers, and the server picks the best representation. This is the default HTTP model and covers most use cases.
Agent-driven (reactive) negotiation flips the control. The server returns a 300 Multiple Choices response listing all available representations with their media types and URLs. The client picks one and follows the link. This is rare in modern APIs but appears in content management systems and document repositories where the number of representations is large (PDF, DOCX, HTML, Markdown) and the server does not want to guess.
The practical trade-off: server-driven negotiation requires one round trip but the server might guess wrong. Agent-driven negotiation requires two round trips (discover options, then fetch) but the client always gets exactly what it wants. For APIs, the single-round-trip server-driven model dominates because clients know what format they need.
Be careful with Vary granularity. Vary: * (wildcard) effectively disables caching because every request is treated as unique. Vary: Accept-Encoding is usually safe because browsers send a small set of values (gzip, br, identity). Vary: User-Agent is dangerous because there are thousands of unique User-Agent strings, fragmenting the cache into thousands of entries per URL.
Vary turns a simple URL-based cache key into a composite key: URL + Accept + Accept-Language + Accept-Encoding. Forgetting to set Vary on a negotiated response means CDNs serve the wrong format to users — a JSON response to a browser expecting HTML, or English content to a French-speaking user.
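The composite key can be sketched like this. It is a simplification under stated assumptions: real caches compare against the Vary value stored with each response and often normalize header values (Accept-Encoding especially), and the function name `cache_key` is hypothetical.

```python
from typing import Optional


def cache_key(url: str, request_headers: dict[str, str], vary: str) -> Optional[tuple]:
    """Composite cache key: the URL plus the request's value for each header named in Vary.

    Returns None for Vary: *, meaning the stored response cannot be reused.
    """
    names = sorted({n.strip().lower() for n in vary.split(",") if n.strip()})
    if "*" in names:
        return None
    # Header names are case-insensitive, so normalize them before lookup.
    headers = {k.lower(): v.strip() for k, v in request_headers.items()}
    return (url,) + tuple((name, headers.get(name, "")) for name in names)
```

With Vary: Accept, Accept-Language, a JSON request and an HTML request for /api/users/123 produce different keys, so neither can be served the other's cached body.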
An ETag is a fingerprint of a resource's content. When the content changes, the ETag changes. When it stays the same, the ETag stays the same. This simple property enables two powerful mechanisms: cache validation without re-downloading, and optimistic concurrency control without distributed locks.

How ETags Work
On the initial request, the server returns the resource body along with an ETag header, for example ETag: "a1b2c3d4".
The client caches both the body and the ETag. On subsequent requests, the client sends the cached ETag back in an If-None-Match header: If-None-Match: "a1b2c3d4".
The server compares the current ETag with the client's value. If they match, the resource has not changed — the server returns 304 Not Modified with no body. The client uses its cached copy. If they differ, the server returns 200 OK with the new body and a new ETag.
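A minimal server-side sketch of this revalidation flow, assuming an in-memory resource with a version counter. The `Resource` class, its `version` field, and `handle_get` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    body: str
    version: int  # hypothetical counter bumped on every change

    @property
    def etag(self) -> str:
        return f'"v{self.version}"'


def handle_get(resource: Resource, if_none_match: Optional[str]) -> tuple[int, dict[str, str], str]:
    """Serve 304 with an empty body if the client's cached ETag is current, else 200."""
    headers = {"ETag": resource.etag}  # sent on both 200 and 304
    if if_none_match is not None:
        # The header may list several ETags (split naively; assumes no commas inside tags).
        candidates = {tag.strip() for tag in if_none_match.split(",")}
        if "*" in candidates or resource.etag in candidates:
            return 304, headers, ""  # the client's cached copy is still valid
    return 200, headers, resource.body


user = Resource(body='{"id": 123, "name": "Ada"}', version=7)
status, headers, _ = handle_get(user, None)          # 200 with ETag "v7"
status, _, body = handle_get(user, headers["ETag"])   # 304 with an empty body
```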
The bandwidth saving is significant. A 50KB API response reduced to a 200-byte 304 response is a 99.6% reduction per request. For CDNs serving millions of requests, this translates directly to lower egress costs and faster response times. A service handling 100K requests per second, where 90% of requests are revalidations answered with 304, saves roughly 4.5GB/s of bandwidth (90,000 × ~50KB), the equivalent of removing 90,000 full response generations per second from the origin.
Beyond bandwidth, 304 responses improve perceived latency. A full 200 response requires the server to serialize the body, compress it, and transmit it. A 304 response skips all three steps, provided the server can check the ETag without building the body: the server sends headers only, and the client renders from local cache. For geographically distant clients where transmission time dominates, the difference between a 50KB payload and a 200-byte header can be 100ms or more per request.
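The estimate above can be reproduced with a few lines of arithmetic, using the same assumptions as the text (decimal units, 50KB full responses, ~200-byte 304 responses, 90% of requests revalidated with 304).

```python
# Inputs taken from the estimate in the text (decimal units).
requests_per_second = 100_000
revalidated_share = 0.90
full_response_bytes = 50_000
not_modified_bytes = 200

revalidated_per_second = requests_per_second * revalidated_share
saved_bytes_per_second = revalidated_per_second * (full_response_bytes - not_modified_bytes)
reduction_per_request = 1 - not_modified_bytes / full_response_bytes

print(f"{revalidated_per_second:,.0f} full bodies avoided per second")
print(f"{saved_bytes_per_second / 1e9:.2f} GB/s saved")
print(f"{reduction_per_request:.1%} smaller per request")
```

Subtracting the 304's own size gives about 4.48GB/s, which rounds to the 4.5GB/s quoted.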
Strong vs. Weak ETags
Strong ETags guarantee byte-for-byte identity. ETag: "a1b2c3d4" means every byte of the response body matches. Use strong ETags when exact content matters — file downloads, range requests, binary resources.
Weak ETags indicate semantic equivalence. ETag: W/"v42" means the content is functionally the same even if bytes differ (e.g., different whitespace in JSON, different timestamp formatting). Use weak ETags when you care about the data meaning, not its exact serialization.
The practical distinction: strong ETags must change if any byte changes (including whitespace or key ordering in JSON). Weak ETags only change when the meaning changes. For most APIs, weak ETags are more practical because JSON serialization order can vary between servers or library versions.
When to use each:
- Strong ETags: file downloads, binary resources, range requests, byte-serving CDNs
- Weak ETags: API responses, HTML pages, any content where semantic equivalence matters more than byte identity
- Strong "a1b2c3d4": byte-for-byte identity, required for range requests, fragile across servers with different serialization.
- Weak W/"v42": semantic equivalence, suitable for API responses and HTML, robust across heterogeneous servers.
Range requests (If-Range) require strong ETags because stitching byte ranges from semantically equivalent but byte-different versions produces corrupted output.
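RFC 9110 defines two comparison functions that make this distinction concrete: If-Match and If-Range use strong comparison (both tags strong and identical), while If-None-Match uses weak comparison (the W/ prefix is ignored). A minimal sketch, with hypothetical function names:

```python
def split_etag(etag: str) -> tuple[bool, str]:
    """Return (is_weak, opaque_tag) for an entity tag like W/"v42" or "a1b2c3d4"."""
    etag = etag.strip()
    is_weak = etag.startswith("W/")
    return is_weak, (etag[2:] if is_weak else etag)


def strong_match(a: str, b: str) -> bool:
    """Strong comparison (If-Match, If-Range): both tags strong and identical."""
    weak_a, tag_a = split_etag(a)
    weak_b, tag_b = split_etag(b)
    return not weak_a and not weak_b and tag_a == tag_b


def weak_match(a: str, b: str) -> bool:
    """Weak comparison (If-None-Match): opaque tags identical, W/ prefix ignored."""
    return split_etag(a)[1] == split_etag(b)[1]
```

This is why a weak ETag is enough for cache revalidation but can never satisfy If-Range or If-Match.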
Generating Consistent ETags
ETags must be consistent across all server replicas. If server A generates "abc" and server B generates "xyz" for the same unchanged resource, conditional requests fail randomly depending on which server the client hits.
Two reliable approaches:
Content hash — Hash the response body (MD5, SHA-256). Identical content always produces the same hash regardless of which server computes it. Downside: requires generating the full response before computing the ETag, which negates some performance benefit. Content hashing works well for static files (images, CSS, JavaScript) where the file exists on disk and the hash can be precomputed at deploy time or on first access, then cached. For dynamic API responses that are assembled per-request, content hashing is less practical because you must serialize the entire response before you know whether it changed.
Version field — Use a database column (version number, updated_at timestamp, row version). The ETag comes from the database, not the response body, so all servers return the same value. This is faster because you can compare ETags without generating the response body at all.
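Both approaches can be sketched in a few lines. The function names, the 16-hex-character truncation of the hash, and the `id-version` tag format are illustrative assumptions. The version-based tag is marked weak here because different servers may serialize the same version with different bytes.

```python
import hashlib


def content_hash_etag(body: bytes) -> str:
    """Strong ETag from the response bytes: identical bytes give the same tag on every replica."""
    # Truncated to keep the header short (an assumption; any stable digest works).
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def version_etag(resource_id: int, version: int) -> str:
    """Weak ETag from a database version column: computed without building the body."""
    return f'W/"{resource_id}-{version}"'
```

The first function needs the full body before it can answer; the second needs only two integers, which is why it suits dynamic API responses.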
ETag Scope — Per-Resource vs. Per-Representation
An ETag identifies a specific representation, not just a resource. If /api/users/123 serves JSON and XML via content negotiation, the JSON and XML representations should have different ETags even when the underlying data is identical — because the bytes differ. A client caching the JSON representation should not receive a 304 based on the XML ETag.
This interacts with the Vary header. When the cache key includes Vary: Accept, each format gets its own cache entry with its own ETag. If you forget Vary, a client requesting XML might receive a 304 based on a JSON ETag comparison — serving stale or wrong-format data silently. Combining per-representation ETags with correct Vary headers prevents this class of bugs.
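For composite resources (an API response assembled from multiple database tables or microservice calls), derive the ETag from the versions of all contributing sources. If a user profile combines data from a users table (version 5) and a preferences table (version 12), the composite ETag could be "u5-p12". Any individual source changing produces a new composite ETag. Taking only the maximum version is safe only when every source draws from one shared, monotonically increasing counter or clock; with independent per-table counters, the users table could move from 5 to 6 while the maximum stays at 12, leaving the ETag unchanged.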
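A sketch that combines the per-source versions with the per-representation rule from the previous paragraphs. The function name, the `name:version` fingerprint format, and hashing the fingerprint (to keep the header short and avoid exposing table names) are assumptions for illustration.

```python
import hashlib


def composite_etag(source_versions: dict[str, int], media_type: str) -> str:
    """Weak ETag covering every contributing source's version and the media type."""
    # Sort so the result does not depend on dict order: "p:12|u:5|application/json".
    parts = [f"{name}:{version}" for name, version in sorted(source_versions.items())]
    fingerprint = "|".join(parts + [media_type])
    digest = hashlib.sha256(fingerprint.encode("utf-8")).hexdigest()[:16]
    return f'W/"{digest}"'
```

Changing any one source's version, or asking for XML instead of JSON, produces a different tag.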
In distributed systems with CDNs, ETags flow through the cache hierarchy. The origin generates the ETag, the CDN stores it with the cached response, and forwards it to clients. When a client sends If-None-Match, the CDN can validate locally if the entry is still fresh, or forward the conditional request to the origin if stale. Some CDNs generate their own ETags based on the cached body hash — this works for simple cases but breaks if the origin uses version-field ETags, because the CDN's hash-based ETag will not match the origin's version-based ETag on revalidation.
ETags for Optimistic Concurrency
ETags also solve the lost-update problem. Without concurrency control, two clients can read the same resource, make changes, and overwrite each other — the last write wins silently.
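With If-Match, the flow becomes:
1. Clients A and B both GET the resource and receive ETag: "v1".
2. Client A sends a PUT with If-Match: "v1". The ETag matches, so the update succeeds and the resource's ETag becomes "v2".
3. Client B sends a PUT with If-Match: "v1". The current ETag is now "v2", so the server returns 412 Precondition Failed.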
Client B's update is rejected because the resource changed since it was read. Client B must re-read, merge changes, and retry. No locks are held, no deadlocks possible, and the server remains stateless between requests. This is called optimistic concurrency control because it assumes conflicts are rare and only detects them at write time rather than preventing them with locks.
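The same two-client race as an in-memory sketch. `Document` and `handle_put` are hypothetical names; requiring If-Match and answering 428 Precondition Required (RFC 6585) when it is missing is one possible design choice, and the sketch ignores If-Match: * and ETag lists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    body: str
    version: int = 1

    @property
    def etag(self) -> str:
        return f'"v{self.version}"'


def handle_put(doc: Document, new_body: str, if_match: Optional[str]) -> int:
    """Apply the update only if the client's ETag is still current; otherwise 412."""
    if if_match is None:
        return 428  # Precondition Required: this sketch insists on If-Match for writes
    if if_match.strip() != doc.etag:
        return 412  # Precondition Failed: someone else updated the resource first
    doc.body = new_body
    doc.version += 1  # new version, therefore a new ETag
    return 204


doc = Document(body="draft")
etag_seen_by_a = etag_seen_by_b = doc.etag               # both clients read "v1"
status_a = handle_put(doc, "A's edit", etag_seen_by_a)   # 204, ETag becomes "v2"
status_b = handle_put(doc, "B's edit", etag_seen_by_b)   # 412, B must re-read and retry
```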
The trade-off: under high contention (many clients updating the same resource simultaneously), optimistic concurrency generates more retries because conflicts are frequent. Pessimistic locking (SELECT FOR UPDATE) prevents conflicts upfront but holds database locks that can cause deadlocks and resource starvation. For most APIs where conflicts are rare (different users editing different resources), optimistic concurrency with If-Match is the better choice because it keeps the HTTP layer stateless.
In interviews, prefer the version-field approach for ETags. It avoids the circular problem of content-hashing: you must generate the full response to compute the hash, but the whole point of ETags is to avoid generating the response when content has not changed. A database version column lets you compare ETags with a single integer comparison.
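At the database layer, a version column turns both checks into cheap queries. A sketch using Python's built-in `sqlite3`, with a hypothetical `users` table: the read path fetches only the version, and the write path is a single conditional UPDATE, so the check and the write cannot be interleaved by another request.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, version INTEGER NOT NULL)")
conn.execute("INSERT INTO users VALUES (123, 'Ada', 5)")


def current_etag(user_id: int) -> Optional[str]:
    """Read only the version column: no need to load or serialize the row."""
    row = conn.execute("SELECT version FROM users WHERE id = ?", (user_id,)).fetchone()
    return None if row is None else f'"{row[0]}"'


def conditional_update(user_id: int, name: str, if_match: str) -> bool:
    """Compare-and-set in one statement: succeeds only if the version is unchanged."""
    expected = int(if_match.strip('"'))  # assumes ETags of the form "<version>"
    cur = conn.execute(
        "UPDATE users SET name = ?, version = version + 1 WHERE id = ? AND version = ?",
        (name, user_id, expected),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows means the version moved on: respond 412
```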
Conditional requests let clients say "do this only if the resource is (or is not) at version X." This turns every HTTP operation into a check-and-act — avoiding wasted transfers on reads and preventing lost updates on writes, all without distributed locks or custom protocols.
Conditional Headers for Reads
If-None-Match sends the cached ETag. "Give me the resource only if it has changed." If the server's current ETag matches, it returns 304 Not Modified with no body. If different, it returns 200 with the new body and new ETag. This is the primary mechanism for cache revalidation.
If-Modified-Since sends the cached Last-Modified timestamp. "Give me the resource only if it changed after this date." The server compares timestamps and returns 304 or 200. This is less precise than ETags — timestamps have second-level granularity, so two changes within the same second produce the same timestamp. Additionally, servers with clock skew can return incorrect results. Prefer ETags when available; use If-Modified-Since as a fallback for servers that do not generate ETags.
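The one-second granularity problem is easy to demonstrate with the standard library's `email.utils` HTTP-date helpers. The helper names `http_date` and `modified_since` and the timestamps are illustrative.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def http_date(moment: datetime) -> str:
    """Format a timestamp as an HTTP-date; sub-second precision is lost."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


def modified_since(last_modified: datetime, if_modified_since: str) -> bool:
    """True if the resource changed after the client's date, compared at whole seconds."""
    client_time = parsedate_to_datetime(if_modified_since)
    return last_modified.replace(microsecond=0) > client_time


first_edit = datetime(2024, 10, 1, 12, 0, 0, 100_000, tzinfo=timezone.utc)
second_edit = datetime(2024, 10, 1, 12, 0, 0, 900_000, tzinfo=timezone.utc)
cached_header = http_date(first_edit)  # the client cached the first version
print(cached_header)                   # Tue, 01 Oct 2024 12:00:00 GMT
print(modified_since(second_edit, cached_header))  # False: the second edit is invisible
```

An ETag derived from a version counter would have changed on the second edit, which is why the text prefers ETags.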
Conditional Headers for Writes
If-Match sends the expected ETag. "Apply this update only if the resource is still at this version." If the ETag matches, the update proceeds (200 or 204). If it does not match, the server returns 412 Precondition Failed. This is optimistic concurrency control — no locks, no blocking, just a version check at write time.
If-Unmodified-Since sends a timestamp. "Apply this update only if the resource has not changed since this date." Same logic as If-Match but with timestamp precision. Less commonly used because ETags are more reliable. In practice, If-Match with ETags is the standard for optimistic concurrency, and If-Unmodified-Since exists mainly for compatibility with systems that only track modification timestamps.
If-Range for Resumable Downloads
A client downloading a large file gets disconnected at 50%. It sends Range: bytes=5242880- with If-Range: "v1". If the file has not changed (ETag still "v1"), the server sends the remaining bytes (206 Partial Content). If the file changed, the server ignores the range and sends the complete new file (200 OK). Without If-Range, a resumed download could stitch together bytes from two different file versions — corrupted data with no error.
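A sketch of the server's If-Range decision, assuming an open-ended `Range: bytes=N-` request and an ETag (not a date) in If-Range. The function name is hypothetical, and a real server would also send Content-Range and related headers.

```python
def resume_download(file_bytes: bytes, current_etag: str,
                    range_start: int, if_range: str) -> tuple[int, bytes]:
    """Answer `Range: bytes=<range_start>-` sent together with `If-Range: <etag>`."""
    # If-Range uses strong comparison, so a weak validator never matches.
    unchanged = not if_range.startswith("W/") and if_range == current_etag
    if not unchanged:
        return 200, file_bytes            # file changed: ignore Range, send the whole new file
    if range_start >= len(file_bytes):
        return 416, b""                   # Range Not Satisfiable
    return 206, file_bytes[range_start:]  # Partial Content: only the missing bytes
```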
This is especially important for large media files, software updates, and database backups where partial re-downloads save significant time and bandwidth. A 2GB file interrupted at 1.8GB only needs 200MB to complete — but only if the file is unchanged. If-Range makes this decision automatically.
Why This Matters at Scale
Conditional requests eliminate an entire class of distributed systems problems. Cache revalidation avoids redundant data transfer. Optimistic concurrency avoids distributed locking. Resumable downloads avoid re-transferring gigabytes of data. All three use the same mechanism: send a validator, get a conditional response. The HTTP specification solved these problems decades ago — you do not need to reinvent them.
The key insight is that conditional headers turn HTTP into a check-and-act protocol. Without them, every read transfers the full body and every write overwrites blindly. With them, reads skip unchanged data and writes detect conflicts. This is the foundation that makes caching and concurrency work at internet scale without custom middleware.
Caching headers are the dials that control who may cache a response, how long it stays fresh, and what happens when it expires. Getting these right means your CDN absorbs 90% of traffic. Getting them wrong means stale data, cache stampedes, or no caching at all.

Cache-Control Directives
max-age=N — The response is fresh for N seconds. During this window, caches serve it without contacting the origin. After expiry, the response is stale and caches should revalidate before serving it; must-revalidate (below) makes this mandatory.
s-maxage=N — Like max-age but only applies to shared caches (CDNs, reverse proxies). Overrides max-age for CDNs while letting browsers use a different TTL. Useful pattern: max-age=60, s-maxage=3600 means browsers revalidate every minute but CDNs cache for an hour.
no-cache — Does NOT mean "do not cache." It means "cache the response but always revalidate with the origin before serving." The cache stores the response and sends a conditional request (If-None-Match) on every access. If the origin returns 304, the cache serves its copy. This gives you the bandwidth savings of caching with the correctness guarantee of always checking.
no-store — Actually means "do not cache." The response must not be stored anywhere — not in browser cache, not in CDN, not in any intermediate proxy. Use for sensitive data: authentication tokens, personal financial data, health records.
must-revalidate — After max-age expires, the cache must revalidate with the origin. Without this, a cache under load might serve a stale response rather than waiting for revalidation. This directive closes that loophole.
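public — Any cache (browser, CDN, proxy) may store the response. This matters for authenticated responses: a shared cache must not store a response to a request carrying an Authorization header unless the response explicitly allows it, and public is the usual way to do so (s-maxage and must-revalidate also grant that permission). Without one of these, CDNs will not cache responses to requests that include an Authorization header.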
private — Only the end user's browser may cache this response. CDNs and proxies must not store it. Use for user-specific data (account settings, shopping cart, personalized recommendations) that should not leak between users via a shared cache. The word "private" refers to the cache layer, not encryption — it means "browser-only cache," not "encrypted."
immutable — Tells the cache that this response will never change at this URL. Combined with a long max-age, it lets supporting browsers skip even conditional revalidation (for example, on a user-initiated reload) while the response is fresh. Use it only for content-hashed URLs, where the hash guarantees the content matches the URL forever.
no-cache does not mean do not cache. It means always revalidate. The directive that actually prevents caching is no-store. Confusing these two is one of the most common HTTP caching mistakes — and it shows up in interviews. If you set no-cache expecting zero caching, the response is still stored and served after revalidation.
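To see how a cache might interpret these directives, here is a simplified sketch that parses Cache-Control and computes a freshness lifetime for a shared (CDN, proxy) or private (browser) cache. The function names are hypothetical; the sketch omits heuristic freshness, treats a missing max-age as 0, and does not distinguish no-cache from max-age=0.

```python
from typing import Optional


def parse_cache_control(header: str) -> dict[str, object]:
    """Parse Cache-Control into {directive: value}; flag directives map to True."""
    directives: dict[str, object] = {}
    for part in header.split(","):
        name, sep, value = part.strip().partition("=")
        if not name:
            continue
        name = name.lower()
        if not sep:
            directives[name] = True
        elif value.isdigit():
            directives[name] = int(value)
        else:
            directives[name] = value.strip('"')
    return directives


def freshness_lifetime(directives: dict[str, object], shared: bool) -> Optional[int]:
    """Seconds the response stays fresh in this cache, or None if it must not be stored."""
    if "no-store" in directives or (shared and "private" in directives):
        return None
    if "no-cache" in directives:
        return 0  # may be stored, but must be revalidated before every use
    if shared and "s-maxage" in directives:
        return directives["s-maxage"]  # shared caches prefer s-maxage over max-age
    return directives.get("max-age", 0)
```

For example, max-age=60, s-maxage=3600 yields 60 seconds in a browser and 3600 at a CDN, while private, max-age=300 is cacheable only in the browser.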
Freshness vs. Revalidation
A cached response has two states: fresh (age less than max-age, serve immediately) and stale (age exceeds max-age, must revalidate). Revalidation sends a conditional request to the origin using If-None-Match (ETag) or If-Modified-Since (timestamp). The origin responds with 304 (unchanged, keep using cache) or 200 (new body, replace cache).
The stale-while-revalidate=N extension lets a cache serve the stale response immediately, for up to N seconds after it expires, while revalidating asynchronously in the background. This eliminates the revalidation latency from the user's perspective at the cost of briefly serving stale data. Consider Cache-Control: max-age=60, stale-while-revalidate=30.
This three-phase model gives you 60 seconds of zero-latency cached responses, then 30 seconds of zero-latency stale responses while the cache refreshes, then synchronous revalidation only for requests that arrive after 90 seconds without a completed background refresh. For frequently requested resources, some request almost always lands in the stale window and triggers the refresh, so users rarely experience the synchronous phase.
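The three phases reduce to a comparison on the cached response's age. A minimal sketch with a hypothetical function name and phase labels:

```python
def cache_decision(age: float, max_age: int, swr: int) -> str:
    """Classify a cached response under max-age=<max_age>, stale-while-revalidate=<swr>."""
    if age < max_age:
        return "fresh"                   # serve from cache, no origin contact
    if age < max_age + swr:
        return "stale-while-revalidate"  # serve stale now, refresh in the background
    return "revalidate"                  # too stale: wait for the origin before serving
```

With max_age=60 and swr=30, an age of 10 seconds is fresh, 75 seconds serves stale while refreshing, and 95 seconds forces a synchronous revalidation.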
Common Cache-Control Combinations
Understanding individual directives is not enough; the combinations matter, and a handful of patterns come up repeatedly.
Choosing the Right Strategy
Static assets (JS, CSS, images) — Cache-Control: public, max-age=31536000, immutable. Cache forever. Use content-hashed filenames (app.a1b2c3.js) so new deploys produce new URLs that bypass the cache entirely.
API responses (product catalog, configs) — Cache-Control: public, max-age=60, stale-while-revalidate=30. Fresh for 1 minute, stale-but-updating for 30 more seconds. Keeps responses fast while limiting staleness.
User-specific data (profiles, dashboards) — Cache-Control: private, no-cache with an ETag. Browser can cache but must revalidate on every request. Prevents CDN from mixing up users while still saving bandwidth via 304.
Frequently updated data (stock prices, public dashboards) — Cache-Control: public, max-age=5, stale-while-revalidate=10. Very short freshness with async revalidation. The CDN absorbs bursts without serving data more than 15 seconds old.
Sensitive data (auth tokens, financials) — Cache-Control: no-store. Never cached anywhere. Every request hits the origin.
One nuance: max-age=0 is not the same as no-cache, even though both cause revalidation. max-age=0 means the response is immediately stale and should be revalidated, but the cache may still serve the stale copy under certain conditions (like when the origin is unreachable). no-cache is stricter: it requires successful revalidation before serving. For critical data where serving a stale fallback is unacceptable, use no-cache; max-age=0 with must-revalidate removes the fallback too and behaves essentially the same. For data where "stale is better than nothing," use max-age=0 without must-revalidate, or add stale-if-error=N to state the fallback window explicitly.
Multi-Tier Caching
In practice, responses pass through multiple cache layers: browser cache, CDN edge node, reverse proxy, and application-level cache. Each layer evaluates Cache-Control independently. The s-maxage directive controls shared caches (CDN, proxy) separately from browser caches (max-age). A common pattern: max-age=0, s-maxage=300 means the browser always revalidates (keeping user-facing data fresh) while the CDN caches for 5 minutes (absorbing traffic spikes). Each layer reduces the load on the layer behind it — a well-tuned multi-tier cache can absorb 99% of read traffic before it reaches the origin.
Cache Stampede Prevention
When a popular resource's cache entry expires, every client that requests it at that moment misses the cache and goes to the origin: a thundering herd. If a product page is cached at the CDN with max-age=60 and 10,000 clients request it in the second after the entry expires, the origin receives 10,000 identical requests instead of the usual 1 request per 60 seconds.
Three mitigation strategies:
Request coalescing — The CDN detects multiple simultaneous requests for the same expired resource and sends only one to the origin. All other requests wait for that single response, which is then shared. Most CDNs (Cloudflare, Fastly, CloudFront) support this natively.
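The same idea can be applied inside an application cache. The sketch below is an in-process analogue of CDN request collapsing, similar in spirit to Go's singleflight package; the `Coalescer` class and its `get` method are hypothetical names. The first caller for a key fetches from the origin, and concurrent callers for that key wait for its result.

```python
import threading
from typing import Callable


class Coalescer:
    """Collapse concurrent fetches for the same key into one origin request."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: dict[str, tuple[threading.Event, dict]] = {}

    def get(self, key: str, fetch: Callable[[], bytes]) -> bytes:
        with self._lock:
            entry = self._inflight.get(key)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self._inflight[key] = entry
        done, result = entry
        if leader:
            try:
                result["value"] = fetch()  # the only origin request for this burst
            except Exception as exc:
                result["error"] = exc      # followers see the same failure
            finally:
                with self._lock:
                    del self._inflight[key]
                done.set()
        else:
            done.wait()                    # followers wait for the leader's result
        if "error" in result:
            raise result["error"]
        return result["value"]
```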
Jittered expiration — Instead of a fixed max-age=60, add a random offset per cache entry (55-65 seconds), so expirations spread over 10 seconds instead of clustering at one instant. This is a client-side or CDN-edge technique: the Cache-Control header itself stays fixed, but the cache layer adds jitter internally.
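A minimal sketch of jitter applied inside a cache layer, assuming a ±5 second spread around a 60 second base TTL. The function name `jittered_ttl` is hypothetical.

```python
import random
from typing import Optional


def jittered_ttl(base_ttl: float, spread: float = 5.0,
                 rng: Optional[random.Random] = None) -> float:
    """Return a TTL drawn uniformly from [base_ttl - spread, base_ttl + spread]."""
    source = rng if rng is not None else random
    return base_ttl + source.uniform(-spread, spread)


rng = random.Random(42)  # seeded only so the demo is repeatable
ttls = [jittered_ttl(60, rng=rng) for _ in range(10_000)]
print(f"expirations spread from {min(ttls):.1f}s to {max(ttls):.1f}s")
```

Entries written in the same instant now expire across a 10 second window, so the origin sees a trickle of refreshes instead of one spike.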
Background refresh — stale-while-revalidate naturally prevents stampedes by letting one request trigger the refresh while all others serve stale content. Combined with request coalescing, only one origin request is needed regardless of how many clients hit the cache simultaneously.
Cache Invalidation
TTL-based expiration is passive — you wait for max-age to elapse. But sometimes you need to remove cached content immediately: a price change, a content takedown, or a security patch.
Purge removes a specific URL from the cache. Fast and precise, but requires knowing every URL variant (with query strings, Vary dimensions). Purging /api/products/42 does not purge /api/products/42?fields=price — each is a separate cache entry.
Tag-based invalidation (surrogate keys) assigns tags to cached responses. A product page gets tags like product-42, category-electronics. When product 42 changes, invalidate the product-42 tag and every cache entry tagged with it is purged — regardless of URL. This scales better than URL-by-URL purging because one tag can invalidate hundreds of related cache entries.
Soft purge marks a cache entry as stale rather than deleting it. The entry remains servable via stale-while-revalidate while the cache fetches a fresh copy. This avoids a latency spike from cache misses during invalidation — the stale version serves until the new version is ready.
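A toy in-memory model of the three operations above: URL purge, tag-based purge, and soft purge. The `TaggedCache` class and its methods are hypothetical; real CDNs expose these through their own purge APIs.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    body: bytes
    expires_at: float  # time.monotonic() deadline
    tags: set[str] = field(default_factory=set)


class TaggedCache:
    """Toy cache showing URL purge, surrogate-key (tag) purge, and soft purge."""

    def __init__(self) -> None:
        self.entries: dict[str, Entry] = {}
        self.keys_by_tag: dict[str, set[str]] = {}

    def put(self, key: str, body: bytes, ttl: float, tags: set[str]) -> None:
        self.purge(key)  # drop any old entry so stale tag mappings do not linger
        self.entries[key] = Entry(body, time.monotonic() + ttl, set(tags))
        for tag in tags:
            self.keys_by_tag.setdefault(tag, set()).add(key)

    def get(self, key: str) -> Optional[tuple[bytes, bool]]:
        """Return (body, is_stale), or None on a miss."""
        entry = self.entries.get(key)
        if entry is None:
            return None
        return entry.body, time.monotonic() >= entry.expires_at

    def purge(self, key: str) -> None:
        """Hard purge of one key; each URL variant (query string, Vary value) is its own key."""
        entry = self.entries.pop(key, None)
        if entry is not None:
            for tag in entry.tags:
                self.keys_by_tag.get(tag, set()).discard(key)

    def purge_tag(self, tag: str, soft: bool = False) -> None:
        """Invalidate every entry carrying `tag`; soft purge marks entries stale instead."""
        for key in list(self.keys_by_tag.get(tag, ())):
            if soft:
                # Still servable (e.g. under stale-while-revalidate) while a fresh copy is fetched.
                self.entries[key].expires_at = float("-inf")
            else:
                self.purge(key)
```

Purging the tag product-42 reaches both /api/products/42 and /api/products/42?fields=price, which a single URL purge would not.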
The practical pattern: use TTL for routine freshness (max-age handles 99% of cases), tag-based invalidation for urgent content changes, and soft purge for high-traffic resources where a cache miss storm would overwhelm the origin.
A common mistake is treating cache invalidation as a synchronous, globally consistent operation. CDNs have edge nodes worldwide, and purge propagation takes seconds to minutes. During this window, some edge nodes serve stale content while others serve fresh content. Design for this reality — if your application breaks when two users see briefly different versions of a resource, you need shorter TTLs or no-cache with revalidation, not aggressive purging.
These four mechanisms — content negotiation, ETags, conditional requests, and caching headers — work together to shift work away from your origin servers and onto browsers, CDNs, and proxies.
Content negotiation lets one URL serve many formats. Set Vary correctly or caches serve the wrong representation.
ETags are resource fingerprints. They enable cache revalidation (304 Not Modified saves bandwidth) and optimistic concurrency (If-Match prevents lost updates without locks). For dynamic responses, generate them from database version columns rather than content hashes, to avoid the circular problem of generating responses you are trying to skip.
Conditional requests are the protocol-level check-and-act. If-None-Match for reads, If-Match for writes, If-Range for resumable downloads. They eliminate redundant transfers and prevent write conflicts using HTTP headers alone.
Caching headers control freshness and storage. max-age for TTL, no-cache for always-revalidate, no-store for never-cache, stale-while-revalidate for zero-latency refreshes.
Combined, these mechanisms mean a well-configured API can serve 90%+ of read traffic from caches, substantially cut egress costs, and handle concurrent writes safely, all without custom infrastructure. The HTTP specification provides these capabilities for free. The only cost is understanding them correctly.
The progression matters too. Start with Cache-Control on read endpoints (immediate win, zero code changes). Add ETags for conditional validation (requires generating ETags but saves bandwidth). Add If-Match for write endpoints (prevents lost updates). Add Vary for content-negotiated endpoints (prevents cache collisions). Each layer builds on the previous one, incrementally reducing origin load and improving correctness.
A 304 Not Modified response is the most efficient response in HTTP. It confirms the client's cached data is current, transfers only headers (about 200 bytes), and, with version-based ETags, requires no response body generation. At scale, serving 304 with just headers instead of 200 OK with a full body can cut the origin capacity you need by an order of magnitude.