CDN and Edge Caching Strategies
Why do large-scale systems place caches at the edge of the network instead of relying solely on origin servers? Because physics does not negotiate. A user in Tokyo requesting an image stored in Virginia faces a minimum of roughly 150 ms of round-trip latency just from the speed of light in fiber. Multiply that by the number of round trips for DNS, TCP, TLS, and the HTTP request itself, and the page load penalty becomes significant. A CDN (Content Delivery Network) eliminates most of that penalty by serving cached content from a Point of Presence (PoP) that is geographically close to the user.
A CDN is a distributed network of edge servers deployed across dozens or hundreds of PoPs worldwide. Each PoP contains one or more edge servers that can store and serve cached copies of your content. Behind those PoPs sits your origin server, which is the authoritative source of truth for every resource. Some CDN architectures add a shield (mid-tier) cache between the edge and the origin. When an edge PoP has a cache miss, it checks the shield first instead of going directly to origin. This reduces origin load significantly when you have many PoPs, because misses are consolidated at the shield layer rather than each PoP independently hitting the origin.
CDN request flow
When a user requests a resource, the following sequence occurs:
- DNS resolution with Anycast. The user's DNS query resolves to an IP address shared by many PoPs. With Anycast routing, BGP directs the request to the nearest PoP based on network topology, not just geography. Cloudflare and Fastly both use this approach. Other CDNs like CloudFront use latency-based DNS routing to achieve a similar effect.
- Edge cache lookup. The edge server computes a cache key from the request (typically scheme + host + path + selected query parameters) and checks its local store.
- Cache hit. If a fresh copy exists, the edge responds directly. The origin never sees the request. Response time drops to single-digit milliseconds.
- Cache miss. If no cached copy exists or the cached copy is stale, the edge forwards the request to the origin (or to an intermediate shield cache), stores the response, and serves it to the user.

When multiple users request the same uncached resource at roughly the same time, good CDNs use request coalescing (also called request collapsing). Instead of sending 100 concurrent requests to the origin, the edge sends one request and queues the other 99 until the response arrives, then serves all of them from the newly cached copy. This protects the origin during traffic spikes and cold cache scenarios.
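The coalescing logic described above can be sketched with a lock and a per-key event: the first requester becomes the "leader" and fetches from the origin, while concurrent requesters wait for its result. This is an illustrative sketch, not any CDN's actual implementation; fetch_origin is a hypothetical callable standing in for the origin request.

```python
import threading

class CoalescingCache:
    """Minimal request-coalescing sketch: concurrent misses for the
    same key trigger exactly one origin fetch; other callers block
    until the leader's response is cached."""

    def __init__(self, fetch_origin):
        self.fetch_origin = fetch_origin   # callable: key -> response body
        self.cache = {}                    # key -> cached response
        self.inflight = {}                 # key -> Event other callers wait on
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            if key in self.cache:                    # cache hit
                return self.cache[key]
            if key in self.inflight:                 # a fetch is in progress
                event, leader = self.inflight[key], False
            else:                                    # we become the leader
                event = self.inflight[key] = threading.Event()
                leader = True
        if leader:
            body = self.fetch_origin(key)            # the single origin request
            with self.lock:
                self.cache[key] = body
                del self.inflight[key]
            event.set()                              # wake all waiters
            return body
        event.wait()                                 # wait for the leader's fetch
        return self.cache[key]
```

With 100 concurrent requests for the same uncached key, fetch_origin runs once and all 100 callers receive the same body.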
Benefits of this architecture:
- Lower latency. Responses travel shorter distances. A cache hit at a nearby PoP can return in under 10 ms instead of 150-300 ms from a distant origin.
- Reduced origin load. With a 95% cache hit ratio, the origin handles only 5% of total traffic. That is a 20x reduction in origin requests.
- Higher availability. If the origin is temporarily unreachable, the CDN can continue serving stale content (when configured to do so), keeping the site alive.
- DDoS absorption. CDN edge networks have massive aggregate bandwidth and can absorb volumetric attacks before they reach your infrastructure.
- TLS termination at the edge. The CDN handles the TLS handshake with the user over a short network path, then uses a persistent, optimized connection to the origin. This reduces the latency cost of encryption.
TTL and content freshness
Every cached object has a Time-To-Live (TTL) that determines how long the edge considers it fresh. TTL is controlled by HTTP response headers from the origin:
- Cache-Control: max-age=3600 tells any cache (browser or CDN) the object is fresh for 3600 seconds.
- Cache-Control: s-maxage=86400 overrides max-age specifically for shared caches like CDNs, letting you set a longer edge TTL while keeping a shorter browser TTL.
- Expires: Thu, 01 Jan 2026 00:00:00 GMT is the legacy absolute-time approach. Modern practice favors Cache-Control.
CDN providers can also override origin TTLs. CloudFront cache policies let you set minimum, maximum, and default TTLs. Cloudflare Edge Cache TTL rules can clamp or ignore origin headers entirely.
It is important to distinguish freshness from retention. Freshness (TTL) is how long the CDN is allowed to serve the object without checking the origin. Retention is how long the object physically remains in cache storage. An object can be evicted before its TTL expires if it is unpopular and the cache is full (most CDNs use LRU or similar eviction). Conversely, a stale object past its TTL may remain in storage for revalidation purposes.
Revalidation after TTL expiry
When TTL expires, the object becomes stale. The edge does not immediately delete it. Instead, on the next request it sends a conditional request to the origin:
- If-None-Match: "abc123" sends the stored ETag back to the origin.
- If-Modified-Since: Wed, 01 Jan 2026 12:00:00 GMT sends the stored last-modified date.
The origin responds with either:
- 304 Not Modified (no body). The edge refreshes the TTL and serves the existing cached copy. This saves bandwidth because the full response body is not retransmitted.
- 200 OK with a new body. The edge replaces the cached object and serves the updated version.
Conditional requests are especially valuable for large objects like images and video segments where a 304 saves substantial bandwidth compared to re-downloading the full body. For a 2 MB image that has not changed, a 304 response is typically under 500 bytes. That is a 4,000x bandwidth savings on a single request, multiplied across millions of revalidations per day.
stale-while-revalidate
The stale-while-revalidate directive tells the CDN it can serve a stale object immediately while fetching a fresh copy in the background. For example, Cache-Control: max-age=60, stale-while-revalidate=300 means the object is fresh for 60 seconds, then stale-but-servable for another 300 seconds while the edge revalidates asynchronously. The user never waits for the origin.
This directive is particularly valuable for HTML pages and API responses where near-freshness is acceptable. The first user after TTL expiry gets the stale copy instantly. The background fetch updates the cache so all subsequent users get the fresh version. Without this directive, that first user would wait for the full origin round trip.
Cache hit ratio
Cache Hit Ratio (CHR) measures CDN effectiveness: CHR = cache hits / total requests, i.e., the fraction of requests served from the edge without contacting the origin.
A static-heavy website with versioned assets routinely achieves 95-99% CHR. An API-heavy site with short TTLs might see 40-70%. Factors that affect CHR include:
- TTL length. Longer TTLs keep objects fresh in cache for more requests.
- Cache key design. Fewer unnecessary key components mean fewer unique entries and higher hit rates.
- Content popularity distribution. Head content (popular pages) gets high CHR. Long-tail content (rarely accessed pages) gets lower CHR per PoP.
- Number of PoPs. More PoPs means each one has a smaller share of traffic, reducing per-PoP hit rates unless a shield tier consolidates misses.
Cache hit ratio is the single most important CDN metric. Every percentage point of CHR improvement directly reduces origin load and user-facing latency. Before tuning anything else, instrument CHR per content type and per PoP.
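Instrumenting CHR per content type from edge logs can be as simple as the sketch below; the (content_type, cache_status) record shape is an assumption for illustration, since real CDN log formats vary by provider.

```python
from collections import defaultdict

def cache_hit_ratio(log_records):
    """Compute CHR per content type from edge log records, each a
    (content_type, cache_status) pair where status is 'HIT' or 'MISS'."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for content_type, status in log_records:
        totals[content_type] += 1
        if status == "HIT":
            hits[content_type] += 1
    return {ct: hits[ct] / totals[ct] for ct in totals}
```

Segmenting this way surfaces problems a single aggregate number hides, such as a healthy asset CHR masking a poor API CHR.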
The cache key is the identity of a cached object. Two requests that produce the same cache key will share the same cached response. Two requests that produce different cache keys will be treated as completely separate objects, even if the origin would return identical content. Poor cache key design is the most common reason for unexpectedly low cache hit ratios.
Default cache key components
Most CDNs construct the default cache key from:
- Scheme (http or https)
- Host (e.g., cdn.example.com)
- Path (e.g., /images/logo.png)
- Query string (e.g., ?v=2&color=blue)
Some CDNs also include selected request headers (via the Vary mechanism) or cookies in the key. The important principle is that everything in the cache key must be necessary and sufficient to produce a correct response. Too few components and you risk serving the wrong content. Too many and you fragment the cache unnecessarily.
Query string handling
Query strings are the most frequent source of cache fragmentation. Consider two URLs that differ only in parameter order, such as /products?category=shoes&sort=price and /products?sort=price&category=shoes.
These produce identical content from the origin, but many CDNs treat them as different cache keys because the query parameter order differs. Solutions:
- Query string sorting. CloudFront and Cloudflare can normalize parameter order so both URLs hash to the same key.
- Whitelist approach. Include only parameters that affect the response. If utm_source and fbclid are tracking parameters that do not change the content, exclude them from the cache key. This is the recommended default because it is explicit about which parameters matter.
- Blacklist approach. Exclude known-irrelevant parameters. This is easier to start with but riskier because new parameters are included by default, potentially fragmenting the cache unexpectedly.
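The sorting-plus-whitelist approach can be sketched as follows; ALLOWED_PARAMS is an assumed whitelist chosen for illustration.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

# Assumption: the only query parameters that change the origin response.
ALLOWED_PARAMS = {"category", "sort", "page"}

def cache_key(url: str) -> str:
    """Build a normalized cache key: scheme + host + path plus a
    sorted, whitelisted query string, so parameter order and tracking
    params (utm_source, fbclid, ...) no longer fragment the cache."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in ALLOWED_PARAMS)
    query = urlencode(kept)
    return f"{parts.scheme}://{parts.netloc}{parts.path}" + (f"?{query}" if query else "")
```

Both orderings of the same parameters, with or without tracking noise, now hash to one cache entry.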

The Vary header
The Vary header tells the CDN that the response depends on specific request headers. The CDN includes those header values in the cache key.
- Vary: Accept-Encoding is standard and safe. The CDN caches separate copies for gzip, br (Brotli), and uncompressed. Most CDNs normalize this automatically so you get at most three variants.
- Vary: Accept-Language creates one cached variant per language. This is fine when you have a small number of languages (5-10), but becomes problematic if the origin returns many locale variants. Consider normalizing the Accept-Language header at the edge to a supported set (e.g., en, es, fr, de, ja) to keep variant count bounded.
- Vary: User-Agent is almost always a mistake. There are thousands of unique User-Agent strings, so the CDN stores thousands of nearly identical copies and the hit ratio collapses. Instead, normalize at the edge into a small set of device categories (mobile, tablet, desktop) and vary on that custom header.
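Normalizing User-Agent into device buckets might look like the rough heuristic below. This is a deliberately crude sketch, not a production user-agent parser; real edge functions typically use a maintained detection library or the provider's built-in device classification.

```python
def device_class(user_agent: str) -> str:
    """Collapse thousands of User-Agent strings into three cacheable
    buckets. Vary on a custom header carrying this value instead of
    Vary: User-Agent, so the cache stores at most three variants."""
    ua = user_agent.lower()
    if "ipad" in ua or "tablet" in ua:
        return "tablet"
    if "mobile" in ua or "iphone" in ua:
        return "mobile"
    return "desktop"
```

The edge computes this value, sets it as a request header (e.g., a hypothetical X-Device-Class), and the response varies on that header alone.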
Versioned URLs for static assets
The most cache-friendly pattern for static assets is content-hashed filenames, for example app.3f9c2a7b.js instead of app.js.
Build tools like Webpack, Vite, and esbuild generate these automatically. Because the filename changes whenever the content changes, you can set Cache-Control: public, max-age=31536000, immutable (one year). The CDN never needs to revalidate or purge these files. When you deploy a new version, the HTML references a new filename and the old cached copy simply ages out.
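What those build tools do can be approximated in a few lines; the 8-character hash length is an arbitrary illustrative choice.

```python
import hashlib
import pathlib

def hashed_name(filename: str, content: bytes) -> str:
    """Embed a short content hash in the filename, mimicking what
    bundlers do: the URL changes whenever the bytes change, so a
    one-year immutable TTL is safe."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    p = pathlib.PurePosixPath(filename)
    return f"{p.stem}.{digest}{p.suffix}"
```

Because the name is a pure function of the content, rebuilding an unchanged file yields the same URL and the existing cache entry keeps serving.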
Design your cache keys to be as coarse as possible while still producing correct responses. Every unnecessary variation in the key creates a separate cached object that the origin must populate independently. Start with query-string whitelisting and avoid Vary: User-Agent.
Cache poisoning risks
Cache poisoning occurs when an attacker manipulates a request so the CDN caches a malicious response and serves it to other users. Common vectors include:
- Unkeyed headers. If the origin uses a header like X-Forwarded-Host to generate links but the CDN does not include that header in the cache key, an attacker can send a crafted header, get a poisoned response cached, and have it served to everyone.
- Unkeyed query parameters. If a parameter like callback affects the response body but is excluded from the cache key, the same attack applies.
Mitigation: audit which request components affect the origin response and ensure they are either included in the cache key or stripped before reaching the origin. Tools like Param Miner can help detect unkeyed inputs. Additionally, set Cache-Control: no-store on any response where the body is influenced by unvalidated user input. In an interview context, mentioning cache poisoning as a security consideration when discussing CDN architecture shows depth beyond basic caching mechanics.
CDN caching is powerful, but not every cached resource should be publicly accessible. Premium video content, private documents, and user-specific downloads all need access control at the edge. The challenge is enforcing authorization without routing every request back to the origin, which would defeat the purpose of caching.
Signed URLs
A signed URL is a regular URL with embedded authorization parameters. Your application server generates the signed URL, and the CDN edge validates the signature without contacting the origin.
A typical signed URL contains three components:
- Resource path. The specific file or path the user is authorized to access.
- Expiration timestamp. A Unix timestamp after which the URL is no longer valid. Short-lived tokens (5-60 minutes) limit the window for sharing or replay.
- Cryptographic signature. An HMAC or RSA signature computed over the resource path, expiration, and optionally the client IP. The CDN edge holds the public key or shared secret to verify the signature.
The flow works like this:
- The user authenticates with your application server (login, OAuth, etc.).
- Your server verifies the user's subscription or permissions, then generates a signed URL with a short expiration and returns it.
- The user's browser requests the resource using the signed URL.
- The CDN edge validates the signature, checks the expiration, and serves the cached content if valid. If the signature is invalid or expired, the edge returns a 403.
The content itself is cached normally at the edge. The signed URL does not change the cache key or the cached content. It only controls who can access it. This means 1,000 authorized users requesting the same signed resource all get served from the same cache entry, just with individual signature validation. This is an important distinction: signed URLs provide authorization, not personalization. The cached content is the same for everyone; the signature just controls who is allowed to see it.

CloudFront uses RSA key pairs. You upload a public key to CloudFront and sign URLs with your private key. CloudFront also supports key groups so you can rotate keys without downtime by adding a new key before removing the old one.
Akamai uses HMAC-based edge auth tokens. The token includes fields like the start time, end time, and an ACL (path pattern), all signed with a shared secret.
Signed cookies
Signed URLs authorize access to a single resource. When users need access to many resources (e.g., all segments of a video stream or all files in a private directory), generating a separate signed URL for each one is impractical. Signed cookies solve this by authorizing access to a set of resources.
After the user authenticates, your server sets signed cookies that the browser attaches to every subsequent request. The CDN edge validates the cookie signature and grants access to any resource matching the cookie's policy (path prefix, expiration, IP restriction).
CloudFront signed cookies use the same RSA key pairs as signed URLs. The cookie contains three values: CloudFront-Policy, CloudFront-Signature, and CloudFront-Key-Pair-Id. The policy can use wildcards to match path prefixes, so a single cookie can authorize access to /private/movies/* covering all video segments, manifests, and thumbnails under that path.
Edge auth with serverless compute
For more complex authorization logic, modern CDNs support running code at the edge:
- Lambda@Edge (CloudFront). Node.js or Python functions that execute on viewer-request or origin-request events. You can validate JWTs, check user roles, or rewrite requests. Cold starts can range from 50-200 ms, so this works best for steady traffic patterns.
- Cloudflare Workers. V8 isolates that run on every request with sub-5 ms cold starts. You can implement full JWT validation, call an external auth service, or apply rate limiting at the edge.
Edge auth moves authorization decisions closer to the user. The origin never receives unauthorized requests, and you can implement logic that signed URLs cannot express (role-based access, time-of-day restrictions, geographic restrictions based on the user's PoP location).
One important consideration with edge auth is token caching. If the edge function calls an external auth service on every request, you lose the latency benefit. Common patterns include caching JWT validation results briefly (30-60 seconds) at the edge, or using asymmetric JWT verification where the edge holds the public key and validates locally without any network call.
Security best practices
- Short-lived tokens. Set expiration to the minimum viable window. For video segments, 5-15 minutes is often sufficient. For one-time downloads, even shorter.
- HTTPS only. Signed URLs in plaintext HTTP can be intercepted and replayed. Always enforce HTTPS at the CDN.
- Lock down the origin. Use origin access control (CloudFront OAC) or firewall rules so the origin only accepts requests from CDN edge IPs. This prevents users from bypassing signed URL checks by hitting the origin directly.
- IP binding. When practical, bind signed URLs to the client IP address. This prevents URL sharing but can cause issues with mobile users whose IP changes during a session or users behind carrier-grade NAT where many users share a single IP.
- Key rotation. Rotate signing keys periodically (every 90 days is a common policy). Use key groups (CloudFront) or multiple active secrets (Akamai) so you can add the new key before removing the old one, avoiding any window where valid signatures are rejected.
- Audit logging. Log all 403 rejections at the edge with the reason (expired, invalid signature, wrong IP). This helps debug user complaints and detect attack patterns.
Cache invalidation is famously one of the hardest problems in computer science. With a CDN, the difficulty is amplified because cached content is distributed across dozens or hundreds of PoPs worldwide. When content changes at the origin, you need a strategy to ensure users see the updated version within an acceptable time window.
Purge and ban
The most direct approach is to explicitly tell the CDN to remove a cached object. Terminology varies by provider:
- Purge removes a specific URL from all PoPs. CloudFront calls this "invalidation." You submit the path (e.g., /images/hero.jpg) and the CDN propagates the deletion globally.
- Ban (Varnish/Fastly terminology) invalidates all objects matching a pattern, such as all URLs under /blog/* or all objects with a specific surrogate key.
The critical limitation is propagation delay. A purge request must reach every PoP, and each PoP must process it. CloudFront invalidations typically complete in 60-120 seconds but can take up to 10 minutes. During that window, some PoPs serve the old content while others serve the new version. Users in different regions literally see different versions of your site.

Purge is not instant. During propagation, different users hit different PoPs and see different versions. Design your system to tolerate this inconsistency window. If you need instant global consistency, purge alone is not sufficient and you should use versioned URLs instead.
Versioned URLs
Versioned URLs sidestep invalidation entirely. Instead of changing the content at a URL and then purging, you change the URL itself: for example, app.3f9c2a7b.js becomes app.8d4e1c2f.js when the file's contents change.
The old URL remains cached (and harmless) while the new URL is fetched from origin on its first request. This is the gold standard for static assets because it is instant (no propagation delay), atomic (users get either the old or new version, never a partial mix), and requires no CDN API calls.
The trade-off is that you need a mechanism to update references. For web assets, the HTML page references the new filename. For APIs, versioned URLs are harder to apply because clients expect stable endpoints. Another consideration is storage: old versioned files accumulate at the origin. Implement a cleanup policy (e.g., delete assets older than 30 days) to avoid unbounded storage growth.
TTL-based expiry
The simplest strategy is to let TTL handle freshness. Set a TTL that matches your tolerance for staleness:
- Static assets with versioned URLs. TTL of one year. Invalidation is never needed.
- HTML pages. TTL of 60-300 seconds. Content updates are visible within minutes.
- API responses. TTL of 5-60 seconds. Balances freshness with origin load reduction.
The trade-off is straightforward: longer TTLs mean fewer origin requests but more staleness. Shorter TTLs mean fresher content but higher origin load. For most content, TTL-based expiry is the right default because it requires no operational tooling and provides predictable behavior.
stale-while-revalidate and stale-if-error
These directives add resilience to TTL-based expiry:
- stale-while-revalidate=N allows serving stale content for N seconds while the edge revalidates in the background. Users never wait.
- stale-if-error=N allows serving stale content for N seconds when the origin returns a 5xx error or is unreachable. This turns the CDN into a safety net during origin outages.
Example: Cache-Control: max-age=60, stale-while-revalidate=300, stale-if-error=86400 means fresh for 60s, background-refresh for 5 minutes, and serve stale for up to 24 hours if origin is down. The stale-if-error directive is particularly valuable for high-availability requirements. During an origin outage, users continue to see content (even if slightly outdated) instead of error pages.
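The edge's decision under these directives can be expressed as a pure function of the object's age; this is a simplified sketch of the behavior described above, with illustrative names.

```python
def serve_decision(age: int, max_age: int, swr: int, sie: int, origin_up: bool) -> str:
    """What the edge does for a cached object of the given age under
    Cache-Control: max-age=<max_age>, stale-while-revalidate=<swr>,
    stale-if-error=<sie>."""
    if age <= max_age:
        return "serve-fresh"
    if age <= max_age + swr:
        return "serve-stale-revalidate-in-background"
    if not origin_up and age <= max_age + sie:
        return "serve-stale-origin-down"
    return "fetch-from-origin"   # synchronous miss or revalidation
```

With max-age=60, stale-while-revalidate=300, stale-if-error=86400, a 200-second-old object is served stale with a background refresh, and a 500-second-old object is still served if the origin is down.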
Soft purge vs hard purge
Fastly distinguishes between:
- Hard purge. Removes the object immediately. The next request is a cache miss.
- Soft purge. Marks the object as stale but keeps it in cache. The next request triggers revalidation. If the origin returns 304, the edge re-serves the existing object. If the origin is down, the stale object can still be served.
Soft purge is safer for high-traffic resources because it avoids a thundering herd of cache misses hitting the origin simultaneously. It also provides a fallback if the origin is temporarily unavailable during the purge window.
Choosing the right invalidation strategy
The decision tree is straightforward:
- If the content has a unique URL per version (hashed filenames), use long TTLs and no invalidation.
- If the content changes predictably on a schedule (hourly reports, daily digests), use TTL-based expiry matched to the schedule.
- If the content changes unpredictably but infrequently (CMS pages, product descriptions), use moderate TTLs with stale-while-revalidate and keep purge available for urgent corrections.
- If the content changes frequently and must be fresh immediately (news headlines, live scores), use short TTLs or Fastly's instant purge with surrogate keys.
CDN providers share the same fundamental caching model, but they differ significantly in edge compute capabilities, purge speed, configuration flexibility, and network architecture. Understanding these differences helps you choose the right provider and use it effectively.
CloudFront (AWS)
CloudFront is tightly integrated with AWS services, which makes it the natural choice when your origin is already in AWS.
- Cache policies and origin request policies. CloudFront separates what goes into the cache key (cache policy) from what gets forwarded to the origin (origin request policy). This is a clean design that avoids the common mistake of including too many headers in the cache key. You can define reusable policies and attach them to multiple cache behaviors.
- Origin Access Control (OAC). Restricts S3 bucket access so only CloudFront can fetch objects. Replaces the older Origin Access Identity (OAI). This is how you lock down the origin. OAC also works with Lambda function URLs and MediaStore.
- Lambda@Edge and CloudFront Functions. Lambda@Edge runs Node.js/Python on four event triggers (viewer-request, viewer-response, origin-request, origin-response). CloudFront Functions are lighter and cheaper but limited to viewer-request and viewer-response events and a subset of JavaScript. Use CloudFront Functions for simple tasks like URL rewrites and header manipulation. Use Lambda@Edge for heavier logic like JWT validation or A/B testing.
- Invalidation speed. Purge typically propagates in 60-120 seconds. The free tier includes 1,000 invalidation paths per month. Wildcard invalidations (e.g., /images/*) count as one path.
- Shield (Origin Shield). An optional mid-tier cache layer. Edge PoPs route misses through a designated shield region before contacting the origin, consolidating miss traffic. Especially valuable when you have many edge locations serving long-tail content.
- Real-time logs. CloudFront can stream access logs to Kinesis Data Firehose in near real-time, enabling fast CHR monitoring and anomaly detection.
Cloudflare
Cloudflare operates one of the largest Anycast networks, with PoPs in over 310 cities. Its strength is simplicity and integrated security.
- Anycast network. Every PoP announces the same IP addresses via BGP. Users are automatically routed to the closest PoP based on network topology. No regional configuration is needed, which simplifies operations significantly compared to DNS-based routing. This also provides inherent DDoS resilience because attack traffic is automatically distributed across all PoPs rather than concentrated on a single server.
- Workers. V8 isolates that run JavaScript/TypeScript/Wasm on every request. Cold start is under 5 ms. Workers can modify requests, implement auth, call external APIs, or generate entire responses at the edge. Workers KV provides a globally distributed key-value store for edge state. Durable Objects add coordination primitives for use cases like rate limiting and session management.
- Cache Rules. Declarative rules to override caching behavior by URL pattern, header, or cookie. Can set edge TTL, browser TTL, and cache eligibility without code. Rules are evaluated in order, and the first match wins.
- Tiered caching. Reduces origin load by routing cache misses through a smaller set of upper-tier PoPs before hitting the origin. This improves CHR for long-tail content that is requested infrequently at any single PoP.
- Argo Smart Routing. Paid feature that routes requests through Cloudflare's private backbone to find the fastest path to the origin, reducing latency for cache misses.
Fastly
Fastly differentiates on real-time purging and fine-grained cache control through VCL (Varnish Configuration Language).
- Instant purge. Global purge propagation in approximately 150 ms. This is an order of magnitude faster than CloudFront and Cloudflare, making Fastly attractive for content that changes frequently and must be fresh immediately (news sites, live scores, stock prices).
- Surrogate keys. You tag cached objects with custom keys (e.g., product-123, category-shoes). A single purge API call can invalidate all objects tagged with a specific surrogate key. This is far more flexible than path-based purging because one logical entity can have many cached URLs.
- VCL. Fastly exposes the full Varnish Configuration Language for cache logic. You can write complex rules for cache key construction, TTL overrides, and request routing. Powerful, but with a steeper learning curve than declarative rules.
- Compute@Edge. Fastly's edge compute platform running Wasm. Supports Rust, Go, and JavaScript. Designed for more complex logic than VCL can express.
- Shielding. Fastly lets you designate specific PoPs as shield nodes. Cache misses at edge PoPs go to the shield first instead of the origin, consolidating origin traffic. This is similar to CloudFront's origin shield and Cloudflare's tiered caching.
Provider comparison
| Feature | CloudFront | Cloudflare | Fastly |
| --- | --- | --- | --- |
| Purge speed | 60-120 seconds | Seconds to minutes | About 150 ms globally |
| Edge compute | Lambda@Edge, CF Functions | Workers (V8 isolates) | Compute@Edge (Wasm) |
| Cache key control | Cache policies | Cache Rules | VCL |
| Tag-based purge | No native support | Cache Tags (Enterprise) | Surrogate keys |
| Origin lockdown | OAC for S3 | Authenticated origin pulls | Shielding + headers |
| Network | 600+ edge locations | 310+ cities, Anycast | 80+ PoPs, Anycast |
Choose your CDN based on the specific capability that matters most for your workload. If you need instant purge, evaluate Fastly. If you need deep AWS integration, CloudFront is the natural fit. If you want a broad Anycast network with easy Workers, Cloudflare is strong.
This section pulls together the key principles from the entire lesson into an actionable checklist. These practices apply regardless of which CDN provider you choose.
Build a cache-friendly architecture
The most impactful thing you can do for CDN performance is design your content to be cacheable from the start:
- Immutable static assets. Use content-hashed filenames for JS, CSS, images, and fonts. Set Cache-Control: public, max-age=31536000, immutable. Never purge these files because the URL changes with every deploy.
- Versioned URLs over purging. Whenever possible, change the URL instead of purging the old one. This eliminates propagation delay and thundering-herd risk.
- Separate cacheable from uncacheable. Serve public content from a CDN-friendly domain (e.g., cdn.example.com) and route user-specific API calls through a different path or domain. This prevents cookies and auth headers from accidentally making static content uncacheable.
- Minimize Vary header usage. Only vary on headers that genuinely change the response. Accept-Encoding is almost always needed. Beyond that, prefer custom normalized headers over raw browser headers.
Choose the right TTL for each content type
There is no single correct TTL. Match TTL to the content's staleness tolerance:
- Static assets (JS, CSS, images). One year with versioned filenames. Effectively infinite caching with zero invalidation cost.
- HTML pages. 60-300 seconds with stale-while-revalidate for another 5-10 minutes. Users see near-fresh content without ever waiting on the origin.
- API responses. 5-60 seconds for public read-only endpoints. Even a 10-second TTL on an endpoint receiving 1,000 requests per second collapses origin traffic for that endpoint to roughly one request per TTL window per PoP.
- Private or personalized content. Cache-Control: private or no-store. Do not cache at shared edges. Be explicit about this to prevent accidental caching of sensitive data.
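One way to make these defaults explicit in an application is a simple lookup table; the content classes and exact TTL values below are assumptions chosen to match the guidance above, not universal constants.

```python
# Assumed content classes mapped to explicit Cache-Control values.
CACHE_HEADERS = {
    "static-asset": "public, max-age=31536000, immutable",
    "html":         "public, max-age=120, stale-while-revalidate=600",
    "api":          "public, max-age=10, stale-while-revalidate=30",
    "private":      "private, no-store",
}

def cache_control_for(content_class: str) -> str:
    """Look up an explicit Cache-Control value. Defaulting unknown
    classes to no-store keeps anything unclassified out of shared
    caches, failing safe rather than leaking private data."""
    return CACHE_HEADERS.get(content_class, "private, no-store")
```

Centralizing the policy like this avoids the most common pitfall of all: responses that ship with no Cache-Control header.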
Monitor relentlessly
You cannot improve what you do not measure. The essential CDN metrics are:
- Cache hit ratio by content type. Track CHR separately for HTML, assets, and API responses. A 95% overall CHR can mask a 20% API CHR that is hammering the origin.
- Origin load. Monitor origin requests per second segmented by cache status (miss, expired, bypass). A rising miss rate after a deploy suggests a cache key or header change broke caching.
- Error rates. Track 4xx and 5xx rates at the edge. A spike in 403s might indicate expired signed URLs. A spike in 5xx with stale-if-error could mean the origin is down but users are still being served.
- Latency percentiles. Track p50, p95, and p99 at the edge. A gap between p50 and p99 often indicates that cache misses are driving tail latency.
- Bandwidth savings. Compare CDN-served bytes against origin-served bytes. This directly translates to cost savings and origin capacity headroom.
Secure the origin
The CDN is only as secure as the origin behind it:
- Origin access control. Use CloudFront OAC, Cloudflare authenticated origin pulls, or IP allowlists so the origin only accepts CDN traffic.
- Signed URLs for private content. Short-lived tokens, HTTPS only, and IP binding when practical.
- WAF and rate limiting at the edge. Let the CDN absorb and filter malicious traffic before it reaches your infrastructure.
- Cache poisoning prevention. Audit which headers and parameters affect origin responses and ensure they are part of the cache key or stripped at the edge.
Common pitfalls to avoid
These mistakes appear frequently in production CDN configurations:
- Forgetting to set Cache-Control. If the origin does not send cache headers, the CDN uses a default TTL (often 24 hours for CloudFront) or does not cache at all. Always be explicit.
- Caching Set-Cookie responses. If the origin includes a Set-Cookie header, some CDNs will not cache the response. Others will cache it and serve the same cookie to all users. Strip Set-Cookie on cacheable responses.
- Using purge as the primary freshness strategy. Purge should be a safety valve, not the main mechanism. Rely on TTLs and versioned URLs for routine freshness.
- Ignoring the shield tier. Without a shield, every PoP sends its own cache misses to the origin. With 200 PoPs and a 60-second TTL, that is up to 200 origin requests per minute for a single resource. A shield consolidates these into one.
Summary of key decisions
When designing a CDN caching strategy, you are making five core decisions: what to cache (Cache-Control headers), how to identify cached objects (cache key design), how long to cache (TTL strategy), how to remove stale content (invalidation strategy), and who can access cached content (signed URLs, cookies, edge auth). Getting these five decisions right covers the vast majority of CDN-related system design questions. In an interview, walking through these five decisions systematically shows the interviewer you understand both the mechanics and the trade-offs.
Remember that CDNs are not only for static content. Even dynamic API responses benefit from short-lived edge caching. A 10-second TTL on a popular endpoint can reduce origin load by orders of magnitude. The key is always the same: understand the freshness requirements of each content type and choose the caching strategy that matches.