1. Requirements

Functional Requirements

Generate short URL for a given long URL
Redirect short URL → original URL
Optional:
- Custom aliases
- Expiration time
- Analytics (click count)

Non-Functional Requirements

Latency: < 50ms redirect
Throughput: High read-heavy system (~100:1 read/write)
Scalability: Billions of URLs
Availability: Very high (redirect must not fail)
Consistency: Eventual consistency acceptable for analytics

2. Estimations

Traffic

Writes: ~10K/sec
Reads (redirects): ~1M/sec

Storage

Assume:

100B URLs over time
Each record ~200 bytes

👉 Total ≈ 20 TB → requires distributed storage

Read/Write Pattern

Highly read-heavy
Requires aggressive caching

3. API Design

3.1 Create Short URL

POST /shorten

Request:

{
"long_url": "https://example.com/abc",
"custom_alias": "optional",
"expiry": "optional"
}

Response:

{
"short_url": "https://sho.rt/xyz123"
}

3.2 Redirect

GET /{short_code}

Response:

HTTP 302 → redirect to long URL

3.3 Analytics (Optional)

GET /analytics/{short_code}

4. Data Storage & Design

4.1 Database (Primary Store)

Use:

Distributed KV store (e.g. Amazon DynamoDB / Cassandra)

Schema:

short_code (PK)
long_url
created_at
expiry
user_id

4.2 Cache (Critical)

Use:

Redis

Key:

short_code → long_url

5. High-Level Architecture (HLD)

System divided into:

5.1 Data Plane (Redirect Flow)

Client hits short URL
Request goes to CDN
CDN checks cache:
- If hit → redirect
- If miss → go to backend
Backend:
- Check Redis cache
- If miss → fetch from DB
Return 302 redirect

5.2 Control Plane (Short URL Creation)

Client sends long URL
Service generates short code
Store mapping in DB
Cache in Redis

6. Detailed Breakdown

6.1 ID Generation (Make This Precise)

6.1 Short Code Generation (Robust Strategy)

Approach: Distributed Counter + Base62 Encoding

Step 1: Unique ID Generation

Use:

Centralized ID service OR
Distributed ID generator (e.g., Snowflake-like)

Properties:

Globally unique
Monotonically increasing
No collisions

Step 2: Base62 Encoding

ID → Base62 → short_code

Characters: a–z, A–Z, 0–9
Compact representation

Example:

123456789 → "8M0kX"

Why This Works:

No collisions
Predictable length
Efficient storage

Optional Enhancement:

Add randomness to prevent predictability (security)

6.2 Read Optimization (Critical)

CDN caching (edge)
Redis caching (hot data)

👉 Most requests never hit DB

6.3 Write Path

Generate ID
Encode → short code
Store in DB
Populate cache

6.4 Scaling Strategy

Stateless App Servers

No local state
Horizontally scalable

DB Scaling

Sharding by:

hash(short_code)

Cache Scaling

Redis cluster

6.5 Hot Key Handling (Critical for Viral Traffic)

Problem:

A single short URL (e.g., viral link) can receive millions of requests/sec
Can overload:
- Redis
- Backend
- DB

Multi-Layer Solution:

1. CDN Caching (Primary Defense)

Popular URLs cached at edge
Majority of requests never reach backend

2. Redis Replication + Read Scaling

Use replicated Redis nodes
Hot keys served from replicas

3. Request Coalescing (Important)

Multiple simultaneous cache misses:

Only 1 request hits DB
Others wait or reuse result

4. Local In-Memory Cache (Side Optimization)

App servers cache hottest keys temporarily

5. Hot Key Detection

Track:

Access frequency per key

If threshold exceeded:

Promote to “hot tier” cache

6. Dedicated Hot Key Tier (Advanced)

Store extremely popular URLs in:
- Separate Redis cluster OR
- In-memory distributed cache

👉 Ensures:

No single key overwhelms system
Scales even for viral traffic

7. Additional Considerations

7.1 Burst Traffic Handling

CDN absorbs sudden spikes
Redis handles high QPS
Backend protected from overload

7.2 Cache Miss Handling

Redis miss → DB lookup
Populate cache after read

7.3 Degraded Mode

Case 1: Redis Down

Direct DB reads (higher latency)

Case 2: DB Down

Serve from cache if available

Case 3: Full Failure

Graceful degradation (temporary errors)

7.4 Expiration Handling

TTL on DB + Redis
Background cleanup jobs

7.5 Analytics

Async logging
Stream processing (Kafka)

7.6 Security

Prevent abuse:
- Rate limiting
- Spam detection

Failure Handling & Recovery

Case 1: Redis Failure

Fallback to DB reads
Increased latency but system continues

Mitigation:

Redis replication + failover

Case 2: Database Failure

Serve from:
- CDN cache
- Redis cache

If cache miss:

Return graceful error

Case 3: CDN Failure

Requests hit backend directly
Redis absorbs increased load

Case 4: App Server Failure

Stateless → replaced automatically
Load balancer reroutes traffic

Case 5: Partial Network Failure

Retry with exponential backoff
Circuit breaker to prevent cascading failures

Recovery Strategy

Automatic failover (Redis / DB replicas)
Health checks + auto-scaling

Observability

Track:

Cache hit ratio
DB latency
Error rate

Tools:

Prometheus
Grafana

TTL & Data Lifecycle Management

Problem:

Billions of URLs → unbounded growth

Solution:

1. Expiry at Creation

Optional TTL per URL

2. TTL in Storage

Redis:
- Native TTL support
DB:
- Store expiry timestamp

3. Background Cleanup Jobs

Periodic jobs delete expired entries
Prevents storage bloat

4. Cold Data Archival (Optional)

Move inactive URLs to cheaper storage

5. Cache Eviction Policy

Redis uses:
- LRU / LFU eviction

👉 Ensures:

Controlled storage growth
Efficient memory usage

🏁 Final Summary

CDN + Redis caching ensures ultra-low latency
Base62 encoding ensures compact unique URLs
Distributed DB + sharding ensures scalability
Stateless services ensure horizontal scaling
Multi-layer caching handles extreme read load