Requirements
Functional Requirements:
- Create a short URL for a given long URL.
- Redirect to long URL associated with a given short URL.
- Optional: Custom Aliases, Expiration time
Non-Functional Requirements:
- Redirect latency should be low < 50ms
- very high read throughput
- highly available (redirect must not fail)
- highly durable(no data loss)
- Strong consistency for URL mapping.
- Eventual consistency for analytics.
Core Desgin Decisions:
- ID Generation Strategy: Auto increment, Random, Base62 encoding of chosen uniqueId
- Read Optimization: Cache(Redis)
- Storage: key-value store
- Redirection speed: CDN + cache
Back of the Envelope Estimation:
- Assume 10M new URLs per day
- Read: write = 100:1
- QPS: write: 10M/(24*3600) = 115 writes/sec. Read = 115*100 = 11,500 reads/sec
- Storage: Assume avg URL length = 500 bytes. metadata = 100 bytes, total = 600bytes. 10M/day -> 6GB/day for 1year = 2.2TB
- 20% of those URLs in cache = 400GB for Cache storage
- CAP -> choose Availability + Partition tolerance
API Design
Create Short URL: Post/shorten
Redirect URL: Get/{shortcode}
Analytics URL: Get/stats/{shortcode}
Cleanup URL: Post/cleanup
High-Level Design
URL Shorten Service - generates short code
Redirect Service - Handles lookups and redirects
Analytics Service - async processing
ID generator Service - URL Shorten Service uses this to generate the unique id
Cleanup Service - cleansup the old expired urls from DB
DATABASE choice: key-value DB(DynamoDB), if we choose SQL, scaling writes is hard, sharding is complex.
Schema: table: URLMapping
shortcode (pk)
longurl
createdAt
expiryAt
userId(optional)
Redis : key-shortCode value-longURL. TTL can be defined and LRU eviction policy.
Data flow:
URL Creation flow:
- User -> API(Url shorten service)
- Generate unique ID
- Convert to Base62 ->shortcode
- Store in DB after storing in cache(write through cache)
- Return the short URL
Redirect flow:
- User hits short URL
- checks cache
- Hit -> redirect
- Miss -> DB Lookup -> Cache -> redirect
Detailed Component Design
Short URL Generation:
Approach1: Random string
. Generate a random 6-8 chars
. problem: collisions -> retry loops
Approach2: Hashing(MD5)
. Hash long URL
. Take first N chars
. Problem: collision risk
Approach3: Counter + Base62
. Generate a unique ID eveytime
. Convert to Base62:
. Why is it best: No Collisions, Deterministic, Compact
. Problem: Distributed ID generation
. use pre allocated ID ranges per server.
Scalability & Reliability:
Bottlenecks and solutions:
- DB overload: Use Read replicas and sharding
- Cache misses: use correct eviction policy(LRU)
- Hot URL: use CDN caching
- ID generation bottleneck: Use Distributed ID generator
- Traffic spikes: use Auto scaling, and Rate limiting
- Request failures: use exponential backoffs
Security Measures:
- Rate limiting at API gateway
- Captcha for suspicious traffic
- URL validation (avoids malware)
Backup & Disaster Recovery:
- Regular DB snapshots
- Multi region replication
Logging & Monitoring:
- Request Logs, error logs
- Monitor QPS, Latency, Cache hit ratio
- Tools: Grafana, datadog