Design a URL Shortening Service (TinyURL)
1. Requirements
Functional
- Given a long URL, generate a unique short URL.
- Accessing the short URL redirects to the original long URL.
- Optional: custom alias, link expiration, click analytics.
Non-Functional
- High availability (a down service means broken links everywhere).
- Low-latency redirects.
- Short codes should be hard to guess (security).
- Read-heavy system (reads ≫ writes).
2. Capacity Estimation
- Assume 100M new links/month, read:write ratio ≈ 100:1.
- Writes: 100M/month ≈ 40/sec. Reads: ≈ 4000/sec.
- Storage: ~500 bytes/record, 5 years ≈ 6B records ≈ 3 TB.
- Short code length: Base62 (a-z, A-Z, 0-9), 7 chars = 62⁷ ≈ 3.5 trillion combinations — enough for many years.
3. API Design
POST /shorten body: { longUrl, customAlias?, expireAt? } -> { shortUrl }
GET /{shortCode} -> HTTP 301/302 redirect to longUrl
Redirect choice: 301 (permanent) is cached by browsers and offloads the server, but you lose click tracking. 302 (temporary) hits the server every time, enabling analytics. Trade-off worth stating explicitly.
4. Data Model
url_mapping( short_code PK, long_url, created_at, expire_at, user_id )
Access pattern is a key-value point lookup (short_code → long_url), so a NoSQL store (DynamoDB/Cassandra) or a relational DB with caching both work well.
5. Core: Short Code Generation
Three approaches, with trade-offs:
- Hashing (MD5/SHA + truncate): simple, but causes collisions requiring retry/detection; identical long URLs map to the same code (can be a pro or con).
- Auto-increment ID + Base62: no collisions, compact codes. Downsides: codes are guessable/enumerable (security risk) and you need a globally unique ID generator.
- ID-generation service (chosen): a dedicated service (Snowflake, or Redis/Zookeeper-managed ID ranges) hands out IDs in batches; app servers grab a range and Base62-encode locally. Avoids a single-point bottleneck and has no collisions. I add reversible obfuscation on the ID before encoding to defeat enumeration.
6. Component Deep-Dive
ID Generator — Use range pre-allocation: each app server fetches a block of 1000 IDs from Redis/Zookeeper at a time and serves them locally, fetching a new block when exhausted. Most ID issuance happens in-memory, so the generator is not hit per request and the QPS bottleneck disappears. Cost: a restart wastes a small ID range (acceptable). Versus Snowflake: no central coordination needed, but depends on clock sync.
Cache Layer (Redis) — Caches short_code → long_url. In a read-heavy workload, hit rate reaches 95%+. Eviction: LRU, since link access follows a hot/long-tail distribution. Handle cache penetration (lookups for non-existent codes) with a Bloom filter or by caching null values.
Database — Read-heavy, so use primary-replica replication to spread reads across read replicas. At scale, shard by short_code hash (see below).
7. Horizontal Scalability
- Stateless app tier: all state lives in cache/DB, so we scale linearly by adding machines behind a load balancer.
- Database sharding: consistent-hash shard by
short_code, each shard with read replicas. Since every query is a point lookup by short_code, this key guarantees single-shard hits with no cross-shard joins. - Distributed cache: Redis Cluster sharded by key, add nodes as read traffic grows.
- Multi-region + CDN/GeoDNS: redirects are reads, so place read replicas and caches in multiple regions to serve requests closer to users and cut latency.
8. Security & Abuse Prevention
- Unguessable codes: pure auto-increment Base62 is enumerable (scraping others' links). Fix: apply reversible obfuscation (Feistel/XOR permutation) to the ID before encoding, or use sufficiently long random codes with a uniqueness check.
- Rate limiting: token-bucket per IP / API key on link creation to prevent flooding.
- Malicious URL protection: filter target URLs against a blocklist / Safe Browsing API at creation time to stop the service being used for phishing/malware redirects.
- Transport & ownership: HTTPS end-to-end; validate ownership of custom aliases to prevent squatting/collisions.
9. Additional Considerations
- Expired link cleanup via scheduled jobs or lazy deletion.
- Click analytics pushed through an async message queue (Kafka) so they don't slow down the redirect path.
- Monitoring and alerting on latency, cache hit rate, and error rates.