Requirements
Functional Requirements:
- Create a short URL for a given long URL.
- Return the long URL associated with a given short URL.
Non-Functional Requirements:
- High Availability: The system must be available 24/7 (e.g., 99.99%).
- Low Latency: Redirects must occur in less than 100 ms.
- Read-Heavy: The read-to-write ratio is approximately 100:1.
- Horizontal Scalability: The system must support horizontal scaling to handle growing data volumes (billions of records) and traffic without compromising performance.
API Design
POST /api/v1/shorten — Create a link. Accepts long_url, returns short_url.
GET /{short_url_id} — Redirect. Returns a 302 Found status and a Location header.
High-Level Design
Entry Layer: We use an API Gateway and a Load Balancer to accept requests from clients, handle SSL certificates, and distribute traffic to the application servers.
Application Services:
Write Service: Creates short links, interacts with the ID Generator, and stores data in the database.
Read Service: Processes redirects, first checking Redis and then the DB.
Storage Layer: We use NoSQL (Cassandra/DynamoDB) for horizontal scaling and high-speed storage of billions of links.
Detailed Component Design
Key Generation: We use Base62 encoding. For a 7-character ID, we get 62^7 (about 3.5 trillion) unique combinations. Caching: Using Redis to cache “hot” links. Since 20% of links typically generate 80% of traffic, the cache will significantly reduce the load on the database.
ID Collision & Concurrency: To avoid collisions under high load, we use a Distributed ID Generator (such as Snowflake ID or the KGS—Key Generation Service), which reserves key ranges in server memory.
Fault Tolerance: If one key generator fails, the servers use pre-reserved ID blocks from memory, allowing the system to operate autonomously.
Cache Strategy: We use the LRU (Least Recently Used) policy to remove old keys. To protect against “Cache Stampede,” we set a random TTL (Time-to-Live) for each entry.
Rate Limiting: We implement the Token Bucket algorithm at the API Gateway level to limit the number of requests from a single IP address and protect the system from abuse.