System Requirements
Functional:
Non-Functional Requirements
1. Low Query Latency:
The system should resolve a short URL and redirect the user to the original URL with minimal delay (typically <100 ms). This is important because URL shorteners are user-facing and delays degrade user experience.
Achieved by using a CDN for edge caching, an in-memory cache (Redis) for fast lookups, and efficient key-value database access.
2. Horizontal Scalability:
The system must handle increasing traffic (both reads and writes) and growing data volume over time. Since the system is read-heavy and may handle tens of thousands of requests per second, it should scale horizontally.
Achieved by using stateless services, load balancers, distributed NoSQL databases, and caching layers.
3. High Availability:
The system should be highly available so that short URLs are always accessible, even during failures.
Achieved by deploying services across multiple availability zones, replicating the databases, and using failover mechanisms.
4. Fault Tolerance:
The system should continue functioning even if some components fail (e.g., server crash, database node failure).
Achieved by redundancy, retries, replication, and fallback mechanisms such as serving from the cache when the database is unavailable.
5. Efficient Storage of URL Mappings:
The system should store billions of URL mappings efficiently without excessive storage overhead.
Achieved by using compact data models, short encoded keys (Base62), and NoSQL databases optimized for large-scale storage.
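The short Base62 keys mentioned in item 5 can be illustrated with a minimal sketch. It assumes the service already has some source of unique integer IDs (the design does not specify one), and the alphabet ordering below is an arbitrary choice; the function names are hypothetical.

```python
import string

# Assumed alphabet order: digits, lowercase, uppercase (any fixed order works).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def base62_decode(s: str) -> int:
    """Decode a Base62 string back to the integer it represents."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n


# Number of distinct 7-character codes: 62^7 = 3,521,614,606,208.
KEYSPACE_7 = BASE ** 7
```

With 7 characters the keyspace is about 3.5 trillion codes, which is well above the roughly 124 billion URLs that 17M creations per day would produce over 20 years.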
Capacity Estimation
Traffic Estimation
Write Requests
Assume a mid-size URL shortener generates:
Write QPS = 200 URL creations/sec
Daily URL creations
200 * 86,400 seconds
= 17,280,000
≈ 17M new URLs/day
Monthly URL creations
17M * 30
≈ 510M URLs/month
Read Requests
Assume read : write ratio = 100 : 1
Read QPS
200 * 100
= 20,000 redirects/sec
Daily redirect requests
20,000 * 86,400
= 1,728,000,000
≈ 1.7B redirects/day
Peak Traffic
Assume 10x traffic spike
Peak Writes
10 * 200
= 2,000 writes/sec
Peak Reads
10 * 20,000
= 200,000 reads/sec
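The traffic arithmetic above can be reproduced with a short script. This is only a sketch of the same calculation; the variable names are illustrative and the inputs are the assumptions already stated in this section.

```python
SECONDS_PER_DAY = 86_400

write_qps = 200          # assumed URL creations per second
read_write_ratio = 100   # assumed reads per write
peak_multiplier = 10     # assumed traffic spike factor

daily_writes = write_qps * SECONDS_PER_DAY        # 17,280,000
monthly_writes = daily_writes * 30                # 518,400,000 (unrounded)
read_qps = write_qps * read_write_ratio           # 20,000
daily_reads = read_qps * SECONDS_PER_DAY          # 1,728,000,000
peak_write_qps = write_qps * peak_multiplier      # 2,000
peak_read_qps = read_qps * peak_multiplier        # 200,000

print(f"daily writes: {daily_writes:,}")
print(f"daily reads:  {daily_reads:,}")
print(f"peak reads/s: {peak_read_qps:,}")
```

The unrounded monthly figure is about 518M; the 510M used above comes from rounding the daily count down to 17M before multiplying.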
Storage Estimation
Short URL = 7 bytes
Long URL = 100 bytes
Created Timestamp = 10 bytes
Expiration Timestamp = 10 bytes
UserId (optional) = 20 bytes
Click Count = 8 bytes
Total per record
7 + 100 + 10 + 10 + 20 + 8
= 155 bytes
≈ 160 bytes
Daily Storage
17M URLs * 160 bytes
= 2,720,000,000 bytes
≈ 2.7 GB/day
Monthly Storage
2.7 GB * 30
≈ 81 GB/month
Yearly Storage
81 GB * 12
≈ 972 GB
≈ 1 TB/year
Twenty-year Storage
1 TB * 20
= 20 TB
Replication Factor = 3
Actual storage required
3 * 20 TB
= 60 TB
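As a cross-check, the storage estimate can be computed without intermediate rounding. The sketch below uses the field sizes listed above and the same 30-day month and 12-month year; the dictionary keys are illustrative names only.

```python
# Per-field sizes in bytes, as assumed in the text.
FIELD_BYTES = {
    "short_url": 7,
    "long_url": 100,
    "created_at": 10,
    "expires_at": 10,
    "user_id": 20,
    "click_count": 8,
}

record_bytes = sum(FIELD_BYTES.values())   # 155
padded_record_bytes = 160                  # rounded up, as in the text
daily_urls = 17_000_000
years = 20
replication_factor = 3

daily_bytes = daily_urls * padded_record_bytes           # 2,720,000,000
yearly_bytes = daily_bytes * 30 * 12                     # 979,200,000,000
total_bytes = yearly_bytes * years * replication_factor  # 58,752,000,000,000

print(f"per year: {yearly_bytes / 1e12:.2f} TB")
print(f"20 years, replicated: {total_bytes / 1e12:.2f} TB")
```

Without rounding each step, the replicated 20-year total is just under 59 TB, so the 60 TB figure above is a slightly conservative round-up.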
Bandwidth Estimation
Write Request
Request Size = 500 bytes
Response Size = 200 bytes
Total per write request = 700 bytes
Write QPS = 200
Peak Write QPS = 2000
Average write bandwidth
200 * 700
= 140,000 bytes/sec
≈ 137 KB/sec
≈ 0.13 MB/sec
Peak write bandwidth
2000 * 700
= 1,400,000 bytes/sec
≈ 1.34 MB/sec
Read Request (Redirect)
Total per redirect (request + response)
≈ 1 KB
Read QPS = 20,000
Peak Read QPS = 200,000
Average read bandwidth
20,000 * 1 KB
= 20,000 KB/sec
≈ 19.5 MB/sec
Peak read bandwidth
200,000 * 1 KB
= 200,000 KB/sec
≈ 195 MB/sec
Total Bandwidth
Average bandwidth
Write ≈ 0.13 MB/sec
Read ≈ 19.5 MB/sec
--------------------------------
Total ≈ 19.6 MB/sec
Peak bandwidth
Write ≈ 1.34 MB/sec
Read ≈ 195 MB/sec
--------------------------------
Total ≈ 196 MB/sec
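The bandwidth figures can be checked the same way. This sketch follows the unit convention used above (1 KB = 1024 bytes, 1 MB = 1024 KB); the variable names are illustrative.

```python
KIB = 1024
MIB = 1024 * 1024

write_request_bytes = 500 + 200   # request + response, as assumed above
redirect_bytes = 1 * KIB          # request + response for one redirect

avg_write_bps = 200 * write_request_bytes       # 140,000 bytes/sec
peak_write_bps = 2_000 * write_request_bytes    # 1,400,000 bytes/sec
avg_read_bps = 20_000 * redirect_bytes          # 20,480,000 bytes/sec
peak_read_bps = 200_000 * redirect_bytes        # 204,800,000 bytes/sec

avg_total_mib = (avg_write_bps + avg_read_bps) / MIB
peak_total_mib = (peak_write_bps + peak_read_bps) / MIB

print(f"average total: {avg_total_mib:.1f} MB/sec")
print(f"peak total:    {peak_total_mib:.1f} MB/sec")
```

Reads dominate in both cases: writes contribute well under 1% of the total bandwidth.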
Cache Size
The access pattern is highly skewed.
A small fraction of URLs (viral links, popular content) generate most of the redirect traffic.
Using the 80/20 rule:
20% of URLs generate 80% of traffic.
So cache the 20% of the most frequently accessed URLs.
Daily URLs created ≈ 17M
0.2 * 17M
= 3.4M hot URLs per day
Instead of caching the entire historical dataset, we cache recently active URLs.
Assume we cache the last 30 days of hot URLs.
Total cached URLs
3.4M * 30
= 102M URLs
Assume each cache entry stores:
Short URL
Long URL
Metadata
Average entry size ≈ 256 bytes
Total cache memory required
102M * 256 bytes
= 26,112,000,000 bytes
≈ 26 GB
Accounting for Redis overhead, replication, and future traffic growth:
Recommended cache cluster size
≈ 80 GB – 120 GB
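The raw cache size above follows directly from the stated assumptions, as this short sketch shows. The variable names are illustrative; the headroom that leads to the 80 GB to 120 GB recommendation is a judgment call and is not modeled here.

```python
daily_urls = 17_000_000
hot_percent = 20        # 80/20 rule: cache the hottest 20% of URLs
retention_days = 30     # assumed window of recently active URLs
entry_bytes = 256       # assumed average cache entry size

# Integer arithmetic avoids floating-point rounding in the percentage.
hot_urls_per_day = daily_urls * hot_percent // 100   # 3,400,000
cached_urls = hot_urls_per_day * retention_days      # 102,000,000
raw_cache_bytes = cached_urls * entry_bytes          # 26,112,000,000

print(f"cached URLs: {cached_urls:,}")
print(f"raw cache size: {raw_cache_bytes / 1e9:.1f} GB")
```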
API Design
1. Short URL Creation API
POST /api/shorten
Request Body
{
"longUrl": "https://example.com/page",
"expirationTime": "2026-04-12T00:00:00Z"
}
Response
HTTP/1.1 201 Created
Response Body
{
"shortUrl": "https://short.ly/ab12",
"expirationTime": "2026-04-12T00:00:00Z"
}
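A minimal sketch of how a handler for this endpoint might behave is shown below. Everything other than the request and response fields is an assumption made for illustration: the in-memory `STORE`, the counter used as an ID source, the `shorten` function name, and the 400 response for invalid input, which the API description above does not define.

```python
import itertools
import string
from urllib.parse import urlparse

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE_URL = "https://short.ly"    # host taken from the example response
STORE = {}                       # hypothetical stand-in for the database
_ids = itertools.count(1000)     # hypothetical source of unique integer IDs


def _encode(n: int) -> str:
    """Base62-encode a non-negative integer."""
    out = ""
    while n:
        n, rem = divmod(n, 62)
        out = ALPHABET[rem] + out
    return out or "0"


def shorten(body: dict):
    """Return (status_code, response_body) for POST /api/shorten."""
    long_url = body.get("longUrl", "")
    parsed = urlparse(long_url)
    # Assumption: only absolute http(s) URLs are accepted.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return 400, {"error": "longUrl must be an absolute http(s) URL"}
    code = _encode(next(_ids))
    expiration = body.get("expirationTime")  # optional; stored as given
    STORE[code] = {"longUrl": long_url, "expirationTime": expiration}
    return 201, {"shortUrl": f"{BASE_URL}/{code}", "expirationTime": expiration}
```

Calling `shorten({"longUrl": "https://example.com/page", "expirationTime": "2026-04-12T00:00:00Z"})` returns status 201 and a body with the same two fields as the example response.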
2. Redirect API
GET /{shortCode}
Example
GET /ab12
Response
HTTP/1.1 302 Found
Location: https://example.com/page
Explanation:
The redirect API returns HTTP 302 (Found) with the Location header set to the original long URL, and the browser follows the redirect automatically.
No authentication is required: anyone with the short URL can follow it. This is intentional, because short URLs are shared publicly and must work for everyone who clicks them.
If the short code does not exist or has expired, the API returns 404 (Not Found) with a JSON body explaining the error. For expired links, the message should say that the link has expired rather than simply "not found", which helps users understand what happened and reduces confusion.
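The redirect behavior described above, including the distinct message for expired links, could look roughly like the following sketch. The in-memory `STORE`, the `redirect` function name, and the exact error strings are assumptions made for illustration.

```python
from datetime import datetime, timezone

# Hypothetical stand-in for the database, seeded with the example mapping.
STORE = {
    "ab12": {
        "longUrl": "https://example.com/page",
        "expiresAt": datetime(2026, 4, 12, tzinfo=timezone.utc),
    },
}


def redirect(short_code: str, now=None):
    """Return (status_code, headers, json_body) for GET /{shortCode}."""
    now = now or datetime.now(timezone.utc)
    record = STORE.get(short_code)
    if record is None:
        return 404, {}, {"error": "Short URL not found"}
    expires_at = record.get("expiresAt")
    if expires_at is not None and now >= expires_at:
        # Same status code as a missing link, but a more specific message.
        return 404, {}, {"error": "This link has expired"}
    return 302, {"Location": record["longUrl"]}, None
```

Passing `now` explicitly keeps the expiry check easy to test; in a running service it would default to the current UTC time.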
Rate Limiting:
We need rate limiting so that a single malicious client cannot overload the system by sending too many requests within a specific time window. It also stops a client from creating an excessive number of URLs, by restricting how many times that client can call the URL creation API within a time window.
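One simple way to enforce a per-client limit within a time window is a fixed-window counter. The sketch below is an in-process illustration only: the class name, the limit values, and the use of a local dictionary are assumptions, and a multi-node deployment would need the counts kept in a shared store.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most `limit` calls per client in each fixed time window."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock          # injectable so tests can control time
        self._state = {}            # client_id -> (window_index, count)

    def allow(self, client_id: str) -> bool:
        window_index = int(self.clock() // self.window)
        index, count = self._state.get(client_id, (window_index, 0))
        if index != window_index:
            # A new window has started, so the client's count resets.
            index, count = window_index, 0
        if count >= self.limit:
            return False
        self._state[client_id] = (index, count + 1)
        return True
```

For example, `FixedWindowRateLimiter(limit=10, window_seconds=60)` would let each client call the creation API ten times per minute (hypothetical numbers). Fixed windows permit short bursts at window boundaries; a sliding window or token bucket smooths that out at the cost of more state.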
Database Design:
Database Choice: NoSQL Key-Value Store (e.g., DynamoDB / Cosmos DB)
We will use a key-value NoSQL database.
Reason:
The system primarily stores a simple mapping between short URLs and long URLs (short_code → long_url). The access pattern is a direct key-based lookup, and there is no requirement for joins or complex relational queries. Therefore, a relational database is not necessary.
The system is highly read-heavy and needs to handle high throughput (20K+ requests per second, and even higher during peak traffic). A NoSQL database is better suited for such workloads.
Additionally, the system will store a large volume of data over time, requiring horizontal scalability. We need features such as auto-scaling, partitioning, high availability, and fault tolerance without significant manual intervention.
Managed NoSQL databases like Amazon DynamoDB or Azure Cosmos DB provide these capabilities out of the box, including automatic scaling, built-in replication, and low-latency key-value access.
Hence, a distributed NoSQL database is the most suitable choice for this system.
Why not PostgreSQL:
PostgreSQL is a relational database and works well for structured data with complex relationships. However, in this system, the workload is highly read-heavy and primarily consists of simple key-value lookups (short_code → long_url), without the need for joins or complex queries.
At the given scale (tens of thousands of reads per second and billions of records), PostgreSQL would require significant manual effort to scale horizontally. This includes implementing sharding, managing partitions, handling replication, and ensuring high availability, which increases operational complexity.
In contrast, a distributed NoSQL database provides built-in horizontal scalability, high availability, and efficient key-based access with minimal operational overhead, making it a better fit for this use case.
Schema:
Table -> URLMapping
Primary_Key:
short_url (Partition Key)
Attributes:
long_url (string)
created_at (timestamp)
expiration_at (timestamp)
click_count (number)
user_id (string, optional)
We will create a single service for both reads and writes:
Shortening service
Reads and writes follow these flows:
Read: request -> CDN -> API Gateway -> URL shortening service -> cache -> Cassandra
Write: request -> API Gateway -> URL shortening service -> cache -> Cassandra
Client Request:
Represents the user's action of requesting a short URL or being redirected to an original URL.
CDN:
Handles read requests at the edge and reduces the traffic load on the API Gateway and the service.
API Gateway:
Serves as the entry point for API requests, directing them to the respective services.
URL Shortening Service:
The main component of the system; it processes incoming URL creation and redirect requests.
Cache:
Caches frequently accessed data to reduce load on the database and make responses faster.
Cassandra:
The persistent storage that holds all the mappings between short URLs and their corresponding long URLs.
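The cache and database steps of the two flows above can be sketched as a cache-aside read with a write that also warms the cache. Plain dictionaries stand in for Redis and the database here, and the class and method names are hypothetical; writing to the database before the cache is one possible ordering, chosen so the cache never holds a mapping that was not persisted.

```python
class UrlResolver:
    """Cache-aside lookup in front of a persistent store (both dict-like)."""

    def __init__(self, cache: dict, db: dict):
        self.cache = cache      # stand-in for Redis
        self.db = db            # stand-in for the database
        self.db_reads = 0       # counts how often the database is hit

    def resolve(self, short_code: str):
        long_url = self.cache.get(short_code)
        if long_url is not None:
            return long_url                     # cache hit
        self.db_reads += 1                      # cache miss: go to the database
        long_url = self.db.get(short_code)
        if long_url is not None:
            self.cache[short_code] = long_url   # populate for later readers
        return long_url

    def create(self, short_code: str, long_url: str) -> None:
        self.db[short_code] = long_url          # persist first
        self.cache[short_code] = long_url       # then warm the cache
```

After the first lookup of a code, later lookups are served from the cache and never reach the database, which is what keeps the read-heavy path fast.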
Identifier Generation:
Data Storage Layer:
Caching Layer:
Rate Limiter:
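We will use a single table for the URL shortener because the data model has only one entity: the mapping from a short code to its long URL.
CREATE TABLE url_mappings (
  short_code text PRIMARY KEY,  -- partition key
  long_url text,
  created_at timestamp,
  expires_at timestamp,
  user_id text,
  -- bigint rather than counter: Cassandra does not allow counter columns
  -- in a table that also has regular (non-key) columns. Exact concurrent
  -- increments would need a separate counter table.
  click_count bigint
);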