Designing a Simple URL Shortening Service: A TinyURL Approach - System Design

System Requirements

Functional:

Shorten: given a long URL, the system should return a short URL.
Redirect: given a short URL, the system should redirect to the original URL.
Expiration: the system should support setting a TTL after which a short URL expires.
Custom alias (optional): the user should be able to choose the short URL code.

Non-Functional Requirements

1. Low Query Latency:

The system should resolve a short URL and redirect the user to the original URL with minimal delay (typically <100 ms). This is important because URL shorteners are user-facing and delays degrade user experience.

Achieved by using CDN for edge caching, in-memory cache (Redis) for fast lookups, and efficient key-value database access.

2. Horizontal Scalability:

The system must handle increasing traffic (both reads and writes) and growing data volume over time. Since the system is read-heavy and may handle tens of thousands of requests per second, it should scale horizontally.

Achieved by using stateless services, load balancers, distributed NoSQL databases, and caching layers.

3. High Availability:

The system should be highly available so that short URLs are always accessible, even during failures.

Achieved by deploying services across multiple availability zones, using replication in databases, and failover mechanisms.

4. Fault Tolerance:

The system should continue functioning even if some components fail (e.g., server crash, database node failure).

Achieved by redundancy, retries, replication, and fallback mechanisms such as serving from cache when database is unavailable.

5. Efficient Storage of URL Mappings:

The system should store billions of URL mappings efficiently without excessive storage overhead.

Achieved by using compact data models, short encoded keys (Base62), and NoSQL databases optimized for large-scale storage.

Capacity Estimation

Traffic Estimation

Write Requests

Assume a mid-size URL shortener generates:

Write QPS = 200 URL creations/sec

Daily URL creations

200 * 86,400 seconds

= 17,280,000

≈ 17M new URLs/day

Monthly URL creations

17M * 30

≈ 510M URLs/month

Read Requests

Assume read : write ratio = 100 : 1

Read QPS

200 * 100

= 20,000 redirects/sec

Daily redirect requests

20,000 * 86,400

= 1,728,000,000

≈ 1.7B redirects/day

Peak Traffic

Assume 10x traffic spike

Peak Writes

10 * 200

= 2,000 writes/sec

Peak Reads

10 * 20,000

= 200,000 reads/sec
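
The traffic figures above can be re-derived with a few lines of arithmetic. The sketch below is only a convenience for re-running the estimate with different inputs; the variable names are illustrative, and the 200 writes/sec, 100:1 ratio, and 10x spike are the assumptions stated above.

```python
# Back-of-the-envelope traffic estimate using the assumptions stated above.
SECONDS_PER_DAY = 86_400

write_qps = 200          # assumed URL creations per second
read_write_ratio = 100   # assumed reads per write
peak_multiplier = 10     # assumed traffic spike factor

read_qps = write_qps * read_write_ratio
daily_writes = write_qps * SECONDS_PER_DAY
daily_reads = read_qps * SECONDS_PER_DAY
# Exact product; the text rounds the daily figure to 17M first, giving ~510M.
monthly_writes = daily_writes * 30
peak_write_qps = write_qps * peak_multiplier
peak_read_qps = read_qps * peak_multiplier

print(f"read QPS:        {read_qps:,}")
print(f"daily writes:    {daily_writes:,}")
print(f"daily reads:     {daily_reads:,}")
print(f"monthly writes:  {monthly_writes:,}")
print(f"peak write QPS:  {peak_write_qps:,}")
print(f"peak read QPS:   {peak_read_qps:,}")
```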

Storage Estimation

Short URL = 7 bytes

Long URL = 100 bytes

Created Timestamp = 10 bytes

Expiration Timestamp = 10 bytes

UserId (optional) = 20 bytes

Click Count = 8 bytes

Total per record

7 + 100 + 10 + 10 + 20 + 8

= 155 bytes

≈ 160 bytes

Daily Storage

17M URLs * 160 bytes

= 2,720,000,000 bytes

≈ 2.7 GB/day

Monthly Storage

2.7 GB * 30

≈ 81 GB/month

Yearly Storage

81 GB * 12

≈ 972 GB

≈ 1 TB/year

Twenty-year Storage

1 TB * 20

= 20 TB

Replication Factor = 3

Actual storage required

3 * 20 TB

= 60 TB
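
As a cross-check, the storage estimate can be expressed as a short script. This is a sketch under the field sizes assumed above; the dictionary keys and the helper `to_tb` are illustrative names, and a year is taken as 12 months of 30 days to match the text.

```python
# Per-record field sizes in bytes, as assumed in the estimate above.
FIELD_BYTES = {
    "short_url": 7,
    "long_url": 100,
    "created_at": 10,
    "expires_at": 10,
    "user_id": 20,
    "click_count": 8,
}

record_bytes = sum(FIELD_BYTES.values())   # raw size before rounding
rounded_record_bytes = 160                 # rounded up, as in the text
daily_urls = 200 * 86_400                  # write QPS * seconds per day
days_per_year = 30 * 12                    # the text uses 30-day months
replication_factor = 3

daily_bytes = daily_urls * rounded_record_bytes
yearly_bytes = daily_bytes * days_per_year
twenty_year_bytes = yearly_bytes * 20
replicated_bytes = twenty_year_bytes * replication_factor


def to_tb(num_bytes: int) -> float:
    """Convert bytes to decimal terabytes (10**12 bytes)."""
    return num_bytes / 1e12


print(f"record size:        {record_bytes} bytes")
print(f"yearly storage:     {to_tb(yearly_bytes):.2f} TB")
print(f"20-year storage:    {to_tb(twenty_year_bytes):.1f} TB")
print(f"with replication:   {to_tb(replicated_bytes):.1f} TB")
```

Because this version does not round the daily figure down to 17M first, the intermediate values come out slightly different from the text, but the totals still land at roughly 1 TB/year and 60 TB with replication.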

Bandwidth Estimation

Write Request

Request Size = 500 bytes

Response Size = 200 bytes

Total per write request = 700 bytes

Write QPS = 200

Peak Write QPS = 2000

Average write bandwidth

200 * 700

= 140,000 bytes/sec

≈ 137 KB/sec

≈ 0.13 MB/sec

Peak write bandwidth

2000 * 700

= 1,400,000 bytes/sec

≈ 1.34 MB/sec

Read Request (Redirect)

Total per redirect (request + response)

≈ 1 KB

Read QPS = 20,000

Peak Read QPS = 200,000

Average read bandwidth

20,000 * 1 KB

= 20,000 KB/sec

≈ 19.5 MB/sec

Peak read bandwidth

200,000 * 1 KB

= 200,000 KB/sec

≈ 195 MB/sec

Total Bandwidth

Average bandwidth

Write ≈ 0.13 MB/sec

Read ≈ 19.5 MB/sec

--------------------------------

Total ≈ 19.6 MB/sec

Peak bandwidth

Write ≈ 1.34 MB/sec

Read ≈ 195 MB/sec

--------------------------------

Total ≈ 196 MB/sec
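
The bandwidth numbers follow the same pattern: requests per second times bytes per request. The sketch below assumes the request and response sizes given above and treats 1 MB as 1024 * 1024 bytes, as the figures in the text do; `bandwidth_mb` is an illustrative helper name.

```python
KB = 1024
MB = 1024 * 1024

write_bytes = 500 + 200   # request + response size per write
read_bytes = 1 * KB       # approximate request + response size per redirect


def bandwidth_mb(qps: int, bytes_per_request: int) -> float:
    """Return the bandwidth in MB/sec for a given request rate and size."""
    return qps * bytes_per_request / MB


avg_write = bandwidth_mb(200, write_bytes)
peak_write = bandwidth_mb(2_000, write_bytes)
avg_read = bandwidth_mb(20_000, read_bytes)
peak_read = bandwidth_mb(200_000, read_bytes)

avg_total = avg_write + avg_read
peak_total = peak_write + peak_read

print(f"average: write {avg_write:.2f} + read {avg_read:.1f} = {avg_total:.1f} MB/sec")
print(f"peak:    write {peak_write:.2f} + read {peak_read:.1f} = {peak_total:.1f} MB/sec")
```

Small differences from the totals above come from rounding each term before adding; either way, redirects dominate the bandwidth budget.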

Cache Size

The access pattern is highly skewed.

A small fraction of URLs (viral links, popular content) generate most of the redirect traffic.

Using the 80/20 rule:

20% of URLs generate 80% of traffic.

So cache the 20% of the most frequently accessed URLs.

Daily URLs created ≈ 17M

0.2 * 17M

= 3.4M hot URLs per day

Instead of caching the entire historical dataset, we cache recently active URLs.

Assume we cache the last 30 days of hot URLs.

Total cached URLs

3.4M * 30

= 102M URLs

Assume each cache entry stores:

Short URL

Long URL

Metadata

Average entry size ≈ 256 bytes

Total cache memory required

102M * 256 bytes

= 26,112,000,000 bytes

≈ 26 GB

Accounting for Redis overhead, replication, and future traffic growth:

Recommended cache cluster size

≈ 80 GB – 120 GB
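
The raw cache footprint (before Redis overhead and replication) can be reproduced as follows. This is a sketch using the assumptions stated above: 17M new URLs per day, 20% of them hot, 30 days of retention, and 256 bytes per entry. The variable names are illustrative.

```python
daily_urls = 17_000_000   # rounded daily URL creations
hot_percent = 20          # share of URLs assumed to be hot (80/20 rule)
retention_days = 30       # how many days of hot URLs stay cached
entry_bytes = 256         # assumed average size of one cache entry

# Integer arithmetic keeps the counts exact.
hot_urls_per_day = daily_urls * hot_percent // 100
cached_urls = hot_urls_per_day * retention_days
cache_bytes = cached_urls * entry_bytes

print(f"hot URLs per day: {hot_urls_per_day:,}")
print(f"cached URLs:      {cached_urls:,}")
print(f"raw cache size:   {cache_bytes / 1e9:.1f} GB")
```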

API Design

1. Short URL Creation API

POST /api/shorten

Request Body

{

"longUrl": "https://example.com/page",

"expirationTime": "2026-04-12T00:00:00Z"

}

Response

HTTP/1.1 201 Created

Response Body

{

"shortUrl": "https://short.ly/ab12",

"expirationTime": "2026-04-12T00:00:00Z"

}
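
Before a short code is generated, the service would typically validate the request body. The sketch below shows one possible check for the two fields in the example request. The function name `validate_shorten_request`, the returned dictionary shape, and the specific rules (only http/https URLs, expiration must be in the future and carry a timezone) are assumptions for illustration, not part of the API described above.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse


def validate_shorten_request(body: dict, now: datetime) -> dict:
    """Return a normalized request, or raise ValueError if it is invalid.

    `now` must be a timezone-aware datetime.
    """
    long_url = body.get("longUrl")
    if not isinstance(long_url, str):
        raise ValueError("longUrl is required")
    parsed = urlparse(long_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("longUrl must be an absolute http(s) URL")

    expires_at = None
    raw = body.get("expirationTime")
    if raw is not None:
        if not isinstance(raw, str):
            raise ValueError("expirationTime must be an ISO 8601 string")
        # fromisoformat() accepts a trailing "Z" only on newer Python
        # versions, so normalize it to an explicit UTC offset first.
        expires_at = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        if expires_at.tzinfo is None:
            raise ValueError("expirationTime must include a timezone")
        if expires_at <= now:
            raise ValueError("expirationTime must be in the future")

    return {"longUrl": long_url, "expiresAt": expires_at}


request = {
    "longUrl": "https://example.com/page",
    "expirationTime": "2026-04-12T00:00:00Z",
}
print(validate_shorten_request(request, datetime(2026, 1, 1, tzinfo=timezone.utc)))
```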

2. Redirect API

GET /{shortCode}

Example

GET /ab12

Response

HTTP/1.1 302 Found

Location: https://example.com/page

Explanation:

The redirect API returns 302 (Found) with the Location header set to the original long URL, and the browser follows the redirect automatically.

No authentication is required: anyone with the short URL can follow it. This is intentional, because short URLs are shared publicly and must work for everyone who clicks them.

If the short code does not exist or has expired, it returns 404 (Not Found) with a JSON body explaining the error. For expired links, include a message indicating the link has expired rather than simply saying "not found." This helps users understand what happened and reduces confusion.
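
The redirect behavior described above can be sketched as a small pure function that maps a short code to a status, headers, and a JSON body. The `UrlRecord` type, the `resolve` function, the in-memory `store` dictionary, and the error field names are hypothetical and stand in for the real service and database.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


@dataclass
class UrlRecord:
    long_url: str
    expires_at: Optional[datetime] = None  # None means the link never expires


def resolve(
    short_code: str, store: Dict[str, UrlRecord], now: datetime
) -> Tuple[int, Dict[str, str], Dict[str, str]]:
    """Return (status, headers, json_body) for GET /{shortCode}."""
    record = store.get(short_code)
    if record is None:
        return 404, {}, {"error": "not_found", "message": "Short URL not found."}
    if record.expires_at is not None and record.expires_at <= now:
        # Same status as a missing code, but with a more helpful message.
        return 404, {}, {"error": "expired", "message": "This link has expired."}
    return 302, {"Location": record.long_url}, {}


store = {
    "ab12": UrlRecord("https://example.com/page"),
    "old1": UrlRecord(
        "https://example.com/old", datetime(2020, 1, 1, tzinfo=timezone.utc)
    ),
}
now = datetime(2026, 1, 1, tzinfo=timezone.utc)
for code in ("ab12", "old1", "nope"):
    print(code, resolve(code, store, now))
```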

Rate Limiting:

We need rate limiting so that a single malicious client cannot overload the system with too many requests within a specific time window. It also caps how many short URLs one client can create, by restricting how many times that client can call the URL creation API within a time window.

Database Design:

Database Choice: NoSQL Key-Value Store (e.g., DynamoDB / Cosmos DB)

We will use a key-value NoSQL database.

Reason:

The system primarily stores a simple mapping between short URLs and long URLs (short_code → long_url). The access pattern is a direct key-based lookup, and there is no requirement for joins or complex relational queries. Therefore, a relational database is not necessary.

The system is highly read-heavy and needs to handle high throughput (20K+ requests per second, and even higher during peak traffic). A NoSQL database is better suited for such workloads.

Additionally, the system will store a large volume of data over time, requiring horizontal scalability. We need features such as auto-scaling, partitioning, high availability, and fault tolerance without significant manual intervention.

Managed NoSQL databases like Amazon DynamoDB or Azure Cosmos DB provide these capabilities out of the box, including automatic scaling, built-in replication, and low-latency key-value access.

Hence, a distributed NoSQL database is the most suitable choice for this system.

Why not PostgreSQL:

PostgreSQL is a relational database and works well for structured data with complex relationships. However, in this system, the workload is highly read-heavy and primarily consists of simple key-value lookups (short_code → long_url), without the need for joins or complex queries.

At the given scale (tens of thousands of reads per second and billions of records), PostgreSQL would require significant manual effort to scale horizontally. This includes implementing sharding, managing partitions, handling replication, and ensuring high availability, which increases operational complexity.

In contrast, a distributed NoSQL database provides built-in horizontal scalability, high availability, and efficient key-based access with minimal operational overhead, making it a better fit for this use case.

Schema:

Table -> URLMapping

Primary_Key:

short_url (Partition Key)

Attributes:

long_url (string)

created_at (timestamp)

expiration_at (timestamp)

click_count (number)

user_id (string, optional)

High-level design


We will create a single service for both read and write:

Shortening service

Reads and writes follow these flows:

Read: request -> CDN -> API Gateway -> URL shortening service -> cache -> Cassandra

Write: request -> API Gateway -> URL shortening service -> cache -> Cassandra

Client Request:

Represents the user's action of requesting a short URL or being redirected to an original URL.

CDN:

Handles read requests and helps reduce the traffic load on the API Gateway and the service.

API Gateway:

Serves as the entry point for API requests and directs them to the respective services.

URL Shortening Service:

The main component of the system. It processes incoming short URL creation and redirect requests.

Cache:

We will use this to cache frequently accessed data, which reduces load on the database and makes responses faster.

Cassandra:

The persistent storage solution that holds all the mappings between short URLs and their corresponding long URLs.

Detailed component design


Identifier Generation:

How it works: This component generates a unique short code for each long URL. Common techniques include Base62 encoding of a sequential ID, or hashing the long URL (for example with SHA-256) and keeping only the first few characters, combined with a collision resolution strategy.
Scalability: It scales well with horizontal partitioning. Using a distributed ID generator (like Twitter's Snowflake) can help avoid collisions and ensure uniqueness across multiple instances.
Algorithm/Data Structure: A simple counter with Base62 encoding can be efficient. For collision handling, a hash table can be used to store mappings temporarily.
Importance: Effective identifier generation is crucial to prevent collisions and ensure that each short URL maps correctly to its long counterpart.
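
The counter-plus-Base62 technique mentioned above can be sketched as follows. The alphabet ordering (digits, then lowercase, then uppercase) and the function names are assumptions; any fixed ordering of the 62 characters works as long as encoding and decoding agree.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62


def encode_base62(number: int) -> str:
    """Encode a non-negative integer ID as a Base62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(digits))


def decode_base62(code: str) -> int:
    """Decode a Base62 string back to the integer ID."""
    number = 0
    for char in code:
        number = number * BASE + ALPHABET.index(char)
    return number


sequential_id = 123_456_789
short_code = encode_base62(sequential_id)
print(short_code, decode_base62(short_code))
```

With 7 characters, Base62 can represent 62^7 = 3,521,614,606,208 distinct IDs, which comfortably covers the roughly 17M new URLs per day estimated earlier.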

Data Storage Layer:

How it works: This layer stores the mappings between short and long URLs. A relational database or a NoSQL store (like DynamoDB) can be used, depending on the scale.
Scalability: NoSQL databases offer better horizontal scaling. Partitioning the data on the short code (the partition key in the schema above) helps distribute the load evenly.
Schema Design: A simple table with these columns: ID, short URL, long URL, user ID, expiration, and creation date. Indexing on the short URL improves lookups.
Importance: The storage layer must handle a high read-to-write ratio efficiently, especially during traffic surges.

Caching Layer:

How it works: This layer stores frequently accessed data to reduce latency and database load. We can use a tool such as Redis.

Scalability: Caching scales horizontally by adding more nodes. Implementing a TTL (time-to-live) for cache entries helps manage memory usage.
Importance: Caching is critical for low redirect latency, especially for popular URLs, ensuring that repeated requests are served quickly without hitting the database.
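
A minimal sketch of the caching layer's read path (cache first, database on a miss, then populate the cache) is shown below. The `TTLCache` class is a small in-process stand-in for Redis with LRU eviction and a per-entry TTL; the class, the `lookup` function, and the plain dictionary used as the database are all illustrative assumptions.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Tiny in-process LRU cache with a per-entry TTL (stand-in for Redis)."""

    def __init__(self, max_entries: int, ttl_seconds: float, clock=time.monotonic):
        self._data = OrderedDict()  # key -> (value, expires_at)
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at <= self._clock():
            del self._data[key]          # entry outlived its TTL
            return None
        self._data.move_to_end(key)      # mark as most recently used
        return value

    def put(self, key, value):
        self._data[key] = (value, self._clock() + self._ttl)
        self._data.move_to_end(key)
        while len(self._data) > self._max_entries:
            self._data.popitem(last=False)  # evict least recently used


def lookup(short_code: str, cache: TTLCache, db: dict):
    """Cache-aside read: try the cache, fall back to the database."""
    long_url = cache.get(short_code)
    if long_url is None:
        long_url = db.get(short_code)
        if long_url is not None:
            cache.put(short_code, long_url)
    return long_url


db = {"ab12": "https://example.com/page"}
cache = TTLCache(max_entries=1000, ttl_seconds=3600)
print(lookup("ab12", cache, db))  # first call reads the database
print(lookup("ab12", cache, db))  # second call is served from the cache
```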

Rate Limiter:

How it works:
A rate limiter prevents a single user from overloading the system by limiting the number of requests within a given time window. It also stops a particular user from creating too many short URLs. Several rate limiting algorithms can implement this, such as token bucket, leaky bucket, fixed window, and sliding window.

Importance: It helps detect and mitigate DDoS attacks, blocks clients that spam the URL shortener, and stops an attacker from creating short URLs in bulk.
Burst Handling: To handle traffic bursts, we keep a burst capacity, for example:
- Ex: a burst capacity of 20 requests per second when the overall limit is 100 requests per minute
- Also, provide backoff guidance by returning HTTP/1.1 429 Too Many Requests, for example with a Retry-After header
- Also, give a waiting period before retrying (for example 5 hours or 24 hours) for clients that keep exceeding the limit.
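
One way to get both a sustained limit and a burst allowance is a token bucket. The sketch below uses the example numbers above (a burst of 20 requests and a refill rate equivalent to 100 requests per minute); the `TokenBucket` class and its method names are illustrative, and a real deployment would keep one bucket per client in a shared store such as Redis rather than in process memory.

```python
import time


class TokenBucket:
    """Token bucket: `capacity` bounds bursts, `refill_per_second` the steady rate."""

    def __init__(self, capacity: int, refill_per_second: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)   # start with a full bucket
        self._updated = clock()

    def allow(self):
        """Return (allowed, retry_after_seconds) for one incoming request."""
        now = self._clock()
        elapsed = now - self._updated
        self._updated = now
        # Refill for the elapsed time, never exceeding the burst capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= 1:
            self._tokens -= 1
            return True, 0.0
        # Time until one full token is available again.
        return False, (1 - self._tokens) / self.refill_per_second


bucket = TokenBucket(capacity=20, refill_per_second=100 / 60)
results = [bucket.allow()[0] for _ in range(25)]
print(results.count(True), "allowed,", results.count(False), "rejected")
```

When `allow()` returns `False`, the handler would respond with 429 Too Many Requests and could use the second return value to populate the Retry-After header.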

Database design:

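We will use a single table for the URL mappings because every request is a lookup by short code. Cassandra does not allow a counter column in a table that also has regular (non-key) columns, so the click counter lives in a small companion table (named url_click_counts here for illustration).

CREATE TABLE url_mappings (

short_code text PRIMARY KEY,

long_url text,

created_at timestamp,

expires_at timestamp,

user_id text

);

CREATE TABLE url_click_counts (

short_code text PRIMARY KEY,

click_count counter

);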
