Designing a Simple URL Shortening Service: A TinyURL Approach (System Design)

Requirements

Functional Requirements:

Create a short URL for a given long URL.
Return the long URL associated with a given short URL.

Non-Functional Requirements:

- Low-latency redirects

- High availability

- Horizontal scalability for a read-heavy workload (redirects far outnumber URL creations)

- Fast lookups by short code

## 1. Create Short URL API (Write Path)

POST /api/v1/shorten

Purpose:

This API accepts a long URL and generates a unique short URL.

Request:

```json
{
  "long_url": "https://example.com/very-long-url"
}
```

Write Flow:

1. Client sends a request containing the long URL

2. Request reaches the Load Balancer

3. Load Balancer routes the request to an App Server

4. App Server validates the URL

5. App Server generates a unique short code

6. Mapping (short code → long URL) is stored in the Primary Database

7. Mapping is also stored in the Redis cache

8. Generated short URL is returned to the client

Response:

```json
{
  "short_url": "https://tiny.ly/aX91BcQ"
}
```
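Step 4 of the write flow (URL validation) is not specified further. A minimal sketch of what it could check, using only Python's standard library; the 2048-character limit and the check that refuses to shorten `tiny.ly` links are assumptions for illustration, not requirements stated above:

```python
from urllib.parse import urlsplit

MAX_URL_LENGTH = 2048            # hypothetical limit, not from the design above
SHORT_DOMAIN = "tiny.ly"         # short-link domain from the example response
ALLOWED_SCHEMES = {"http", "https"}


def validate_long_url(long_url):
    """Return True if long_url looks safe to shorten (a sketch, not exhaustive)."""
    if not long_url or len(long_url) > MAX_URL_LENGTH:
        return False
    try:
        parts = urlsplit(long_url.strip())
    except ValueError:           # e.g. a malformed IPv6 literal such as "http://[::1"
        return False
    # Only web URLs: rejects "javascript:", "data:", "ftp:" and scheme-less input.
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        return False
    host = parts.hostname        # lowercased host, or None if the URL has no host
    if not host:
        return False
    # Refuse to shorten our own short links, which could create redirect loops.
    if host == SHORT_DOMAIN or host.endswith("." + SHORT_DOMAIN):
        return False
    return True
```

A URL that fails these checks would be rejected before a short code is generated, so no database row or cache entry is created for it.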

## 2. Redirect API (Read Path)

GET /{short_code}

Example:

GET /aX91BcQ

Purpose:

This API redirects the user from the short URL to the original long URL.

Read Flow:

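1. User requests the short URL

2. Request reaches the Load Balancer

3. Request is routed to an App Server

4. App Server first checks the Redis cache

5. On a cache miss → App Server looks up the database

6. If the database has the mapping, App Server populates the Redis cache

7. App Server returns an HTTP 302 redirect (or HTTP 404 if the short code does not exist)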

Response:

```http
HTTP/1.1 302 Found
Location: https://example.com/very-long-url
```
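A framework-agnostic sketch of the redirect decision. `lookup` is a hypothetical callable standing in for the cache-then-database lookup from the read flow, and rejecting non-alphanumeric codes early is an assumption based on codes being Base62:

```python
from http import HTTPStatus
from urllib.parse import urlsplit


def redirect_response(request_path, lookup):
    """Map GET /{short_code} to (status, headers).

    `lookup` is a hypothetical callable: short_code -> long URL or None.
    """
    short_code = urlsplit(request_path).path.lstrip("/")  # drop any query string
    # Short codes are Base62, so anything else cannot exist: answer 404
    # without touching the cache or the database.
    if not (short_code.isascii() and short_code.isalnum()):
        return HTTPStatus.NOT_FOUND, {}
    long_url = lookup(short_code)
    if long_url is None:
        return HTTPStatus.NOT_FOUND, {}
    return HTTPStatus.FOUND, {"Location": long_url}  # 302 Found
```

On the choice of 302 over 301: browsers may cache a 301 and skip the shortener on later clicks, whereas a 302 sends every click back through the service. That costs more redirect traffic but keeps per-click tracking possible.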

The URL shortening system is designed as a scalable distributed architecture optimized for heavy read traffic and low-latency redirects.

The system mainly consists of:

- Client

- Load Balancer

- Multiple App Servers

- Redis Cache

- Primary Database

- Read Replica Databases

The Load Balancer distributes requests across multiple app servers to support horizontal scaling and high availability.
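The design does not name a balancing policy. Round-robin is a common default; a minimal sketch with hypothetical server names, which also skips servers that a health check (out of scope here) has marked down:

```python
import threading


class RoundRobinBalancer:
    """Hands out app servers in rotation, skipping ones marked down."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("need at least one app server")
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._next = 0
        self._lock = threading.Lock()  # picks may come from many request threads

    def mark_down(self, server):
        with self._lock:
            self._healthy.discard(server)

    def mark_up(self, server):
        with self._lock:
            if server in self._servers:
                self._healthy.add(server)

    def pick(self):
        with self._lock:
            # Visit each server at most once per pick.
            for _ in range(len(self._servers)):
                server = self._servers[self._next]
                self._next = (self._next + 1) % len(self._servers)
                if server in self._healthy:
                    return server
            raise RuntimeError("no healthy app servers")
```

With `RoundRobinBalancer(["app-server-1", "app-server-2", "app-server-3"])`, successive `pick()` calls cycle through all three; after `mark_down("app-server-2")` they alternate between the other two, which is how losing one app server leaves the service available.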

App Servers handle:

- URL validation

- short code generation

- redirect handling

- cache interactions

Redis is used for fast in-memory lookups because redirect requests far outnumber URL creation requests.

The Primary Database stores permanent URL mappings, while replica databases help scale read traffic and improve availability.

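```mermaid
flowchart TD
    A["Client / User"]
    B["Load Balancer"]
    C1["App Server 1"]
    C2["App Server 2"]
    C3["App Server 3"]
    D["Redis Cache"]
    E["Primary Database"]
    F1["Read Replica 1"]
    F2["Read Replica 2"]

    A --> B
    B --> C1 & C2 & C3

    %% Cache-aside: app servers query Redis first
    C1 & C2 & C3 -->|lookup| D

    %% Writes go to the primary; cache misses are read from replicas
    C1 & C2 & C3 -->|writes| E
    C1 & C2 & C3 -->|reads on cache miss| F1 & F2

    E -->|replication| F1 & F2
```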

## 1. Short Code Generation Service

The short code generation service is responsible for generating unique short URLs for every long URL submitted by users.

Initially, the system can use:

Database Auto Increment ID + Base62 Encoding

Flow:

1. Insert long URL into DB

2. DB generates unique numeric ID

3. Convert ID to Base62 string

4. Use generated Base62 value as short code

Example:

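125789 → GS1

(125789 = 32 × 62² + 44 × 62 + 53; in the [a-zA-Z0-9] alphabet below, index 32 is G, 44 is S, and 53 is 1)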

Base62 characters:

[a-zA-Z0-9]

This provides compact and URL-friendly identifiers.
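A minimal sketch of the encoding step, assuming the alphabet order listed above (a-z, then A-Z, then 0-9). Some implementations put digits first, which yields different codes for the same ID, so the encoder and decoder must share one alphabet:

```python
# Alphabet order assumed from the text: a-z, then A-Z, then 0-9.
ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
BASE = len(ALPHABET)  # 62
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def base62_encode(n):
    """Encode a non-negative integer ID as a Base62 string."""
    if n < 0:
        raise ValueError("IDs must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, BASE)
        digits.append(ALPHABET[remainder])  # least significant digit first
    return "".join(reversed(digits))


def base62_decode(code):
    """Decode a Base62 string back to its integer ID (KeyError on bad chars)."""
    n = 0
    for ch in code:
        n = n * BASE + INDEX[ch]
    return n
```

Here `base62_encode(125789)` returns `"GS1"` and `base62_decode("GS1")` returns `125789`, matching the worked example.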

Scaling:

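Sequential IDs are predictable: anyone can enumerate recently created links, and the IDs expose the system's growth rate. At very large scale, a single database's auto-increment counter also becomes a write bottleneck.

The system can later evolve toward:

- Randomized Base62 tokens

- NanoID

- Snowflake-based distributed IDs

Random tokens and NanoID make codes hard to guess, while Snowflake-style IDs let many servers generate IDs without coordinating through one database.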
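For distributed generation, a Snowflake-style generator lets every app server mint IDs locally. This sketch uses the bit layout popularized by Twitter's Snowflake (41-bit millisecond timestamp, 10-bit machine ID, 12-bit per-millisecond sequence); the custom epoch and the machine IDs are assumptions. These IDs are time-ordered, so they address coordination, not guessability:

```python
import threading
import time

EPOCH_MS = 1_704_067_200_000  # hypothetical custom epoch: 2024-01-01T00:00:00Z
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_MACHINE_ID = (1 << MACHINE_BITS) - 1  # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1   # 4095


class SnowflakeGenerator:
    """64-bit IDs: 41-bit ms timestamp | 10-bit machine ID | 12-bit sequence."""

    def __init__(self, machine_id):
        if not 0 <= machine_id <= MAX_MACHINE_ID:
            raise ValueError("machine_id must be in [0, 1023]")
        self.machine_id = machine_id
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms():
        return time.time_ns() // 1_000_000 - EPOCH_MS

    def next_id(self):
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                # Wall clock moved backwards (e.g. an NTP step); refusing is one policy.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                (now << (MACHINE_BITS + SEQUENCE_BITS))
                | (self.machine_id << SEQUENCE_BITS)
                | self._sequence
            )
```

Each app server would need a distinct `machine_id`, assigned by configuration or a coordination service. The resulting 64-bit integers can be Base62-encoded like auto-increment IDs, but they produce longer codes (up to 11 characters).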

Tradeoffs:

Auto Increment + Base62:

- Simple

- Collision-free

- Predictable IDs

Random Tokens:

- Hard to guess

- Requires collision checks

Snowflake IDs:

- Highly scalable

- More complex implementation
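A sketch of random tokens with collision checks, using an in-memory SQLite table as a stand-in for the real database. It relies on a UNIQUE constraint on `short_code` and retries on a violation, rather than a separate "check, then insert", which two app servers could race on. The 7-character length (about 3.5 trillion codes over the [a-zA-Z0-9] alphabet) and the retry budget are assumptions; this is similar in spirit to NanoID but does not use that library:

```python
import secrets
import sqlite3
import string

# The [a-zA-Z0-9] Base62 alphabet.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
CODE_LENGTH = 7     # hypothetical; matches the 7-character example aX91BcQ
MAX_ATTEMPTS = 5    # hypothetical retry budget


def random_code(length=CODE_LENGTH):
    # secrets (not random) draws from a CSPRNG, so codes are hard to guess.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def insert_with_random_code(conn, long_url, length=CODE_LENGTH):
    """Insert long_url under a fresh random code, retrying on collisions."""
    for _ in range(MAX_ATTEMPTS):
        code = random_code(length)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO urls (short_code, original_url) VALUES (?, ?)",
                    (code, long_url),
                )
            return code
        except sqlite3.IntegrityError:
            continue  # UNIQUE(short_code) rejected a duplicate: draw again
    raise RuntimeError("no free short code found; consider longer codes")
```

Usage, with a hypothetical minimal table: `conn = sqlite3.connect(":memory:")`, then `conn.execute("CREATE TABLE urls (short_code TEXT UNIQUE, original_url TEXT NOT NULL)")`, then `insert_with_random_code(conn, "https://example.com/very-long-url")`. Collisions stay rare while the table holds far fewer rows than the keyspace.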

## 2. Redis Cache Layer

The redirect operation is the most frequently used operation in the system.

Since:

Reads >>> Writes

the system uses Redis cache to reduce database load and improve redirect latency.

Cache Lookup Flow:

1. User requests short URL

2. App Server checks Redis

3. If found → immediate redirect

4. If cache miss → query Database

5. Populate Redis cache

6. Return redirect response

This pattern is called:

Cache Aside Pattern
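A minimal cache-aside sketch. A small in-memory class stands in for Redis (with redis-py, the corresponding calls would be `get` and `set` with an `ex` expiry); the class, the dict-backed database, and the one-day TTL are hypothetical:

```python
import time

CACHE_TTL_SECONDS = 24 * 60 * 60  # hypothetical: keep hot mappings for a day


class InMemoryCache:
    """Hypothetical stand-in for Redis GET / SET-with-expiry."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at on the monotonic clock)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() >= expires_at:
            del self._data[key]  # lazily evict expired entries
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)


def resolve(short_code, cache, db):
    """Cache-aside read: try the cache, fall back to the DB, then fill the cache."""
    long_url = cache.get(short_code)
    if long_url is not None:
        return long_url                         # cache hit: no DB round trip
    long_url = db.get(short_code)               # cache miss: query the database
    if long_url is not None:
        cache.set(short_code, long_url, CACHE_TTL_SECONDS)
    return long_url                             # None means "unknown code" (404)
```

Unknown codes are not cached in this sketch, so repeated requests for a nonexistent code always reach the database; caching a short-lived "not found" marker is a common mitigation.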

Why Redis?

Redis provides:

- in-memory lookups

- sub-millisecond latency for simple key lookups

- high throughput

- distributed caching support

Scaling:

Redis can scale using:

- Redis replication

- Redis clustering

- partitioned cache nodes
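With Redis Cluster specifically, keys are partitioned into 16384 hash slots using HASH_SLOT = CRC16(key) mod 16384 (the XMODEM variant of CRC16). When a key contains a non-empty `{...}` hash tag, only the text inside the braces is hashed. A sketch of that mapping, useful for reasoning about how short-code keys spread across cache nodes; the `url:` key prefix is a hypothetical naming convention:

```python
HASH_SLOTS = 16384


def crc16_xmodem(data):
    """CRC16 (XMODEM variant): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key):
    """Slot for a key, honouring {hash tags} the way Redis Cluster does."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # non-empty tag: hash only the text inside the braces
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % HASH_SLOTS
```

Each cluster node owns a range of slots, so keys like `url:aX91BcQ` spread across nodes, and adding a node means moving some slots rather than rehashing every key.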

Tradeoffs:

Benefits:

- Very fast reads

- Reduced DB load

- Improved latency

Drawbacks:

- Additional infrastructure

- Cache invalidation complexity

- Memory cost

## 3. Database Design and Scaling

The database stores the permanent mapping:

short_code → original_url

Example schema fields:

- id

- short_code

- original_url

- created_at

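Indexing:

A unique index is applied on:

short_code

because every redirect looks up a row by its short code.

The index turns each lookup into an index seek instead of a full table scan, and the uniqueness constraint guarantees that no two rows share a code.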
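The schema and index can be sketched with SQLite from Python's standard library; a production deployment would use whatever database the team runs, and the table and index names are hypothetical. `short_code` is left nullable so the auto-increment flow can insert the row first and fill in the code from the generated `id` (SQLite's unique index allows multiple NULLs):

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS urls (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    short_code   TEXT,                 -- nullable: filled in after the insert
    original_url TEXT NOT NULL,
    created_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Unique index: fast lookups by short_code, and no two rows can share a code.
CREATE UNIQUE INDEX IF NOT EXISTS idx_urls_short_code ON urls (short_code);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def find_original_url(conn, short_code):
    row = conn.execute(
        "SELECT original_url FROM urls WHERE short_code = ?", (short_code,)
    ).fetchone()
    return row[0] if row else None


def lookup_plan(conn):
    """Return SQLite's query plan text for the redirect lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT original_url FROM urls WHERE short_code = ?",
        ("any",),
    ).fetchall()
    return " ".join(str(row[-1]) for row in rows)  # last column holds the detail
```

After inserting a row for `aX91BcQ`, `find_original_url(conn, "aX91BcQ")` returns its URL, and `lookup_plan(conn)` names `idx_urls_short_code`, confirming that the lookup uses the index rather than scanning the table.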

Replication:

The system uses:

Primary DB + Read Replicas

Writes:

- Go to Primary DB

Reads:

- Served from Replica DBs

Benefits:

- Better scalability

- High availability

- Reduced read load on primary DB
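A minimal routing sketch for the primary/replica split; the connection objects are hypothetical placeholders (any values work), and random replica choice is an assumption, since the text does not say how reads are spread:

```python
import random


class ReplicatedDatabase:
    """Routes writes to the primary and spreads reads across replicas.

    `primary` and `replicas` are hypothetical connection objects.
    """

    def __init__(self, primary, replicas, rng=None):
        self.primary = primary
        self.replicas = list(replicas)
        self._rng = rng or random.Random()

    def for_write(self):
        return self.primary  # all writes go to the single primary

    def for_read(self):
        if not self.replicas:
            return self.primary  # no replicas: the primary serves reads too
        return self._rng.choice(self.replicas)
```

Because replicas can lag, a read for a just-created code might miss on a replica. Step 7 of the write flow (storing the mapping in Redis) usually masks this, since the read path checks the cache first; retrying a replica miss against the primary is another option.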

Sharding / Partitioning:

As the data set grows, a single database can become a bottleneck for storage and write throughput.

The system can scale using:

- Horizontal partitioning

- Sharding

Example:

hash(short_code) % N

Here N is the number of shards; a stable, well-distributed hash spreads records evenly across them.
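A sketch of `hash(short_code) % N` in Python. It uses `zlib.crc32` because Python's built-in `hash()` for strings is randomized per process, which would send the same code to different shards from different app servers. It also measures how many codes move when N changes, the operational cost of plain modulo sharding (consistent hashing is the usual way to reduce it):

```python
import zlib


def shard_for(short_code, num_shards):
    """hash(short_code) % N with a hash that is stable across processes."""
    # Built-in hash() on str is salted per process (PYTHONHASHSEED),
    # so two app servers could disagree; CRC32 gives the same answer everywhere.
    return zlib.crc32(short_code.encode("utf-8")) % num_shards


def fraction_remapped(codes, old_n, new_n):
    """Share of codes whose shard changes when the shard count goes old_n -> new_n."""
    moved = sum(1 for code in codes if shard_for(code, old_n) != shard_for(code, new_n))
    return moved / len(codes)
```

For a uniformly distributed hash, growing from 4 to 5 shards keeps a code in place only when its hash has the same remainder modulo 4 and modulo 5, which happens for about 1 in 5 codes, so roughly 80% of records would have to migrate.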

Tradeoffs:

Replication:

- Improves reads and availability

- Replication lag possible

Sharding:

- Massive scalability

- Operational complexity

Indexing:

- Faster lookups

- Extra storage and write overhead

## Additional Tradeoffs Discussion

Multiple App Servers:

Advantages:

- Better scalability

- Fault tolerance

- High availability

Disadvantages:

- Higher infrastructure cost

- Deployment complexity

- Monitoring overhead

Redis Caching:

Advantages:

- Faster redirects

- Reduced DB load

Disadvantages:

- Cache consistency challenges

- Additional memory cost

Database Sharding:

Advantages:

- Supports huge scale

Disadvantages:

- Complex operations

- Harder debugging

## Conclusion

The proposed URL shortening system is designed as a highly scalable distributed read-heavy architecture optimized for:

- Low latency redirects

- High availability

- Horizontal scalability

- Fast lookups

Core scalability techniques include:

- Load Balancing

- Redis caching

- Database replication

- Indexing

- Sharding

The system can initially start simple using:

Primary DB + Redis + App Servers

and later evolve incrementally toward a distributed, large-scale architecture as traffic grows.