Designing a Simple URL Shortening Service: A TinyURL Approach - System Design

Requirements

Functional Requirements:

Users can submit a long URL and receive a shortened URL.
Users visiting the shortened URL should be redirected to the original URL.
The system must store the mapping between short URLs and original URLs.

Non-Functional Requirements:

Low latency: redirects should complete within 100 ms.
High availability: the service should remain available even if some servers fail.
Scalability: the system should support millions of URLs.
Uniqueness: each short URL must map to exactly one long URL.

API Design

The URL shortening service exposes RESTful APIs that allow clients to create shortened URLs, retrieve original URLs, and optionally view analytics data. All APIs communicate using HTTP and JSON.

1. Create Short URL

This endpoint generates a shortened URL for a given long URL.

Endpoint

POST /api/v1/urls

Request Body

{
  "long_url": "https://example.com/articles/system-design",
  "custom_alias": "my-link",
  "expiration_date": "2027-01-01"
}

Parameters

Field	Description
long_url	Original URL that needs to be shortened
custom_alias	Optional user-defined short code
expiration_date	Optional expiry date for the shortened URL

Response

{
  "short_url": "https://short.ly/abc123",
  "short_code": "abc123",
  "created_at": "2026-03-11T12:00:00Z"
}

Explanation

The client sends a long URL to the API server.
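The server generates a unique short code using the ID generator, as in the abc123 response above; when custom_alias is supplied and not already taken, the alias is used as the short code instead.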
The mapping between the short code and long URL is stored in the database.
The shortened URL is returned to the client.

2. Redirect to Original URL

This endpoint resolves a short URL and redirects the user to the original destination.

Endpoint

GET /{short_code}

Example request:

GET /abc123

Response Behavior

HTTP 302 Redirect
Location: https://example.com/articles/system-design

Explanation

The API server receives the short code.
It first checks the cache (Redis) for the mapping.
If found, the user is redirected immediately.
If not found, the database is queried and the cache is updated.
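
As a rough illustration of the response this endpoint produces, the sketch below builds the status code and headers for a given short code. The in-memory MAPPINGS dictionary and the 404 status for unknown codes are assumptions made for the example; the design above does not specify how unknown codes are handled.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory stand-in for the cache and database lookup.
MAPPINGS: Dict[str, str] = {
    "abc123": "https://example.com/articles/system-design",
}


def resolve(short_code: str) -> Optional[str]:
    """Return the long URL for a short code, or None if it is unknown."""
    return MAPPINGS.get(short_code)


def redirect_response(short_code: str) -> Tuple[int, Dict[str, str]]:
    """Build the (status, headers) pair for GET /{short_code}."""
    long_url = resolve(short_code)
    if long_url is None:
        # Assumption: unknown codes return 404 with no Location header.
        return 404, {}
    return 302, {"Location": long_url}
```

A 302 keeps every visit flowing through the service, which is what makes click counting possible; a permanent redirect would let browsers skip the service on repeat visits.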

3. Get URL Metadata

This endpoint retrieves metadata associated with a shortened URL.

Endpoint

GET /api/v1/urls/{short_code}

Example request:

GET /api/v1/urls/abc123

Response

{
  "short_code": "abc123",
  "long_url": "https://example.com/articles/system-design",
  "created_at": "2026-03-11T12:00:00Z",
  "click_count": 1024
}

4. URL Analytics Endpoint (Optional)

This endpoint provides statistics for a shortened URL.

Endpoint

GET /api/v1/urls/{short_code}/stats

Response

{
  "short_code": "abc123",
  "click_count": 1024,
  "last_accessed": "2026-03-11T14:30:00Z"
}

Explanation

The analytics service tracks the number of times a shortened URL has been accessed. This information can be used for monitoring and reporting purposes.

5. Delete Short URL (Optional)

This endpoint allows a user to delete a shortened URL.

Endpoint

DELETE /api/v1/urls/{short_code}

Response

{
  "message": "URL deleted successfully"
}

Explanation

The API server deletes the mapping from the database and invalidates the corresponding cache entry.

API Design Considerations

The API servers are stateless, allowing horizontal scaling.
All endpoints are designed to be lightweight to minimize latency.
Rate limiting can be implemented to prevent abuse of the URL creation endpoint.
Caching is used for redirect operations to reduce database load.
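
The rate-limiting consideration can be sketched with a token bucket, one common technique for this purpose. The design above does not name a specific algorithm, so the TokenBucket class, its parameters, and the injectable clock are all assumptions made for illustration.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at a steady rate."""

    def __init__(self, capacity: int, refill_per_second: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(capacity)
        self.last_refill = clock()

    def allow(self) -> bool:
        """Return True if the request may proceed, consuming one token."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In practice one bucket would typically be kept per client (for example per API key or IP address), and a rejected request to POST /api/v1/urls would usually receive HTTP 429.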

High-Level Design

When many users create shortened URLs simultaneously, the system must ensure that generated short codes remain unique. To avoid collisions, short codes are derived from unique numeric IDs, issued either by an auto-incrementing counter stored in the database or by a dedicated distributed ID generator. Each new request obtains a unique numeric ID, which is then encoded using Base62 to produce the short URL code.

Example flow:

Client → API Server → ID Generator → Base62 Encoder → Database

Because the generated IDs are guaranteed to be unique, the encoded short URLs will also be unique. This approach eliminates the need for repeated database checks that would otherwise be required when using random string generation.

To handle very high concurrency, the ID generator can allocate ID ranges to API servers so that each server can generate IDs independently without constantly querying the database.
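
A minimal sketch of range allocation follows, assuming a block size and class names (RangeAllocator, LocalIdGenerator) that are purely illustrative. In a real deployment the allocator's counter would live in the database or generator service rather than in process memory.

```python
import threading


class RangeAllocator:
    """Central allocator: hands out disjoint, contiguous blocks of IDs."""

    def __init__(self, block_size: int = 1000, start: int = 1):
        self._next_start = start
        self._block_size = block_size
        self._lock = threading.Lock()

    def allocate(self) -> range:
        # Only this short critical section is shared between servers.
        with self._lock:
            start = self._next_start
            self._next_start += self._block_size
        return range(start, start + self._block_size)


class LocalIdGenerator:
    """Per-server generator: consumes its block locally, then requests another."""

    def __init__(self, allocator: RangeAllocator):
        self._allocator = allocator
        self._ids = iter(allocator.allocate())
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            try:
                return next(self._ids)
            except StopIteration:
                # Block exhausted: fetch a fresh, non-overlapping block.
                self._ids = iter(self._allocator.allocate())
                return next(self._ids)
```

One trade-off of this approach is that IDs left unused in a block when a server restarts are simply skipped, so the sequence has gaps but never duplicates.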

Handling Generator Outage and Split-Brain Scenarios

The system must remain reliable even if the ID generator fails, or if two generator instances both act as leader and hand out overlapping IDs (split-brain). To guard against both cases, the generator service runs in a replicated configuration with a leader election mechanism.

If the primary generator fails, a replica is elected as the new leader and takes over ID generation.

To avoid split-brain situations, generator instances coordinate through a distributed consensus protocol so that only one leader is active at a time. Each generator instance is also responsible for a specific range of IDs, ensuring that no two instances generate the same IDs.

Additionally, the database enforces a unique constraint on the short_code column to guarantee that duplicate short codes cannot be inserted.
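
The effect of the unique constraint can be demonstrated with SQLite from the Python standard library. SQLite and the insert_mapping helper are stand-ins chosen for the example; the design does not prescribe a particular database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE url_mapping ("
    " short_code TEXT UNIQUE NOT NULL,"
    " long_url TEXT NOT NULL)"
)


def insert_mapping(short_code: str, long_url: str) -> bool:
    """Insert a mapping; return False if the short code is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO url_mapping (short_code, long_url) VALUES (?, ?)",
                (short_code, long_url),
            )
        return True
    except sqlite3.IntegrityError:
        # The database rejected a duplicate short code.
        return False
```

Even if two generators were to issue the same ID, the second insert would fail here instead of silently overwriting the first mapping.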

Cache Lookup and Cache Miss Handling

A caching layer is introduced to reduce database load and improve redirect latency. The system uses Redis as an in-memory cache.

Redirect request flow:

Client → Load Balancer → API Server → Cache

If the short URL exists in cache:

Cache Hit → Redirect immediately

If the short URL is not in cache:

Cache Miss → Query Database → Update Cache → Redirect

This approach ensures that frequently accessed URLs are served quickly while still maintaining correctness when data is not present in cache.
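
The hit and miss paths above follow the cache-aside pattern, sketched below. A plain dictionary stands in for Redis, and the CacheAsideResolver name and db_reads counter are assumptions added to make the behavior observable.

```python
from typing import Callable, Dict, Optional


class CacheAsideResolver:
    """Check the cache first; on a miss, load from the database and populate the cache."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]):
        self._cache: Dict[str, str] = {}  # stand-in for Redis
        self._load_from_db = load_from_db
        self.db_reads = 0  # counts how often the database is actually queried

    def resolve(self, short_code: str) -> Optional[str]:
        cached = self._cache.get(short_code)
        if cached is not None:
            return cached  # cache hit
        self.db_reads += 1  # cache miss
        long_url = self._load_from_db(short_code)
        if long_url is not None:
            self._cache[short_code] = long_url
        return long_url
```

Note that this sketch does not cache unknown codes, so repeated lookups of a nonexistent code reach the database each time.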

Cache TTL and Expiration Strategy

Cache entries include a time-to-live (TTL) value to prevent stale data from remaining in cache indefinitely.

Example:

short_code → long_url (TTL = 24 hours)

When the TTL expires, the next request results in a cache miss and the data is fetched from the database again. This ensures that cache entries remain fresh while keeping memory usage manageable.

To prevent a large number of cache entries expiring at the same time (cache stampede), the TTL values can include small random variations.
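
One way to add the random variation is to scale the base TTL by a small random factor. The 10% spread and the jittered_ttl name are assumptions for illustration; only the 24-hour base comes from the example above.

```python
import random

BASE_TTL_SECONDS = 24 * 60 * 60  # the 24-hour TTL from the example above


def jittered_ttl(base: int = BASE_TTL_SECONDS, spread: float = 0.1, rng=random) -> int:
    """Return the base TTL shifted by a random amount within +/- spread * base."""
    offset = rng.uniform(-spread, spread) * base
    return int(base + offset)
```

With these defaults, entries written at the same moment expire at different points within a window of several hours rather than all at once.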

Cache Invalidation Strategy

When a URL mapping is updated or deleted, the corresponding cache entry must be invalidated to ensure consistency.

The system uses a cache invalidation mechanism where the API server deletes the relevant key from Redis after updating the database.

Flow:

Update request → Database updated → Cache key deleted

Alternatively, a publish-subscribe mechanism can be used where all API servers subscribe to cache invalidation events and remove outdated entries from their local caches.
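
The publish-subscribe variant can be sketched in a single process. InvalidationBus below is a hypothetical in-memory stand-in for a real message channel, and ApiServer is a simplified model with a local cache; neither is a real library API.

```python
from typing import Callable, Dict, List, Optional


class InvalidationBus:
    """In-process stand-in for a pub/sub channel carrying invalidation events."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, handler: Callable[[str], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, short_code: str) -> None:
        for handler in self._subscribers:
            handler(short_code)


class ApiServer:
    """Simplified API server with a local cache kept consistent via the bus."""

    def __init__(self, db: Dict[str, str], bus: InvalidationBus) -> None:
        self.db = db
        self.bus = bus
        self.local_cache: Dict[str, str] = {}
        bus.subscribe(self._on_invalidate)

    def _on_invalidate(self, short_code: str) -> None:
        self.local_cache.pop(short_code, None)

    def resolve(self, short_code: str) -> Optional[str]:
        if short_code not in self.local_cache and short_code in self.db:
            self.local_cache[short_code] = self.db[short_code]
        return self.local_cache.get(short_code)

    def update(self, short_code: str, long_url: str) -> None:
        self.db[short_code] = long_url  # 1. update the database first
        self.bus.publish(short_code)    # 2. then broadcast the invalidation
```

The ordering matters: invalidating before the database write would let a concurrent reader re-cache the old value.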

Horizontal Scalability

The system is designed to scale horizontally so that it can handle increasing traffic.

Key design principles:

Stateless API Servers

API servers do not store session data locally, allowing new servers to be added easily behind the load balancer.

Client → Load Balancer → Multiple API Servers

New API instances can be added dynamically as traffic increases.

Distributed Database

The URL database can be partitioned across multiple database nodes using sharding.

Example:

Shard 1 → short codes starting with A–M
Shard 2 → short codes starting with N–Z

This allows the system to store billions of URLs while distributing the load across multiple machines.
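
The alphabetical split above is illustrative; Base62 codes can also begin with lowercase letters and digits, and first-character ranges may distribute unevenly. A common alternative, shown here as an assumption rather than the scheme this design prescribes, is to hash the short code and take the result modulo the shard count. The shard_for function is a hypothetical helper.

```python
import zlib


def shard_for(short_code: str, shard_count: int) -> int:
    """Map a short code to a shard index in [0, shard_count) using a stable hash."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # crc32 is deterministic across processes, unlike Python's built-in hash().
    return zlib.crc32(short_code.encode("utf-8")) % shard_count
```

A plain modulo scheme remaps most keys when the shard count changes, so systems that expect to add shards often use consistent hashing instead.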

Read Replicas

To improve read performance for redirect operations, additional read replicas can be introduced.

API Servers → Read Replicas (reads)
API Servers → Primary Database (writes)

Write operations go to the primary database, while read operations can be distributed across replicas.
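
The read/write split can be sketched as a small router. The DatabaseRouter class, the node names, and the round-robin choice among replicas are all assumptions for illustration.

```python
import itertools
from typing import List


class DatabaseRouter:
    """Send writes to the primary and spread reads across the replicas."""

    def __init__(self, primary: str, replicas: List[str]):
        self.primary = primary
        # Fall back to the primary when no replicas are configured.
        self._read_targets = itertools.cycle(replicas or [primary])

    def target_for(self, operation: str) -> str:
        if operation == "write":
            return self.primary
        if operation == "read":
            return next(self._read_targets)
        raise ValueError(f"unknown operation: {operation!r}")
```

Because replication is usually asynchronous, a read issued right after a write may not yet see the new mapping on a replica; the cache-miss path for a freshly created URL may need to read from the primary.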

Reliability and Fault Tolerance

To ensure high availability, multiple instances of each component are deployed.

Multiple API Servers
Multiple Cache Nodes
Replicated Database

If one server fails, traffic is automatically routed to healthy instances by the load balancer. Database replication ensures that data remains available even if the primary node fails.

Detailed Component Design

API Servers

API servers handle incoming HTTP requests from clients. They expose REST endpoints for creating short URLs and resolving short URLs to their original destination.

Responsibilities:

Validate incoming URLs
Generate short codes
Interact with cache and database
Handle redirects
Update analytics data
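
The first responsibility in this list, URL validation, might look like the sketch below. The accepted schemes and the 2048-character limit are assumptions; the design does not state validation rules.

```python
from urllib.parse import urlsplit

MAX_URL_LENGTH = 2048  # assumed limit, not taken from the design


def is_valid_long_url(url: str) -> bool:
    """Accept only absolute http(s) URLs with a host, within the length limit."""
    if not url or len(url) > MAX_URL_LENGTH:
        return False
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs, e.g. bad IPv6 literals.
        return False
    return parts.scheme in ("http", "https") and bool(host)
```

Restricting the scheme keeps values such as javascript: URLs out of the Location header.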

The API servers are designed to be stateless, meaning they do not store any session information locally. This allows multiple API server instances to run behind a load balancer and scale horizontally. When traffic increases, new API servers can be added without affecting existing ones.

Example request flow:

Client → Load Balancer → API Server

Stateless design ensures that requests can be routed to any server instance.

Short URL Generator

The short URL generator is responsible for producing unique short codes for each long URL.

To ensure uniqueness and prevent collisions under high concurrency, the generator uses a Base62 encoding scheme combined with a unique numeric ID.

Character set used:

a-z
A-Z
0-9

Example:

Numeric ID: 125
Base62 encoded: cb

This method produces compact short URLs while supporting billions of possible combinations.
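
The encoding can be sketched as follows, using the character set in the order listed above (a-z, then A-Z, then 0-9). The function names are illustrative.

```python
import string

# Alphabet order matches the character set listed above: a-z, A-Z, 0-9.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
BASE = len(ALPHABET)  # 62


def encode_base62(number: int) -> str:
    """Encode a non-negative integer ID as a Base62 string."""
    if number < 0:
        raise ValueError("ID must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode_base62(code: str) -> int:
    """Decode a Base62 string back into its integer ID."""
    number = 0
    for char in code:
        number = number * BASE + ALPHABET.index(char)
    return number
```

For the example above, 125 = 2 × 62 + 1, which maps to the characters at indexes 2 and 1, giving "cb". A 7-character code covers 62^7 = 3,521,614,606,208 distinct IDs.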

To support high traffic, the generator can allocate ID ranges to API servers so that each server can generate short codes independently without repeatedly querying a central database. This reduces contention during heavy write operations.

If the generator service becomes unavailable, another instance can take over using a leader election mechanism to ensure continuity.

Cache Layer

The system uses Redis as the caching layer to reduce database load and improve response time.

The cache stores frequently accessed URL mappings:

short_code → long_url

Example:

abc123 → https://example.com/article

Redirect flow with caching:

Client → API Server → Cache

If the short code exists in cache:

Cache Hit → Return original URL immediately

If the short code does not exist in cache:

Cache Miss → Query database → Store result in cache → Redirect user

This strategy significantly reduces database queries for frequently accessed URLs.

Database

The database stores the persistent mapping between short URLs and long URLs.

Example schema:

Table: url_mapping
short_code (primary key)
long_url
created_at
expiration_date
click_count

Example entry:

abc123 → https://example.com/article

To ensure consistency and avoid duplicates, the short_code column has a unique constraint.

The database can also maintain analytics information such as the number of times a short URL has been accessed.
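
The schema above could be expressed in SQL along the following lines. SQLite is used only because it ships with Python; the column types, the NULL-means-no-expiry convention, and the helper functions are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE url_mapping (
    short_code      TEXT PRIMARY KEY,           -- primary key implies uniqueness
    long_url        TEXT NOT NULL,
    created_at      TEXT NOT NULL,              -- ISO 8601 timestamp
    expiration_date TEXT,                       -- NULL means the link never expires
    click_count     INTEGER NOT NULL DEFAULT 0
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
with conn:
    conn.execute(
        "INSERT INTO url_mapping (short_code, long_url, created_at) VALUES (?, ?, ?)",
        ("abc123", "https://example.com/article", "2026-03-11T12:00:00Z"),
    )


def record_click(short_code: str) -> None:
    """Increment the click counter atomically inside the database."""
    with conn:
        conn.execute(
            "UPDATE url_mapping SET click_count = click_count + 1 WHERE short_code = ?",
            (short_code,),
        )


def get_mapping(short_code: str):
    """Return (long_url, click_count) for a short code, or None if it is absent."""
    return conn.execute(
        "SELECT long_url, click_count FROM url_mapping WHERE short_code = ?",
        (short_code,),
    ).fetchone()
```

Incrementing in SQL rather than reading, adding, and writing back avoids lost updates when several servers record clicks for the same code.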

Load Balancer

The load balancer distributes incoming traffic across multiple API server instances.

Responsibilities:

Distribute client requests evenly
Prevent server overload
Improve fault tolerance

Example flow:

Client → Load Balancer → API Servers

If one API server becomes unavailable, the load balancer automatically routes traffic to healthy instances.
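
A minimal sketch of this behavior is a round-robin balancer that skips servers marked unhealthy. Round-robin is one possible policy, not one the design specifies, and the class and server names are hypothetical.

```python
from typing import List, Set


class RoundRobinBalancer:
    """Rotate through servers in order, skipping any marked unhealthy."""

    def __init__(self, servers: List[str]):
        self._servers = list(servers)
        self._unhealthy: Set[str] = set()
        self._index = 0

    def mark_unhealthy(self, server: str) -> None:
        self._unhealthy.add(server)

    def mark_healthy(self, server: str) -> None:
        self._unhealthy.discard(server)

    def next_server(self) -> str:
        # Try each server at most once per call.
        for _ in range(len(self._servers)):
            server = self._servers[self._index]
            self._index = (self._index + 1) % len(self._servers)
            if server not in self._unhealthy:
                return server
        raise RuntimeError("no healthy API servers available")
```

In a real deployment the health marks would come from periodic health checks rather than manual calls.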

Analytics and Logging Component

An optional analytics service tracks statistics such as click counts and access patterns.

When a redirect occurs:

API Server → Analytics Service → Store click data

To avoid increasing latency, analytics updates can be processed asynchronously using background workers.

This allows the system to collect useful metrics without affecting the speed of URL redirection.
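
Asynchronous processing can be sketched with an in-process queue and a background worker thread. A production system would more likely use a durable message queue; the names below and the in-memory Counter standing in for the analytics store are assumptions.

```python
import queue
import threading
from collections import Counter

click_events: "queue.Queue[str]" = queue.Queue()
click_counts: Counter = Counter()  # stand-in for the analytics store


def enqueue_click(short_code: str) -> None:
    """Called on the redirect path: enqueue the event and return immediately."""
    click_events.put(short_code)


def analytics_worker() -> None:
    """Background worker: drain the queue and apply the updates."""
    while True:
        short_code = click_events.get()
        click_counts[short_code] += 1
        click_events.task_done()


# Daemon thread so the worker does not keep the process alive on shutdown.
threading.Thread(target=analytics_worker, daemon=True).start()
```

The redirect handler pays only the cost of a queue insertion, while the counter update happens off the request path. Events still in the queue are lost if the process exits, which is one reason to prefer a durable queue.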

Fault Tolerance and Replication

Each component in the system is deployed with redundancy to ensure high availability.

Examples:

Multiple API servers
Replicated Redis cache nodes
Primary database with read replicas

If one instance fails, other instances continue handling requests.

Database replication ensures that data remains available even if the primary database becomes unavailable.
