Designing a Simple URL Shortening Service: A TinyURL Approach - System Design

Requirements

Functional Requirements:

Generate a short URL from a long URL.
Redirect a short URL to the original URL.
Support Custom Aliases.
Support Link Expiration.

Non-Functional Requirements:

High Availability (redirect must always work)
Low Latency (<100 ms for redirect)
High Scalability (millions of users)
Read-Heavy System (redirects >> writes)
Horizontal scalability to handle millions of requests by adding more servers instead of scaling vertically
Fault tolerance (system should work even if cache or a node fails)

Assumptions

1M URLs created per day.
100M redirects per day.
URL Size ~500 bytes
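
These assumptions translate into rough average load and storage figures. The arithmetic below is a back-of-the-envelope sketch only: it assumes traffic is spread evenly over the day (real peaks will be higher) and that each stored record is about the size of the URL itself. All variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

urls_created_per_day = 1_000_000
redirects_per_day = 100_000_000
bytes_per_url = 500

# Average request rates, assuming evenly spread traffic.
writes_per_second = urls_created_per_day / SECONDS_PER_DAY
reads_per_second = redirects_per_day / SECONDS_PER_DAY
read_write_ratio = redirects_per_day // urls_created_per_day

# Storage growth, counting only the URL payload (no indexes or metadata).
storage_per_day_mb = urls_created_per_day * bytes_per_url / 1_000_000
storage_per_year_gb = storage_per_day_mb * 365 / 1_000

print(f"writes/s ~ {writes_per_second:.1f}")
print(f"reads/s  ~ {reads_per_second:.0f}")
print(f"read:write ratio = {read_write_ratio}:1")
print(f"storage/day  ~ {storage_per_day_mb:.0f} MB")
print(f"storage/year ~ {storage_per_year_gb:.1f} GB")
```

The 100:1 read-to-write ratio is what motivates the cache-first design in the rest of the document.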

API Design

Create Short URL

URL:- POST /shorten

Request:-

{
  "long_url": "https://example.com/abc",
  "custom_alias": "myurl", // optional
  "expiry": "2026-12-31" // optional
}

Response:-

{
  "short_url": "https://tinyurl.com/abc123"
}

Redirect API

URL:- GET /{short_code}

Response:- HTTP 301 Redirect -> Original URL

Analytics API

URL:- GET /analytics/{short_code}

High-Level Design

Architecture Flow

Client -> Load Balancer -> API Servers -> Cache -> Database

Components

CDN / Edge Layer

Cache 301 redirect responses for popular URLs
Reduce latency by serving requests closer to users
Offload traffic from backend systems

Load Balancer

Distributes Traffic across API Servers (Node.js)
Performs health checks
Ensures high availability

API Gateway

Routes requests to appropriate services
Handles authentication (if needed)
Applies throttling and request validation

Rate Limiter

Prevents abuse and traffic spikes
Protects backend services from overload

Identifier Generation & Uniqueness

Generate short_code that is:

Unique
Non-colliding
Scalable across multiple servers

Use Distributed ID Generation Service

How it works:

ID = timestamp + machine_id + sequence

Example (64-bit layout; the top bit is left unused as the sign bit):

[41 bits timestamp][10 bits machine_id][12 bits sequence]

Why this works:

Timestamp → unique over time
Machine ID → unique across servers
Sequence → handles same-millisecond requests

No collision even at high concurrency
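
The layout above can be sketched as a small generator. This is a minimal illustration under stated assumptions, not a production implementation: the class name `SnowflakeGenerator`, the custom epoch `CUSTOM_EPOCH_MS` (2024-01-01 UTC), and the choice to raise on a backwards clock are all assumptions made for the example.

```python
import threading
import time

TIMESTAMP_BITS = 41
MACHINE_ID_BITS = 10
SEQUENCE_BITS = 12

MAX_MACHINE_ID = (1 << MACHINE_ID_BITS) - 1  # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1      # 4095 IDs per millisecond per machine

# Hypothetical custom epoch (2024-01-01 00:00:00 UTC) in milliseconds.
CUSTOM_EPOCH_MS = 1_704_067_200_000


class SnowflakeGenerator:
    def __init__(self, machine_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= machine_id <= MAX_MACHINE_ID:
            raise ValueError("machine_id out of range")
        self._machine_id = machine_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Assumption: refuse to issue IDs rather than risk duplicates.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            elapsed = now - CUSTOM_EPOCH_MS
            return (
                (elapsed << (MACHINE_ID_BITS + SEQUENCE_BITS))
                | (self._machine_id << SEQUENCE_BITS)
                | self._sequence
            )


gen = SnowflakeGenerator(machine_id=1)
ids = [gen.next_id() for _ in range(10_000)]
```

Uniqueness across servers depends on every instance being given a different `machine_id`; within one instance, the lock and the sequence counter cover requests that land in the same millisecond.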

Convert to Short Code

short_code = Base62(ID)

Prevent Guessable IDs

Problem:

Sequential IDs → easy to guess

Solution:

short_code = Base62(ID XOR random_salt)
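
A small sketch of the XOR step, assuming the salt is a fixed secret shared by all servers. `SECRET_SALT` and the function names are hypothetical. With a fixed salt the transformation is reversible and one-to-one, so distinct IDs always map to distinct values; a salt that changed per request would lose that property.

```python
# Hypothetical fixed secret; in practice it would come from configuration.
SECRET_SALT = 0x5DEECE66D


def obfuscate(numeric_id):
    """XOR the ID with the fixed salt before Base62 encoding."""
    return numeric_id ^ SECRET_SALT


def deobfuscate(value):
    """XOR is its own inverse, so the same salt recovers the ID."""
    return value ^ SECRET_SALT


sequential_ids = range(1000, 1005)
obfuscated = [obfuscate(i) for i in sequential_ids]
print(obfuscated)
```

XOR with a constant only hides the raw counter; neighbouring IDs still differ in just a few low bits, so this is obfuscation rather than a cryptographic guarantee.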

Collision Handling (Edge Case)

Apply UNIQUE constraint on short_code

If collision → regenerate

To ensure uniqueness of identifiers, I will use a distributed ID generation strategy instead of relying solely on database auto-increment.

The ID will be generated from a timestamp, a machine ID, and a sequence number (similar to Snowflake), which keeps IDs unique across multiple servers even under high concurrency, provided every server has a distinct machine ID.

The generated ID will then be encoded using Base62 to create the short code.

To make codes harder to predict and enumerate, I will apply a transformation such as XOR with a secret salt before encoding. A fixed salt keeps the mapping one-to-one, so uniqueness is preserved; a salt that changes per request would not. XOR only obscures the sequence and is not a cryptographic guarantee.

Additionally, a unique constraint will be enforced at the database level as a safeguard against rare collisions, for example between a generated code and a custom alias.

Cache (Redis)

Stores Frequently Accessed URLs
Reduces DB Load

Database (MySQL/NoSQL)

The storage layer is responsible for maintaining persistent URL mappings, ensuring durability, and supporting efficient read/write operations at scale.

Request Flow

Redirect Flow

User Clicks on Short URL.
Check Redis
If miss -> query DB
Store in Redis
Redirect User

System is read-heavy. Cache is critical.

Detailed Component Design

URL Generation

Approach - Auto Increment ID -> Base62 encoding

Example - ID = 125 -> "cb" (alphabet a-z, A-Z, 0-9; 125 = 2*62 + 1)
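
A minimal Base62 sketch that reproduces the example. The alphabet order (lowercase, then uppercase, then digits) is an assumption chosen so that 125 encodes to "cb"; other orderings are equally valid as long as encoder and decoder agree.

```python
import string

# Assumed alphabet order: a-z, A-Z, 0-9.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
BASE = len(ALPHABET)  # 62


def base62_encode(number):
    """Encode a non-negative integer as a Base62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    chars = []
    while number:
        number, remainder = divmod(number, BASE)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def base62_decode(code):
    """Decode a Base62 string back to the integer."""
    number = 0
    for ch in code:
        number = number * BASE + ALPHABET.index(ch)
    return number


print(base62_encode(125))
```

Note that a full 63-bit ID can encode to 11 characters, which matters when sizing the short_code column.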

Problem: Predictable IDs

Sequential IDs can be guessed.

Solution:

short_code = Base62(ID XOR random_salt)

Prevents enumeration attacks

Scalable Option:

Use Distributed ID Generator

timestamp + machine_id + sequence

Avoids bottleneck of single DB

Concurrent Creates & Collision

Auto Increment ensures uniqueness

DB guarantees atomic inserts

If using random codes

Problem: Collision Possible

Solution: UNIQUE Constraint + retry
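
The UNIQUE constraint plus retry pattern can be sketched as follows. The example uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs on its own; `random_code`, `insert_with_retry`, the 7-character code length, and the retry limit of 5 are assumptions for illustration.

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_letters + string.digits


def random_code(length=7):
    """Generate a random Base62 code (length is an assumed default)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def insert_with_retry(conn, long_url, make_code=random_code, max_attempts=5):
    """Insert a mapping, regenerating the code if the UNIQUE constraint fires."""
    for _ in range(max_attempts):
        code = make_code()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO urls (short_code, long_url) VALUES (?, ?)",
                    (code, long_url),
                )
            return code
        except sqlite3.IntegrityError:
            continue  # collision: try a fresh code
    raise RuntimeError("could not allocate a unique short code")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (short_code TEXT UNIQUE, long_url TEXT)")
print(insert_with_retry(conn, "https://example.com/abc"))
```

Letting the database reject duplicates avoids a separate "check then insert" step, which would be racy under concurrent creates.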

Redirect Service

Flow:

Check Redis

If hit -> return

If miss -> DB lookup

Cache result

Return 301 Redirect
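
The flow above is the cache-aside pattern. The sketch below uses in-memory stand-ins so it can run without Redis or MySQL: `FlakyCache`, `resolve`, and `redirect` are hypothetical names, the database is a plain dict, and a cache outage is simulated with `ConnectionError`. It also shows the fallback to the database when the cache is unavailable.

```python
class FlakyCache:
    """In-memory stand-in for Redis; set available=False to simulate an outage."""

    def __init__(self):
        self.data = {}
        self.available = True

    def get(self, key):
        if not self.available:
            raise ConnectionError("cache unavailable")
        return self.data.get(key)

    def set(self, key, value):
        if not self.available:
            raise ConnectionError("cache unavailable")
        self.data[key] = value


def resolve(short_code, cache, db):
    """Cache-aside lookup: try the cache, fall back to the DB, then populate the cache."""
    key = f"url:{short_code}"
    try:
        cached = cache.get(key)
        if cached is not None:
            return cached
    except ConnectionError:
        pass  # cache down: continue with the database
    long_url = db.get(short_code)
    if long_url is not None:
        try:
            cache.set(key, long_url)
        except ConnectionError:
            pass  # serving the redirect matters more than caching it
    return long_url


def redirect(short_code, cache, db):
    """Return (status, headers) for a redirect request."""
    long_url = resolve(short_code, cache, db)
    if long_url is None:
        return 404, {}
    return 301, {"Location": long_url}


db = {"abc123": "https://example.com/abc"}
cache = FlakyCache()
print(redirect("abc123", cache, db))
```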

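Why 301?

Permanent Redirect
Browser + CDN Caching
Better Performance
Trade-off: a browser that has cached a 301 skips the server on repeat clicks, so those clicks are not counted in analytics and an expired link may keep redirecting. Use 302 if per-click analytics or strict expiry matter more than latency.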

Cache Design (Redis)

Strategy - Cache Aside

Key Design

Key = url:{short_code}

value = long_url

Why Cache

Avoid DB Load
Reduce Latency

Failure Handling

If Redis Fails

Fallback -> DB

Production Setup

Redis Cluster (Sharding)
Replication + Failover
Eviction Policy (LRU)

Database Design (MySQL)

Schema

CREATE TABLE urls (
  id BIGINT PRIMARY KEY,
  short_code VARCHAR(16) UNIQUE, -- Base62 of a 63-bit ID can reach 11 characters
  long_url TEXT,
  created_at TIMESTAMP,
  expiry TIMESTAMP
);

Optimization

Index on short_code (already provided by the UNIQUE constraint)
Read Replicas for Scaling

Scalability

API Layer

Stateless Node.js servers behind load balancers

Horizontal scaling by adding more instances.
No session storage on the server (use Redis if needed)

Database Scaling

Sharding

Partition data using:

shard = hash(short_code) % N;

Each shard handles subset of data.
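
A sketch of the shard selection, assuming a stable hash such as SHA-256. Python's built-in `hash()` is randomised per process for strings, so it would send the same code to different shards on different servers. `shard_for` is a hypothetical helper name.

```python
import hashlib


def shard_for(short_code, num_shards):
    """Map a short code to a shard index using a process-independent hash."""
    digest = hashlib.sha256(short_code.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then take the modulo.
    return int.from_bytes(digest[:8], "big") % num_shards


NUM_SHARDS = 4
for code in ("abc123", "myurl", "cb"):
    print(code, "->", shard_for(code, NUM_SHARDS))
```

One design caveat: with plain modulo sharding, changing N remaps most keys, so adding shards means moving data. Consistent hashing is a common way to reduce that movement.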

Read Replicas

Writes -> Primary DB

Reads -> Replica DB

Reduces read pressure.

Cache Scaling

Redis Cluster

keys distributed across multiple nodes.

Increases memory + throughput

Global Scaling

Geo-distributed deployment (multi-region)

Reduces latency for global users.

The system scales horizontally at each layer: stateless API servers, Redis Cluster for distributed caching, and database sharding with read replicas for handling large-scale traffic.

Fault Tolerance

Redis Failure

Fallback to DB
Increased Latency

Generator Failure

Multiple generator instances
Retry Mechanism

Split Brain Handling

Give every generator instance a distinct machine_id, so instances that cannot coordinate still never issue the same ID

Traffic Spikes

System should:

Rate limit users (Rate Limiting - Redis Based)

Use token bucket / sliding window

Example (fixed-window counter, the simplest variant) -

INCR user_ip

EXPIRE user_ip 1 (set only when the counter is first created)

if:

count > 100 -> reject request
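
The counter logic can be sketched in memory to show the behaviour. This is a fixed-window limiter that mirrors the INCR + EXPIRE idea; it is not a token bucket or sliding window, and it does not talk to Redis. `FixedWindowLimiter` and its parameters are hypothetical names, and the clock is injectable so the example is deterministic.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each window."""

    def __init__(self, limit=100, window_seconds=1.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._counters = {}  # key -> (window_start, count)

    def allow(self, key):
        now = self.clock()
        start, count = self._counters.get(key, (now, 0))
        if now - start >= self.window:
            # Window elapsed: equivalent to the Redis key having expired.
            start, count = now, 0
        count += 1  # equivalent to INCR
        self._counters[key] = (start, count)
        return count <= self.limit


fake_now = [0.0]
limiter = FixedWindowLimiter(limit=100, window_seconds=1.0, clock=lambda: fake_now[0])
results = [limiter.allow("203.0.113.7") for _ in range(101)]
print(results.count(True), "allowed,", results.count(False), "rejected")
```

A fixed window can admit up to twice the limit across a window boundary, which is the usual reason to move to a sliding window or token bucket.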

Prioritize redirects over writes

Redirects = High Priority

Shorten API = Low Priority

Under load:

Reject POST /shorten

Allow GET redirect

Load Shedding

If system overloaded:

Return:

503 Service Unavailable

Retry-After: 5 (value is in seconds)
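
The prioritisation and load-shedding rules can be combined into one admission check. This is a sketch under assumptions: `shed_or_admit` is a hypothetical function, and how the `overloaded` flag is computed (CPU, queue depth, error rate) is left open because the design does not specify it.

```python
def shed_or_admit(method, path, overloaded, retry_after_seconds=5):
    """Return None to admit the request, or (status, headers) to shed it.

    Redirects (GET) are always admitted; only POST /shorten is shed under load.
    """
    is_write = method == "POST" and path == "/shorten"
    if overloaded and is_write:
        return 503, {"Retry-After": str(retry_after_seconds)}
    return None


print(shed_or_admit("POST", "/shorten", overloaded=True))
print(shed_or_admit("GET", "/abc123", overloaded=True))
```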

Autoscale Servers

Scale API Servers

Scale Redis Cluster

Scale DB Replicas

Burst Handling + Backoff

Problem:

Clients retry aggressively:

1000 failed requests -> retry instantly -> system collapse

Solution: Exponential Backoff + Jitter

Example:

Retry 1 -> Wait 1 Sec

Retry 2 -> Wait 2 Sec

Retry 3 -> Wait 4 Sec

Add randomness:

Wait = base * 2^(n-1) + random(0-100ms), where base = 1 Sec and n = retry number

Why?

Prevents thundering herd problem
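
A client-side sketch of exponential backoff with jitter that matches the 1, 2, 4 second example. `backoff_delay` is a hypothetical helper; the 30-second cap is an added assumption to keep delays bounded, and the random source is injectable so the doubling can be checked deterministically.

```python
import random


def backoff_delay(attempt, base=1.0, cap=30.0, jitter=0.1, rng=random.random):
    """Seconds to wait before retry number `attempt` (1-based).

    delay = min(cap, base * 2^(attempt - 1)) + random jitter in [0, jitter)
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    delay = min(cap, base * 2 ** (attempt - 1))
    return delay + rng() * jitter


for attempt in (1, 2, 3):
    print(f"retry {attempt}: wait {backoff_delay(attempt):.3f}s")
```

The jitter spreads retries from many clients over a short interval instead of letting them all arrive at the same instant.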

Server Hint

Return: 503 + Retry-After header

Analytics (ASYNC)

Do not update the DB on every redirect

Solution

Use Redis counters OR
Use queue (RabbitMQ)

Flow

Redirect -> publish event -> process async -> batch update DB
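
The flow can be sketched with an in-process queue standing in for RabbitMQ. `publish_click`, `drain_batch`, and `flush` are hypothetical names, and the "database" is a dict of click totals; the point is that the redirect path only enqueues an event, while a separate worker aggregates and writes in batches.

```python
import queue
from collections import Counter

events = queue.Queue()  # stand-in for a message queue such as RabbitMQ


def publish_click(short_code):
    """Called on the redirect path: enqueue and return immediately."""
    events.put(short_code)


def drain_batch(max_events=1000):
    """Worker side: pull up to max_events and aggregate counts per short code."""
    batch = Counter()
    for _ in range(max_events):
        try:
            batch[events.get_nowait()] += 1
        except queue.Empty:
            break
    return batch


def flush(batch, click_totals):
    """Apply one aggregated update per short code instead of one per click."""
    for code, count in batch.items():
        click_totals[code] = click_totals.get(code, 0) + count


for code in ("abc123", "abc123", "myurl", "abc123"):
    publish_click(code)

click_totals = {}
flush(drain_batch(), click_totals)
print(click_totals)
```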

Expiry & Data Management

Expiry Strategy

Soft Delete (mark expired)
Do not reuse short codes

Scaling Data

Move expired URLs to archive table
Keep active table small
