Requirements

Functional Requirements:

Create a short URL for a given long URL.
Return the long URL associated with a given short URL.

Non-Functional Requirements:

High Availability: The system should be operational and accessible at all times, ensuring that users can shorten and retrieve URLs without downtime.
Low Redirect Latency: Users expect quick redirects when they click on a short URL, so minimizing latency is crucial for a good user experience.
Horizontal Scalability: As your service grows, it should be able to handle increased load by adding more servers rather than upgrading existing ones.

API Design

Post request to /short-url Body:{url:"longurl1"}

returns:{shorturl}

So this would first validate the URL if it is correct or not so if correct then would proceed to back end then the bacon would have shot in url functionality then it was saving the database and return the URL.

Get: /short-url will fetch all the url which user has it and will alow params based and paginated

Post: /long-url Body:{url:shorturl} will check the db if this short url exist in our place and if found return the orignal url

High-Level Design

1. Entry Layer (API / Edge Tier)

When user hits the system:

For shortening:

POST /shorten

Responsibilities:

✅ URL validation (is it valid format?)
✅ Authentication (optional)
✅ Rate limiting (prevent abuse, e.g. 5 req/sec)
✅ Forward request to backend service

2. Write Path (URL Shortening Flow)

Step-by-step:

User sends long URL
API validates & rate limits
Request goes to URL Shortener Service

Inside URL Shortener Service:

Step 1: Check cache (optional optimization)

If already exists → return existing short URL

Step 2: Generate short code

Options:

Base62(counter)
Random string
Hash

Example:

https://example.com/abc → "xYz123"

Step 3: Store in DB

shortCode → longURL

Step 4: Update cache

cache[xYz123] = longURL

Step 5: Return response

https://short.ly/xYz123

3. Read Path

When user clicks short URL:

GET /xYz123

Flow:

Step 1: Check cache (FAST path 🚀)

if exists → return immediately

Step 2: If cache miss → query DB

xYz123 → longURL

Step 3: Update cache

Step 4: Redirect user

HTTP 302 → original URL

short_code (PRIMARY KEY)
long_url
created_at
expiry (optional)

Key points:

Indexed on short_code
Optimized for fast reads

Request always hits API
Cache is checked inside system, not bypassing API

Correct understanding:

User → API → Cache → DB (if needed)

Let’s make it solid:

Apply at API layer
Example:
- 5 requests/sec per user
Prevents:
- Spam
- System overload

ComponentResponsibility
API Layer	Validation, auth, rate limiting
Write Service	Generate short URL, store
Read Service	Handle redirects
Cache (Redis)	Fast lookup
DB	Persistent storage

“For identifier generation, I would use a distributed ID generator like Snowflake to ensure globally unique, collision-free IDs without coordination. These IDs are Base62 encoded to generate short URLs.”
“For caching, I would use Redis with a cache-aside pattern and write-through on creation. LRU eviction ensures hot URLs stay cached, and TTL is used for expiring links.”
“Consistency is handled via write-through updates and lazy cache population on misses, giving us eventual consistency with high performance.”

Here is a complete, production-ready system design for a TinyURL-like URL shortener, with all critical details included end-to-end.

1. High-Level Architecture

User → Load Balancer → App Servers
Write Path:
User → API → Identifier Service → Storage → Cache → Response
Read Path:
User → API → Cache → DB → Redirect

2. Core Components Overview

Identifier Generation Service → generates unique short codes
Storage Layer → stores URL mappings
Redirection Service → handles high-volume reads
Cache Layer → reduces latency and DB load
Rate Limiter → protects system from abuse

3. Identifier Generation Service

Responsibility

Generate globally unique, scalable, collision-safe IDs

Design: Distributed Snowflake + Base62

ID Structure (64-bit)

| timestamp (41) | machine_id (10) | sequence (12) |

Flow

Request hits app server
Generate ID locally (no DB dependency)
Convert ID → Base62
Return short code

Algorithms & Complexity

Bit manipulation → O(1)
Base62 encoding → O(log₆₂ N)

Scaling

Each server has unique machine_id
No coordination required
Horizontal scaling is linear

Throughput

Per node: 4096 IDs/ms
100 nodes → millions/sec

Collision & Failure Handling

1. DB-Level Guarantee

UNIQUE(short_code)

2. Retry Logic

Generate → Insert
↓
Conflict → regenerate → retry (max 2–3 times)

3. Clock Drift Handling

If clock goes backward:
- Pause generation OR
- Use last_timestamp + 1

4. Sequence Overflow

If >4096 IDs/ms:
- Wait for next millisecond

Tradeoffs

Pros

No DB bottleneck
Highly scalable
Practically collision-free

Cons

Slight predictability
Requires machine_id coordination

4. Redirection Service (Read Path)

Responsibility

Handle:

GET /abc123 → Redirect to original URL

Flow

Load balancer routes request
Check cache
Cache HIT → return immediately
Cache MISS → query DB → populate cache → return

Performance Goal

Latency < 50ms
90%+ requests served from cache

Data Structures

Cache

Key: short_code
Value: long_url

DB Index

Primary index on short_code

Multi-Layer Scaling

1. CDN (optional)

Edge caching for global users

2. Cache Layer

Use Redis

O(1) lookup
Handles majority traffic

3. DB Layer

Only for cache misses

Tradeoffs

Pros

Extremely fast reads
Scales horizontally

Cons

Cache misses hit DB
Memory cost

5. Storage Layer

Responsibility

Store mapping:

short_code → long_url

Schema

Table: url_mapping
short_code (PK)
long_url
created_at
expiry_time
user_id (optional)

Indexing

Primary: short_code
Secondary: user_id (optional)

Data Structures

SQL → B-tree
NoSQL → hash-based

Scaling

1. Sharding

hash(short_code) % N

Even distribution
Avoid hotspotting

2. Replication

Primary → writes
Replicas → reads

Capacity Estimation

100M URLs
Avg 200B per URL

100M × 200B = 20GB

index overhead → ~40–50GB

Tradeoffs

OptionProsCons
SQL	Strong consistency	Harder to scale
NoSQL	Scalable	Eventual consistency

Failure Handling

Replication failover
Backup + restore
Retry on transient failures

6. Caching Strategy (Detailed)

Cache Type

Use Redis

Cache Pattern

Hybrid Approach

Write-through (on create/update)
Cache-aside (on read)

TTL Strategy

1. Expiring URLs

TTL = expiry_time − current_time

2. Permanent URLs

Default TTL: 24 hours
Sliding TTL:
- Reset TTL on each access

3. Hot URLs

TTL: 1–6 hours
Frequently refreshed by traffic

Eviction Policy

LRU (Least Recently Used)

Preventing Cache Bloat

1. Memory Limit

Example: 10–20 GB cap

2. Admission Control

Cache only after 2–3 hits
Avoid caching one-time URLs

3. Negative Caching

short_code → NULL
TTL = 5 minutes

Cache Invalidation

On Update/Delete

DB update → Cache delete/update

Event-Based Invalidation

Publish event:

URL_UPDATED / URL_DELETED

Consumers remove from cache

TTL as Backup

Ensures stale data eventually removed

Consistency Model

Write-through ensures freshness
Cache-aside ensures performance
Eventual consistency acceptable

7. Rate Limiting

Where Applied

URL creation API → strict limits
Redirect API → relaxed limits (DDoS protection)

Algorithm: Token Bucket

Configuration Example

Capacity: 100 requests
Refill: 10 req/sec

Behavior

Allows bursts up to 100
Then throttles to 10/sec

Implementation

Use Redis

Key

rate_limit:{user_id}

Store

Tokens
Last refill timestamp

Burst Handling

Within limit → allow
Exceeded → reject

Response

429 Too Many Requests
Retry-After: 10

Client Backoff Strategy

Exponential backoff:

1s → 2s → 4s → 8s → capped

Advanced Protection

Per IP limit
Per user limit
Global rate limit

Tradeoffs

ApproachProsCons
Token Bucket	Handles bursts well	Slight complexity
Fixed Window	Simple	Burst spikes
Sliding Window	Accurate	Expensive

8. End-to-End Flow

Write Path

User → API
→ Generate ID
→ Base62 encode
→ Store in DB
→ Update cache
→ Return short URL

Read Path

User → API
→ Cache lookup
→ HIT → Redirect
→ MISS → DB → Cache → Redirect

9. Key Design Characteristics

O(1) ID generation
O(1) cache lookup
Horizontally scalable
Read-optimized architecture
Controlled memory usage via TTL + LRU
Strong uniqueness guarantees
Resilient to failures and high concurrency