Designing A Simple Url Shortening Service A TinyURL Approach - System Design

Requirements

Functional Requirements:

Create a short URL for a given long URL.
Redirect to long URL associated with a given short URL.
Optional: Custom Aliases, Expiration time

Non-Functional Requirements:

Redirect latency should be low
very high read throughput
highly available
highly durable(no data loss)
Strong consistency for URL mapping.
Eventual consistency for analytics.

Core Desgin Decisions:

ID Generation Strategy: Auto increment, Random, Base62 encoding of chosen uniqueId
Read Optimization: Cache(Redis)
Storage: key-value store
Redirection speed: CDN + cache

Back of the Envelope Estimation:

Assume 10M new URLs per day
Read: write = 100:1
QPS: write: 10M/(24*3600) = 115 writes/sec. Read = 115*100 = 11,500 reads/sec
Storage: Assume avg URL length = 500 bytes. metadata = 100 bytes, total = 600bytes. 10M/day -> 6GB/day for 1year = 2.2TB
20% of those urls in cache = 400GB

API Design

Create Short URL: post/shorten

req = json

{

"longurl' : "......",

"customAlias":"...."(optional),

"expiry":"...."

}

Redirect URL: Get/{shortcode}

Analytics URL: Get/stats/{shortcode}

High-Level Design

URL Shorten Service - generates short code

Redirect Service - Handles lookups and redirects

Analytics Service - async processing

DATABASE choise: key-value(DynamoDB), if we choose SQL, scaling writes is hard, sharding complexity.

Schema: table: URLMapping

shortcode (pk)

longurl

createdAt

expiryAt

userId(optional)

Data flow:

URL Creation flow:

User -> API(Url shorten service)
Generate unique ID
Convert to Base62 ->shortcode
Store in DB after storing in cache
Return the short URL

Redirect flow:

User hits short URL
checks cache
1. Hit -> redirect
2. Miss -> DB Lookup -> Cache -> redirect

Detailed Component Design

Short URL Generation:

Approach1: Random string

. Generate a random 6-8 chars

. problem: collisions -> retry loops

Approach2: Hashing(MD5)

. Hash long URL

. Take first N chars

. Problem: collision risk

Approach3: Counter + Base62

. Generate a unique ID eveytime

. Convert to Base62:

. Why is it best: No Collisions, Deterministic, Compact

. Problem: Distributed ID generation

. use pre allocated ID ranges per server.

Scalability & Reliability:

Bottlenecks

DB overload: Use Read replicas and sharding
Cache misses: use correct eviction policy(LRU)
Hot URL: use CDN
ID generation bottleneck: Use Distributed ID generator
Traffic spikes: use Auto scaling, and Rate limiting
Request failures: use exponential backoffs