My Solution for Designing a Simple URL Shortening Service: A TinyURL Approach

by whisper3949

Requirements


Functional Requirements:


  • Create a short URL for a given long URL.
  • Return the long URL associated with a given short URL.



Non-Functional Requirements:


  • Low redirect latency
  • High availability
  • Horizontal scaling
  • Durability
  • Read-heavy optimization
  • Minimal URL length
  • Non-guessable short codes


API Design

Write Path - Create Short URL


POST /api/v1/urls


Headers:

Authorization: Bearer <token>

Content-Type: application/json


Request Body:

{
  "long_url": "https://example.com/very/long/path/to/resource",
  "custom_alias": "my-link",
  "expiry_date": "2027-01-01"
}


Response (201 Created):

{
  "short_url": "https://short.ly/Ab3xK9z",
  "short_code": "Ab3xK9z",
  "long_url": "https://example.com/very/long/path/to/resource",
  "created_at": "2026-04-15T10:30:00Z",
  "expires_at": "2027-01-01T00:00:00Z"
}


Errors:

400 Bad Request - Invalid URL format

409 Conflict - Custom alias already exists

429 Too Many Requests - Rate limit exceeded


Read Path - Redirect Short URL


GET /:shortCode


Example: GET https://short.ly/Ab3xK9z

Response (301 Moved Permanently):


HTTP/1.1 301 Moved Permanently

Location: https://example.com/very/long/path/to/resource


Why 301 and not 302?


301 = Permanent Redirect. Browsers cache it, so repeat clicks skip our servers entirely, which keeps redirect latency low and reduces load.

302 = Temporary Redirect. Every click hits our servers, which is better for click analytics but adds load.

We choose 301 because low redirect latency is a core requirement; the trade-off is that browser-cached redirects bypass our analytics.


Errors:

404 Not Found

410 Gone - URL expired


High-Level Design

The system is split into two main paths that both enter through an API Gateway:


WRITE PATH (Create Short URL):

Client sends POST request with long URL → API Gateway handles rate limiting, authentication, and request validation → Routes to Shortener Service → Shortener Service calls ID Generation Service which uses Base62 encoding of a distributed counter with pre-allocated ranges to generate a unique 7-character short code → Shortener Service stores the short_code → long_url mapping in the NoSQL Database (partitioned by short_code for even distribution) → Pre-warms Redis Cache with the new mapping → Returns the short URL to the client.
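A minimal sketch of this write path in Python, assuming hypothetical helpers (id_gen, db, cache) that stand in for the ID Generation Service, NoSQL store, and Redis; none of these names are a fixed API:


from datetime import datetime, timezone

BASE_URL = "https://short.ly"  # domain taken from the examples above

def create_short_url(body: dict, db, cache, id_gen) -> dict:
    """Write path: validate -> generate code -> persist -> pre-warm cache."""
    long_url = body["long_url"]
    if not long_url.startswith(("http://", "https://")):
        raise ValueError("400 Bad Request - Invalid URL format")

    # Custom alias wins if requested; otherwise ask the ID Generation Service.
    short_code = body.get("custom_alias") or id_gen.next_code()

    record = {
        "short_code": short_code,                  # partition key
        "long_url": long_url,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "expires_at": body.get("expiry_date"),
    }
    db.put(record)                                 # durable store first
    cache.set(short_code, long_url, ex=86400)      # pre-warm Redis, 24h TTL
    return {"short_url": f"{BASE_URL}/{short_code}", **record}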


READ PATH (Redirect Short URL):

User clicks short URL → API Gateway routes to Redirect Service → Redirect Service checks Redis Cache first (1ms latency). On cache HIT, immediately returns 301 Permanent Redirect to the original long URL. On cache MISS, queries the NoSQL Database (5-10ms), stores the result in Redis Cache for future requests, then returns 301 redirect.
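The same flow as a sketch, assuming a redis-py client and a hypothetical db.get lookup (the function names are illustrative):


import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def redirect(short_code: str, db):
    """Read path: cache-first lookup, then 301 redirect."""
    long_url = r.get(short_code)                # ~1ms on a cache HIT
    if long_url is None:                        # cache MISS
        row = db.get(short_code)                # ~5-10ms NoSQL lookup
        if row is None:
            return 404, {}
        long_url = row["long_url"]
        r.set(short_code, long_url, ex=86400)   # populate for future reads
    return 301, {"Location": long_url}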


KEY COMPONENTS:

- API Gateway: Single entry point for both paths. Handles rate limiting (Token Bucket), authentication via API keys, and request routing.

- Shortener Service: Handles URL creation logic. Validates input, coordinates with ID Generation Service, writes to database.

- Redirect Service: Handles URL resolution. Optimized for speed with cache-first approach.

- ID Generation Service (Base62): Generates unique short codes using counter-based Base62 encoding with range allocation across multiple servers. Zero collision risk.

- Redis Cache: Cache-Aside pattern. Caches top 20% of URLs (2GB). 24-hour TTL. Handles 80% of read traffic.

- NoSQL Database: DynamoDB or Cassandra. Partitioned by short_code. Stores all URL mappings with metadata.

- Analytics Service: Asynchronously tracks click counts via message queue. Does not affect redirect latency.


https://link.excalidraw.com/readonly/hTYiWAyA3tITszIxyM9B

Detailed Component Design

Component 1: ID Generation Service (Base 62)

A. How does it work?

Base62 counter with range allocation.

62 characters (a-z, A-Z, 0-9); 7 chars = 62^7 ≈ 3.5 trillion unique URLs.

Why Base62 and not an MD5 hash?

A counter encoded in Base62 has zero collision risk; a truncated hash can collide and needs retry logic.
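A minimal sketch of the counter-to-code encoding (the alphabet ordering is an assumption; any fixed 62-character ordering works):


ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

def base62_encode(counter: int, length: int = 7) -> str:
    """Encode a counter value as a fixed-length Base62 short code."""
    chars = []
    while counter > 0:
        counter, rem = divmod(counter, 62)
        chars.append(ALPHABET[rem])
    # Left-pad so every code is exactly `length` characters.
    return "".join(reversed(chars)).rjust(length, ALPHABET[0])

print(base62_encode(125))  # 125 = 2*62 + 1 -> "cb" padded to "aaaaacb"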


B. How does it scale?

Range allocation: Server 1 gets 1-1M, Server 2 gets 1M-2M, and so on. Each server works independently; no coordination on the hot path. Zookeeper/etcd assigns a new range when a server's range runs out.
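A sketch of per-server range consumption, with the coordination step stubbed out (request_new_range is a placeholder for the real Zookeeper/etcd call):


import threading

class RangeAllocator:
    """Hands out sequential IDs from a pre-allocated range and refills
    from the coordinator only when the range is exhausted."""

    def __init__(self, start: int, end: int):
        self.next_id = start
        self.end = end
        self.lock = threading.Lock()

    def next(self) -> int:
        with self.lock:
            if self.next_id > self.end:
                # Placeholder: ask Zookeeper/etcd for a fresh 1M-ID range.
                self.next_id, self.end = self.request_new_range()
            value = self.next_id
            self.next_id += 1
            return value

    def request_new_range(self) -> tuple[int, int]:
        raise NotImplementedError("Zookeeper/etcd coordination goes here")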



C. How big?

Write: 100M URLs/month ÷ ~2.5M seconds/month = 40 QPS

Read: 400 QPS

Storage: 100M/month × 200 bytes × 60 months = 1.2TB

Cache: 34.5M reads/day × 0.20 × 200 bytes ≈ 1.4GB → 2GB Redis


A single server can handle ~500 requests/sec, so one server is enough. Add more for redundancy, not capacity.


D. What if ?

  1. Counter service fails? -> Servers keep working from their pre-allocated ranges.
  2. Two servers get the same range? -> Zookeeper guarantees range uniqueness.
  3. Custom alias requested? -> Check the DB first; return 409 Conflict if taken.


Fallback mechanism for ID Generation outages:


Primary: Distributed counter with Zookeeper range allocation


Fallback 1 (short outage < 5 min):

Each server pre-allocates a range of 1 million IDs locally. If Zookeeper is down, the server continues generating from its local range. No disruption to URL creation.


Fallback 2 (extended outage > 5 min):

Switch to Snowflake-style ID generation (timestamp + shard ID + sequence) instead of the counter. Format: milliseconds (41 bits) + shard_id (5 bits) + sequence (12 bits), similar to Twitter Snowflake. Guarantees uniqueness without any central coordination.
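A sketch of that 58-bit layout (bit widths from the format above; the epoch constant is an assumption):


import threading
import time

EPOCH_MS = 1_700_000_000_000  # assumed custom epoch

class SnowflakeFallback:
    """Fallback IDs: 41-bit timestamp + 5-bit shard_id + 12-bit sequence."""

    def __init__(self, shard_id: int):
        assert 0 <= shard_id < 32              # shard_id fits in 5 bits
        self.shard_id = shard_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now_ms = int(time.time() * 1000) - EPOCH_MS
            if now_ms == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit wrap
                if self.sequence == 0:
                    while now_ms <= self.last_ms:            # wait for next ms
                        now_ms = int(time.time() * 1000) - EPOCH_MS
            else:
                self.sequence = 0
            self.last_ms = now_ms
            return (now_ms << 17) | (self.shard_id << 12) | self.sequence

Note: these IDs are much larger than 62^7, so Base62-encoding them yields codes longer than 7 characters for the duration of the fallback.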


Fallback 3 (split-brain: two Zookeeper leaders):

Range gaps are acceptable. If Server A thinks its range is 1M-2M and Server B also gets 1M-2M due to split-brain, both generate IDs but with a server_id prefix to avoid collision. After split-brain resolves, reconcile and reassign clean ranges.


Recovery: When Zookeeper recovers, servers request fresh ranges and resume normal counter-based generation. No data migration needed.


Component 2: Caching


How does it work?


Cache-Aside pattern with Redis.

Read: Check Redis -> HIT = return (1ms). MISS = query DB (5ms) -> store in Redis.


Write: Store in DB -> pre-warm cache.

TTL: 24 hours. LRU eviction when the cache is full.


Eviction Policy: LRU (Least Recently Used)

- When Redis hits the 2GB memory limit, LRU automatically removes the URL that hasn't been accessed for the longest time.

- Criteria: last-access timestamp. URLs clicked today stay; URLs not clicked for weeks get evicted first.

- Impact on performance: LRU keeps the cache hit rate at ~80% because popular URLs are always recently accessed.

- Combined with the 24hr TTL: even frequently accessed URLs get a fresh DB read once per day, preventing stale data.

- Redis config: maxmemory-policy = allkeys-lru (set as shown below)
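Assuming redis-py, the same policy can be applied at runtime (in production it would normally live in redis.conf):


import redis

r = redis.Redis(host="localhost", port=6379)
r.config_set("maxmemory", "2gb")                 # cap matches the sizing above
r.config_set("maxmemory-policy", "allkeys-lru")  # evict least recently used keys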


Why Cache-Aside and not write-through?


Because we don't want to cache every new URL, only the ones people actually click. 80/20 rule: 20% of URLs get 80% of clicks.


How does it scale?

One Redis instance handles 2GB easily. If traffic grows, we can scale horizontally.


How big?

Daily reads: 400 QPS × 86,400 = 34.5M/day

Cache 20%: 34.5M × 0.20 × 200 bytes ≈ 1.4GB → 2GB Redis


What if?


  1. Redis crashes? -> All reads hit the DB; latency rises from 1ms to 5ms, but the system keeps working, just slower.
  2. Cache stampede? -> Distributed lock (SETNX): only one request fetches from the DB (see the sketch after this list).
  3. URL deleted? -> Delete the Redis key immediately; the next read gets fresh data from the DB.
  4. Cache full? -> LRU eviction removes the least recently used URLs automatically.
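A minimal stampede-protection sketch with redis-py, assuming a hypothetical load_from_db helper:


import time
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def get_long_url(short_code: str, load_from_db):
    """Cache-aside read with a SETNX lock so only one caller hits the DB."""
    cached = r.get(short_code)
    if cached is not None:
        return cached

    lock_key = f"lock:{short_code}"
    # SET ... NX EX 5 == SETNX plus a 5s safety expiry on the lock.
    if r.set(lock_key, "1", nx=True, ex=5):
        try:
            long_url = load_from_db(short_code)        # the single DB fetch
            if long_url is not None:
                r.set(short_code, long_url, ex=86400)  # 24h TTL
            return long_url
        finally:
            r.delete(lock_key)

    # Lost the race: briefly poll for the winner to populate the cache.
    for _ in range(10):
        time.sleep(0.05)
        cached = r.get(short_code)
        if cached is not None:
            return cached
    return load_from_db(short_code)  # fallback if the winner was too slow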



Component 3: Database

How does it work?

NoSQL database (DynamoDB). Partition key = short_code.

Why NoSQL? The access pattern is a simple key-value lookup: no joins, and horizontal scaling is built in.


Why not SQL? Its strengths (complex joins, strong ACID transactions) aren't needed for this workload.
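A sketch of the two core operations, assuming boto3 and an illustrative table name of urls:


import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("urls")  # partition key: short_code

def put_mapping(short_code: str, long_url: str, expires_at: str) -> None:
    """Write path: store the short_code -> long_url mapping with metadata."""
    table.put_item(
        Item={
            "short_code": short_code,
            "long_url": long_url,
            "expires_at": expires_at,
        },
        # Fail if the code already exists (covers custom-alias conflicts).
        ConditionExpression="attribute_not_exists(short_code)",
    )

def get_mapping(short_code: str):
    """Read path: single-item lookup by partition key."""
    response = table.get_item(Key={"short_code": short_code})
    return response.get("Item")  # None when the code is not found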


How does it scale?


Partition by short_code -> even distribution (Base62)

No hot partitions, because short codes are random-ish.

Read replicas for read scaling if needed.


How big?


100M/month × 200 bytes × 60 months = 1.2TB (5 years)

With 3x replication = 3.6TB


What if


  1. Hot partition? -> The cache absorbs 80% of reads, so the DB is barely touched.
  2. URL expires? -> A background cleanup job removes expired rows.
  3. Need analytics? -> Run an analytics service that reads from the DB to build reports.


Handling cache miss impact on database:


Normal state: 80% cache hit rate → only 20% of reads hit DB

400 QPS reads × 0.20 = 80 queries/sec to DB (easily handled)


Worst case (Redis down): 100% cache miss → all 400 QPS hit DB

DB can handle 5,000-10,000 QPS, so 400 QPS is still fine.

Latency increases from 1ms to 5-10ms but no outage.


Cache warming after restart:

Cold cache gradually warms through natural traffic.

Within 1 hour, hit rate recovers to ~60%.

Within 24 hours, back to normal ~80%.


Protection against DB overload during cache miss spikes:

- Connection pooling: limit max DB connections to 100.

- Circuit breaker: if DB latency > 100ms, return cached stale data (better stale than slow).

- Request coalescing: if 100 users request the same uncached URL, only 1 query goes to the DB; the others wait for the result (see the sketch after this list).
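An in-process coalescing sketch (singleflight-style; across servers, the SETNX lock shown earlier plays the same role):


import threading

class Coalescer:
    """Collapses concurrent lookups for the same key into one DB call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> (done event, one-slot result list)

    def get(self, key, fetch):
        with self._lock:
            entry = self._inflight.get(key)
            if entry is None:
                entry = (threading.Event(), [])   # this caller leads
                self._inflight[key] = entry
                is_leader = True
            else:
                is_leader = False
        event, slot = entry
        if is_leader:
            try:
                slot.append(fetch(key))           # the single DB query
            finally:
                event.set()
                with self._lock:
                    self._inflight.pop(key, None)
            return slot[0]
        event.wait()                              # followers ride along
        return slot[0] if slot else None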

