Requirements


Functional Requirements:

  • Create a short URL for a given long URL.
  • Redirect to long URL associated with a given short URL.
  • Optional: Custom Aliases, Expiration time



Non-Functional Requirements:


  • Redirect latency should be low < 50ms
  • very high read throughput
  • highly available (redirect must not fail)
  • highly durable(no data loss)
  • Strong consistency for URL mapping.
  • Eventual consistency for analytics.


Core Desgin Decisions:

  1. ID Generation Strategy: Auto increment, Random, Base62 encoding of chosen uniqueId
  2. Read Optimization: Cache(Redis)
  3. Storage: key-value store
  4. Redirection speed: CDN + cache


Back of the Envelope Estimation:

  1. Assume 10M new URLs per day
  2. Read: write = 100:1
  3. QPS: write: 10M/(24*3600) = 115 writes/sec. Read = 115*100 = 11,500 reads/sec
  4. Storage: Assume avg URL length = 500 bytes. metadata = 100 bytes, total = 600bytes. 10M/day -> 6GB/day for 1year = 2.2TB
  5. 20% of those URLs in cache = 400GB for Cache storage
  6. CAP -> choose Availability + Partition tolerance


API Design

Create Short URL: Post/shorten

Redirect URL: Get/{shortcode}

Analytics URL: Get/stats/{shortcode}

Cleanup URL: Post/cleanup


High-Level Design

URL Shorten Service - generates short code

Redirect Service - Handles lookups and redirects

Analytics Service - async processing

ID generator Service - URL Shorten Service uses this to generate the unique id

Cleanup Service - cleansup the old expired urls from DB


DATABASE choice: key-value DB(DynamoDB), if we choose SQL, scaling writes is hard, sharding is complex.

Schema: table: URLMapping

shortcode (pk)

longurl

createdAt

expiryAt

userId(optional)

Redis : key-shortCode value-longURL. TTL can be defined and LRU eviction policy.


Data flow:

URL Creation flow:

  1. User -> API(Url shorten service)
  2. Generate unique ID
  3. Convert to Base62 ->shortcode
  4. Store in DB after storing in cache(write through cache)
  5. Return the short URL


Redirect flow:

  1. User hits short URL
  2. checks cache
    1. Hit -> redirect
    2. Miss -> DB Lookup -> Cache -> redirect


Detailed Component Design

Short URL Generation:

Approach1: Random string

. Generate a random 6-8 chars

. problem: collisions -> retry loops

Approach2: Hashing(MD5)

. Hash long URL

. Take first N chars

. Problem: collision risk

Approach3: Counter + Base62

. Generate a unique ID eveytime

. Convert to Base62:

. Why is it best: No Collisions, Deterministic, Compact

. Problem: Distributed ID generation

. use pre allocated ID ranges per server.



Scalability & Reliability:

Bottlenecks and solutions:

  1. DB overload: Use Read replicas and sharding
  2. Cache misses: use correct eviction policy(LRU)
  3. Hot URL: use CDN caching
  4. ID generation bottleneck: Use Distributed ID generator
  5. Traffic spikes: use Auto scaling, and Rate limiting
  6. Request failures: use exponential backoffs


Security Measures:

  1. Rate limiting at API gateway
  2. Captcha for suspicious traffic
  3. URL validation (avoids malware)


Backup & Disaster Recovery:

  1. Regular DB snapshots
  2. Multi region replication


Logging & Monitoring:

  1. Request Logs, error logs
  2. Monitor QPS, Latency, Cache hit ratio
  3. Tools: Grafana, datadog