Requirements


Functional Requirements:

  • Create a short URL for a given long URL.
  • Redirect to long URL associated with a given short URL.
  • Optional: Custom Aliases, Expiration time, Analytics, Delete/Update



Non-Functional Requirements:


  • Redirect latency should be low < 50ms
  • very high read throughput
  • highly available (redirect must not fail)
  • highly durable(no data loss)
  • Strong consistency for URL mapping.
  • Eventual consistency for analytics.
  • All components should be horizontally scalable


Core Desgin Decisions:

  1. ID Generation Strategy: Auto increment, Random, Base62 encoding of chosen uniqueId
  2. Read Optimization: Cache(Redis)
  3. Storage: key-value store
  4. Redirection speed: CDN + cache
  5. Fault tolerance: ID generator fallback mechanism


Back of the Envelope Estimation:

  1. Assume 10M URLs per day
  2. Read: write = 100:1
  3. QPS: write: 10 * 10^6 /(10^5) = 100 writes/sec. Read = 100*100 = 10,000 reads/sec
  4. Storage: Assume avg URL length = 500 bytes. metadata = 100 bytes, total = 600bytes. 10M/day -> 10 * 10^6 * 600bytes= 6GB/day and for 1year = 6 * 10^6 * 400 = 2.4 * 10^9 = 2.4TB
  5. 20% of those URLs in cache = 480GB for Cache storage
  6. CAP -> choose Availability + Partition tolerance


API Design

Create Short URL: Post/shorten

Redirect URL: Get/{shortcode}

Analytics URL: Get/stats/{shortcode}

Cleanup URL: Post/cleanup

ID generation URL: Get/id


High-Level Design

URL Shorten Service - generates short code

Redirect Service - Handles lookups and redirects

Analytics Service - async processing

ID generator Service - URL Shorten Service uses this to generate the unique id

Cleanup Service - cleansup the old expired urls from DB


DATABASE choice: key-value DB(DynamoDB), if we choose SQL, scaling writes is hard, sharding is complex.

Schema: table: URLMapping

shortcode (pk)

longurl

createdAt

expiryAt

userId(optional)

Redis : key-shortCode value-longURL. TTL can be defined and LRU eviction policy.


Data flow:

URL Creation flow:

  1. User -> API(Url shorten service)
  2. Generate unique ID
  3. Convert to Base62 ->shortcode
  4. Store in DB after storing in cache(write through cache)
  5. Return the short URL


Redirect flow:

  1. User hits short URL
  2. checks cache
    1. Hit -> redirect
    2. Miss -> DB Lookup -> Cache -> redirect


Detailed Component Design

Short URL Generation:

Approach1: Random string

. Generate a random 6-8 chars

. problem: collisions -> retry loops

Approach2: Hashing(MD5)

. Hash long URL

. Take first N chars

. Problem: collision risk

Approach3: Counter + Base62

. Generate a unique ID eveytime [timestamp | machineId | sequence]

. Convert to Base62:

. Why is it best: No Collisions, Deterministic, Compact

. Problem: Distributed ID generation

. use pre allocated ID ranges per server.

. use zookeeper for coordination of id generator services



Scalability & Reliability:

Horizontal scaling:

  1. Cahce: use redis
  2. DB: Partitioning with hash(shortcode)

Bottlenecks and solutions:

  1. DB overload: Use Read replicas and sharding
  2. Hot URL: use CDN caching
  3. ID generation bottleneck: Use Distributed ID generator
  4. Traffic spikes: use Auto scaling, Rate limiting, back pressure
  5. Request failures: use exponential backoffs
  6. Cache memory bloat: TTL(default 24hours) + LRU eviction


Security Measures:

  1. Rate limiting at API gateway
  2. Captcha for suspicious traffic
  3. URL validation (avoids malware)


Backup & Disaster Recovery:

  1. Regular DB snapshots
  2. Multi region replication


Logging & Monitoring:

  1. Request Logs, error logs
  2. Monitor QPS, Latency, Cache hit ratio
  3. Tools: Grafana, datadog