Requirements


Functional Requirements:


  • Create a short URL for a given long URL.
  • Return the long URL associated with a given short URL.



Non-Functional Requirements:


  • System should be highly available
  • System must be scale horizontally to handle increasing traffic
  • Redirection must happen with very low latency
  • Generated Short Url should not be predictable, using sequenceId is not sufficient enough


Resource Estimation:

The ratio of shortening requests(writes) to redirection requests(reads) is 1:100

The system handles 200M new url shortening request per month

A url shortening entry requires 500 bytes of storage

Each entry expires after five years unless explicitly deleted

The service has 100M daily active users



Storage Estimation:

200 M/month * 12 month/year * 5 year * 500 bytes = 6TB


Query Rate Estimation

With 1:100 write to read ration, 200 Million new urls per month will result in 200 Billion redirection requests


To calculate the queries per second

we first find the number of seconds in average month


30.42 * 24hr * 60 minutes * 60 seconds= 2628288


write QPS = 200 M/2628288 = 76 Urls/Sec

Read QPS = 100 * 76 Urls/second


API Design

  1. CreateShortUrl()
  2. redirectUrl(url)
  3. expireUrl(url)


High-Level Design




Detailed Component Design

Database: A read heavy and requires horizontally scalable storage, We need to store:

User Details

Mapping of long URLs to their corresponding short URLs,

As a the URL mappings are independent and don't have complex relationships, a NoSql database is a good fit. MongoDB is a strong candidate because it follows leader follower protocol, enabling the use of

replicas for heavy workloads

MongoDB ensures atomicity in concurrent write operations and returns duplicate key errors to prevent collisions


Short URL generator - It consists of two main parts

1) A sequence to generate unique IDs

2) A Base 58 Encoder to enhance the readability of the short url



Other building blocks:

Load Balancing: The system can use Global Server Load Balancing along side local load balancers to improve availability. Because there is typically a delay between URL creation and the first access request, eventual consistency across geograhically distributed database is acceptable.


Cache : Memcached is a good choice because it is simple, horizontally scalable, and meets our needs. To minimize latency, we will use a data centre specific caching layer rather than a global one.


Rate Limiter: To prevent abuse, we will rate limit user based on their API KEY, the fixed window counter algorithm is sufficient for this purpose, Allowing a fixed number of operation per user within specific time frame.