Requirements
Functional Requirements:
- Create a short URL for a given long URL.
- Return the long URL associated with a given short URL.
Non-Functional Requirements:
- First non functional requirement is to design 2 endpoints to GET the short URL and other one is to fetch the converted long to short URL.
- Other Non functional requirement is to take care of scaling, if tomorrow we have huge number of shortened URL we need to scale our db to reduce CPU spikes , I/0 operations. Ensure Proper Indexing for faster searches and also enable replication and sharding for the data.
- Cache the frequently used short URL to make sure we reduce network calls to DB and reduce the latency.
- To make sure data is available we need to make sure we can have a simple replication enabled following a simple master slave pattern with one leader (this will handle writes) and slaves(reads).
- To reduce the latency further at code level dont have any N+1 queries because if the dataset size increases then it will increase the tail latency.
API Design
We will need 3 API endpoints
The first endpoint will be the GET endpoint for fetching the URL shortener.
The second endpoint will be the POST endpoint to actually have the logic for the shortening the URL.
We can name the 2 endpoint as:
GET /get-short-url/:
POST /build-URL-shortener
GET /redirect
High-Level Design
Before we dive deep into the high level design we need to make sure we estimate the capacity or as we say back of the envelope estimation.
Traffic & Throughput (Load)
Let's assume a daily active user on this as 100k daily active users.
Assuming 1 user has around 50 request so 100k DAU will be 100k * 50 = 5 * 1000000 = 5,00,00,000 req/day
So converting it into second = 50000000/86,400 = 579 request/second.
So total traffic with 100k active user the number of request after back of the envelope estimation is 579 request/second.
Let's check if the system is read heavy vs write heavy?
After 100:1 read to write ratio we know one thing for sure that this is going to be a read heavy system.
Database Design
Since we know the daily active users let's estimate the storage. We have 100k DAU and assume each payload cost is 200 bytes
We know one thing for every 100 read we have 1 write.
So imagine we have 100k DAU out of which we know one thing for sure some are read and some are write request.
Lets fetch the number of write request 1/101 * 579 i.e 6 write RPS Each request will be of 200 bytes
So 6 RPS will be 6 * 200 is 1200 bytes of data written in the db per second.
1200×86400×365=37,843,200,000 bytes ≈37.8 GB/year\approx 37.8 \text{ GB/year}≈37.8 GB/year
So per year we will need 38 GB per year and for 5 years mostly we will need 190 GB something.
This much DB storage is possible for storage.
Caching
Since this is a read heavy system we need to make sure that the frequently requested URL are stored in cache.
So we need to make sure that we are storing the top requested URLs to the cache.
We know one thing for sure that we have around 579 req/sec lets assume we have 30% of cached request.
So we can do one simple thing 579 * 0.3 = 174 RPS so 174 request per second
Each payload size is 200 bytes so 174 * 200 = 34800 req/sec on cache.
Imagine we have 10,000 objects and 30% are caches 3000 objects.
3000 * 200 bytes = 600000 bytes is the cache size requirement
Rate Limiting
We need to make sure we have a simple rate limiter to limit the rate on the api endpoints and apply back pressure on thre request which stops multiple request in a span of time or in window.
Detailed Component Design
We will have 2 service
- URL Shortener Service : This service will handle the load for the shortening of the URL.
- Redirect Service : We will have a simple service that will be redirecting the short URL to its's actual URL.