My Solution for Designing a Simple URL Shortening Service: A TinyURL Approach with Score: 8/10

by flare_xylo777

System requirements


Functional:

  1. The system should be able to generate a tinyURL for a given URL
  2. The system should be able to re-direct to the original URL
  3. The system should monitor which shortened URLs are the most popular



Non-Functional:

  1. Low latency -- creating a short URL should be fast and redirecting to the full URL should feel seamless
  2. Scalable
  3. Highly available -- the system should keep functioning even when there are failures, to ensure fault tolerance and high reliability; eventual consistency is okay here, but availability is very important


Capacity estimation


I am assuming a large-scale system, with around 1M DAU.

Assuming the read-to-write ratio is around 5:1 ==> 5 reads per write


Therefore this is a read-heavy system
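
A quick back-of-envelope check of what that traffic means in QPS (the 1 URL shortened per active user per day is my own assumption on top of the numbers above):

    # Rough traffic estimate: 1M DAU, 5 reads per write.
    DAU = 1_000_000
    WRITES_PER_USER_PER_DAY = 1        # assumption: each active user shortens ~1 URL/day
    READ_WRITE_RATIO = 5

    writes_per_day = DAU * WRITES_PER_USER_PER_DAY
    reads_per_day = writes_per_day * READ_WRITE_RATIO

    SECONDS_PER_DAY = 24 * 60 * 60
    print(round(writes_per_day / SECONDS_PER_DAY))   # ~12 writes/sec on average
    print(round(reads_per_day / SECONDS_PER_DAY))    # ~58 reads/sec on average

Even with a few times headroom for peaks, this stays modest, which supports keeping the service layer simple and stateless.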




API design


CREATING A TINYURL:


POST /createTinyUrl --

{userId, timestamp, bigURL}

RESPONSE 200 success with data

{

message: success

result: tinyURL

}


REDIRECTING A TINY URL


GET /{tinyURL} --

RESPONSE 3xx (301/302 redirect)

redirects back to the original URL


MONITOR

GET /popularURLs?minHitCount=1000

RESPONSE

[

{ url: bigURL1},

{url: bigURL2}

]
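
To make the three endpoints concrete, here is a minimal sketch assuming a Flask-style service; the in-memory structures at the top are just stand-ins for the key DB and URL DB described in the storage section below:

    # Minimal sketch of the three endpoints above (Flask assumed).
    # The in-memory structures are stand-ins for the real key DB / URL DB.
    import itertools
    from flask import Flask, request, jsonify, redirect, abort

    app = Flask(__name__)

    _key_pool = (f"k{i:06d}" for i in itertools.count())   # pretend pre-generated hashes
    _urls = {}   # tiny key -> {"bigURL": ..., "userId": ..., "timestamp": ..., "count": ...}

    @app.route("/createTinyUrl", methods=["POST"])
    def create_tiny_url():
        body = request.get_json()                          # {userId, timestamp, bigURL}
        key = next(_key_pool)                              # take one unused hash
        _urls[key] = {"bigURL": body["bigURL"], "userId": body["userId"],
                      "timestamp": body["timestamp"], "count": 0}
        return jsonify({"message": "success",
                        "result": f"https://tiny.example.com/{key}"}), 200

    @app.route("/<tiny_key>", methods=["GET"])
    def redirect_tiny_url(tiny_key):
        rec = _urls.get(tiny_key)
        if rec is None:
            abort(404)
        rec["count"] += 1                                  # hit counter for popularity
        return redirect(rec["bigURL"], code=302)           # 3xx back to the client

    @app.route("/popularURLs", methods=["GET"])
    def popular_urls():
        min_hits = int(request.args.get("minHitCount", 1000))
        return jsonify([{"url": r["bigURL"]} for r in _urls.values()
                        if r["count"] >= min_hits])

    if __name__ == "__main__":
        app.run()

The domain tiny.example.com and the key format are made up; they are only there to show the shape of the requests and responses.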





Database design


I will have one DB: the TinyURL DB, with tables like

URL table: UserId, TimeStamp, BigURL, TinyURL, count ==> roughly 300 bytes/record

  1. Every redirect we just increment the count (a ~4-byte counter update); with ~5 redirects per user per day that is 4 bytes * 5 * 10^6 = 20 MB/day, about 600 MB/month and 7.2 GB/year, rounded up to roughly 20 GB/year to account for peak times
  2. Every write stores the full record: 300 bytes * 10^6 = 300 MB/day, roughly 9 GB/month and ~110 GB/year ==> call it ~360 GB/year to allow for peak usage
  3. That is roughly 400 GB/year in total (see the quick calculation below)
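
Spelling the storage math out (same assumptions as above; the ~3x padding for peaks is just a safety factor):

    # Back-of-envelope storage estimate.
    DAU = 1_000_000
    RECORD_BYTES = 300            # one full URL record
    COUNTER_BYTES = 4             # per-redirect count update
    REDIRECTS_PER_USER = 5        # from the 5:1 read:write ratio

    new_record_bytes_per_day = DAU * RECORD_BYTES                      # ~300 MB/day
    counter_bytes_per_day = DAU * REDIRECTS_PER_USER * COUNTER_BYTES   # ~20 MB/day

    GB = 10**9
    print(new_record_bytes_per_day * 365 / GB)   # ~110 GB/year of new records
    print(counter_bytes_per_day * 365 / GB)      # ~7.3 GB/year of counter updates
    # padded ~3x for peaks: ~360 GB + ~20 GB ~= 400 GB/year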


Key DB: available hashes ==> roughly 8 bytes/hash ==> 8 bytes * 10^6 hashes ==> 8 MB; we will reuse them as well


For the key DB we can use an in-memory DB (Redis), and the TinyURL DB can be a NoSQL DB (like DynamoDB/MongoDB), since eventual consistency is a good trade-off here while guaranteeing high availability and scalability
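
A rough sketch of what the two stores hold, assuming redis-py for the key DB; the names available_keys and UrlRecord are illustrative, not a fixed schema:

    # Shape of the data in the two stores (names are illustrative).
    from dataclasses import dataclass, asdict
    import redis

    @dataclass
    class UrlRecord:              # one row of the URL table (~300 bytes)
        tiny_key: str
        big_url: str
        user_id: str
        timestamp: int
        count: int = 0

    r = redis.Redis(decode_responses=True)

    def seed_key_pool(keys):
        # Key DB: a Redis set of pre-generated, currently unused hashes (~8 bytes each).
        r.sadd("available_keys", *keys)

    # The TinyURL DB (DynamoDB/MongoDB) would store documents shaped like this,
    # keyed by tiny_key, with count incremented on every redirect:
    example_doc = asdict(UrlRecord("abc12345", "https://example.com/a/very/long/path",
                                   "user42", 1700000000))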



High-level design


The client goes through an AWS-managed load balancer layer so the load is distributed across service instances


The request then hits the TinyURL service, which interacts with the storage layer to perform the necessary operation


There is a monitoring component that periodically pulls the popular URLs





Request flows

  1. Creating the TinyURL: the client's request goes through the load balancer/gateway to reach the TinyURL service; the service then responds to the user with the tinyURL via the LB layer
  2. Redirecting the URL: when the client hits the tinyURL, the service responds with a redirect to the original URL
  3. Monitoring: this component periodically polls the TinyURL service's /popularURLs endpoint to track the top URLs (a small polling sketch follows this list)
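
A tiny sketch of that poller, assuming the requests library; the internal service URL, interval, and threshold are made up:

    # Periodic poller for popular URLs.
    import time
    import requests

    def poll_popular_urls(base_url="http://tinyurl-service.internal",
                          min_hits=1000, interval_s=60):
        while True:
            resp = requests.get(f"{base_url}/popularURLs",
                                params={"minHitCount": min_hits})
            for item in resp.json():
                print("popular:", item["url"])   # in practice: push to a dashboard/metrics store
            time.sleep(interval_s)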




Detailed component design

  1. creating URL design
    1. We will pre-generate the hashes and store them in a key DB
    2. When a request for creating a tinyURL comes in, we take one key and remove it from the key DB so it cannot be handed out twice
    3. A worker service periodically reclaims stale keys from the URL DB and puts them back into the key DB so that keys can be reused (a rough Redis sketch follows this list)
  2. redirecting URL
    1. Here, when the user hits the tinyURL, the service responds to the GET with a 301/302 redirect to the original URL
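
That key-pool flow could look roughly like this, assuming redis-py; SPOP atomically takes one unused key, and the cleanup worker pushes reclaimed keys back with SADD (how staleness is detected is left out here):

    # Key allocation from the pre-generated pool, plus the reclaim path.
    import redis

    r = redis.Redis(decode_responses=True)

    def allocate_key():
        # Atomically pop one pre-generated hash; None means the pool is empty
        # and the key-generation job needs to top it up.
        return r.spop("available_keys")

    def reclaim_keys(stale_keys):
        # Worker path: keys whose URL records are stale/expired in the URL DB
        # go back into the pool so they can be reused.
        if stale_keys:
            r.sadd("available_keys", *stale_keys)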




Trade offs/Tech choices

  1. Using an in-memory DB for the key DB is a good choice because of the low latency and high availability of in-memory DBs
  2. Using a NoSQL DB like DynamoDB, we can leverage scalability and high availability while trading off strong consistency, because this system can tolerate eventual consistency





Failure scenarios/bottlenecks


  1. When one part of the system goes down, another replica takes over the processing without bringing the complete system down
  2. For example, if the service in one region goes down, the LB will re-route requests to the next nearest endpoint so the system keeps serving users despite one region being down. In those scenarios latency may be a bit higher, but that is acceptable compared to the whole system crashing




Future improvements

  1. Definitely caching: I want to add a cache in front of the storage layer to speed up lookups (a rough cache-aside sketch is below)
  2. And periodically run a worker service to keep the storage layer and the cache in sync
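
A rough cache-aside sketch for the redirect path, assuming Redis as the cache in front of the URL DB; fetch_from_url_db is a hypothetical stand-in for the real DynamoDB/MongoDB read, and the TTL is arbitrary:

    # Cache-aside lookup for redirects: check the cache first, fall back to the URL DB.
    import redis

    cache = redis.Redis(decode_responses=True)
    CACHE_TTL_S = 3600   # arbitrary; the sync worker can also refresh/evict entries

    def fetch_from_url_db(tiny_key):
        return None      # placeholder for the real URL DB read

    def resolve(tiny_key):
        big_url = cache.get(f"url:{tiny_key}")
        if big_url is None:                          # cache miss
            big_url = fetch_from_url_db(tiny_key)
            if big_url is not None:
                cache.setex(f"url:{tiny_key}", CACHE_TTL_S, big_url)
        return big_url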