Requirements
Functional Requirements:
- Create a short URL for a given long URL.
- Return the long URL associated with a given short URL.
Non-Functional Requirements:
- Low latency
- High consistency
- Scalable to 100,000 (1 lakh) requests/s
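The 100,000 requests/s target can be turned into a rough estimate of daily traffic and of how long a base62 short code needs to be. This is a back-of-the-envelope sketch. The 1% write share and the 10-year retention window are hypothetical assumptions for illustration and do not come from the requirements above.

```python
# Back-of-the-envelope capacity estimate for the URL shortener.
# ASSUMPTIONS (hypothetical, not from the requirements):
#   - 1% of all requests are writes (new short URLs)
#   - every short code must stay unique for 10 years

TOTAL_RPS = 100_000        # 1 lakh requests per second (from the requirements)
WRITE_PERCENT = 1          # hypothetical share of requests that create URLs
RETENTION_YEARS = 10       # hypothetical retention window
SECONDS_PER_DAY = 86_400


def min_code_length(num_codes: int, alphabet_size: int = 62) -> int:
    """Smallest length k such that alphabet_size ** k >= num_codes."""
    k = 1
    while alphabet_size ** k < num_codes:
        k += 1
    return k


requests_per_day = TOTAL_RPS * SECONDS_PER_DAY
writes_per_second = TOTAL_RPS * WRITE_PERCENT // 100
codes_needed = writes_per_second * SECONDS_PER_DAY * 365 * RETENTION_YEARS

print(f"requests/day : {requests_per_day:,}")
print(f"writes/s     : {writes_per_second:,}")
print(f"codes needed : {codes_needed:,}")
print(f"code length  : {min_code_length(codes_needed)} base62 chars")
```

Under these assumptions about 315 billion codes are needed, so 6 base62 characters (62^6, about 56.8 billion) are not enough, while 7 characters (62^7, about 3.5 trillion) leave plenty of headroom.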
API Design
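GET /urls/?url=<short code>
Status code: 301 redirect, with the long URL in the Location header
{
"redirected_url": "<long url>"
}
Note: browsers cache a 301 redirect, so a client that already followed a link may keep being redirected after the link expires; a 302 would send every hit back through the service.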
POST /urls/
{
"url": "<long url>",
"short_code": "<optional custom short code>",
"expiry": "<expiry timestamp>"
}
Returns:
{
"url": "<short url>"
}
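To make the request and response shapes concrete, the two endpoints can be sketched as plain Python functions over an in-memory dict. This is only an illustration under stated assumptions: the function names, the `https://short.example/` domain, the 201/404/409/410 status codes, treating `expiry` as optional, and the sequential counter used in place of a real distributed ID generator are all hypothetical. A real service would sit behind a web framework and use the DB and ID generation from the High-Level Design.

```python
import itertools
from datetime import datetime, timezone
from typing import Optional

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE_URL = "https://short.example/"  # hypothetical short-link domain

_store: dict = {}           # short_code -> {"url", "expiry"}; stands in for the DB
_ids = itertools.count(1)   # hypothetical stand-in for a distributed ID generator


def _base62(n: int) -> str:
    """Encode a non-negative integer with the base62 alphabet above."""
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits)) or ALPHABET[0]


def post_urls(body: dict) -> tuple:
    """POST /urls/ with {"url", optional "short_code", "expiry"} -> (status, JSON body)."""
    code = body.get("short_code") or _base62(next(_ids))
    if code in _store:
        return 409, {"error": "short_code already exists"}
    _store[code] = {"url": body["url"], "expiry": body.get("expiry")}
    return 201, {"url": BASE_URL + code}


def get_urls(short_code: str, now: Optional[datetime] = None) -> tuple:
    """GET /urls/?url=<short code> -> (status, response headers)."""
    now = now or datetime.now(timezone.utc)
    row = _store.get(short_code)
    if row is None:
        return 404, {}
    if row["expiry"] is not None and row["expiry"] <= now:
        return 410, {}  # hypothetical choice: 410 Gone for expired links
    return 301, {"Location": row["url"]}
```

Switching the redirect to a 302 is a one-line change in `get_urls` if every hit must reach the service.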
High-Level Design
APIs flow:
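For a POST request:
We can put a unique constraint on short_code, so a duplicate short code can never be committed even if two requests race.
To create short codes across a distributed system, we can use either a Bloom filter or the Snowflake ID mechanism.
The request first lands on the API gateway, after which the load balancer (LB) routes it to an application server.
The application layer then either checks the Bloom filter to see whether the short code might already exist, or generates a Snowflake ID, which cannot collide anywhere in the distributed system as long as every node has a distinct worker ID and its clock never moves backwards. A Bloom filter can return false positives but never false negatives, so a "not present" answer is definitive while a "maybe present" answer has to be confirmed against the DB. Finally, the service inserts a row into the DB with the short code if the user provided one; otherwise it creates the short code by base62-encoding a generated ID.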
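The Snowflake-plus-base62 path can be sketched as follows. This is a minimal sketch that assumes the commonly used Snowflake layout of 41 bits of milliseconds since a custom epoch, 10 bits of worker ID, and 12 bits of per-millisecond sequence; the class name, the 2024-01-01 epoch, and the alphabet order are hypothetical choices for illustration.

```python
import threading
import time

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def base62_encode(n: int) -> str:
    """Encode a non-negative integer in base62."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


class SnowflakeGenerator:
    """64-bit IDs: 41 bits ms since EPOCH_MS | 10 bits worker | 12 bits sequence."""

    EPOCH_MS = 1_704_067_200_000  # hypothetical custom epoch: 2024-01-01 00:00 UTC

    def __init__(self, worker_id: int):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now < self.last_ms:
                # Reusing an old timestamp could repeat an ID, so fail loudly.
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # All 4096 IDs for this millisecond are used: wait for the next one.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.EPOCH_MS) << 22) | (self.worker_id << 12) | self.sequence
```

Usage: `base62_encode(SnowflakeGenerator(worker_id=7).next_id())` gives a short code without any cross-node coordination. The tradeoff is length: a 64-bit Snowflake ID encodes to roughly 10 to 11 base62 characters, longer than a 7-character code drawn from a smaller counter.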
For GET: since there will be millions of rows, we can add a Redis layer that keeps the URLs created in the last x hours, which makes reads fast.
A request with a short code first checks Redis; if the code is not there, it falls back to the DB, where the lookup is still fast because short_code is indexed (in PostgreSQL, a UNIQUE constraint automatically creates a unique index).
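The read path is a cache-aside lookup. In this minimal sketch, `TTLCache` is an in-memory stand-in for Redis so the example runs without a server; with redis-py the equivalent calls are `r.get(key)` and `r.set(key, value, ex=seconds)`. The function names, the one-hour TTL, and filling the cache on a miss (rather than only caching recently created URLs as described above) are assumptions.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """In-memory stand-in for Redis: each value expires ex seconds after it is set."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data: dict = {}  # key -> (value, expires_at)
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict the expired entry
            return None
        return value

    def set(self, key: str, value: str, ex: int) -> None:
        self._data[key] = (value, self._clock() + ex)


def resolve(
    short_code: str,
    cache: TTLCache,
    db_lookup: Callable[[str], Optional[str]],
    ttl_seconds: int = 3600,  # hypothetical TTL
) -> Optional[str]:
    """Cache-aside read: try the cache first, then the indexed DB lookup."""
    long_url = cache.get(short_code)
    if long_url is not None:
        return long_url  # cache hit: the DB is not touched
    long_url = db_lookup(short_code)  # e.g. SELECT url FROM urls WHERE short_code = ...
    if long_url is not None:
        cache.set(short_code, long_url, ex=ttl_seconds)  # assumption: populate on miss
    return long_url
```

Passing the clock into `TTLCache` keeps expiry testable; a real Redis server handles expiry itself.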
Consistency and availability:
The system should favor consistency over availability, so that two long URLs can never be assigned the same short code.
For availability, the load balancer and the services deployed on EC2 instances can be scaled horizontally as load grows.
We can also introduce a CDN to serve redirects from edge locations close to users, giving low latency across the globe.
URL expiration:
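PostgreSQL has no built-in row TTL, so for URL expiry we can install an extension (for example pg_cron, which schedules SQL jobs inside the database) to periodically mark a row as inactive once its expiry time has passed.
Instead of, or in addition to, the extension, an external CRON job can run the same expiry update. Because either job runs only periodically, the GET path should also compare the expiry time with the current time so that an expired link is not served between runs.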
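The expiry job and the matching read-path check can be sketched with Python's built-in `sqlite3` as a runnable stand-in for PostgreSQL. The `urls` table, its columns, and storing `expires_at` as Unix seconds are hypothetical; in PostgreSQL a `timestamptz` column compared with `now()` would be more natural. The `UPDATE` is the statement a pg_cron schedule or an external CRON job would run.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: expires_at is Unix seconds, NULL means the URL never expires.
SCHEMA = """
CREATE TABLE IF NOT EXISTS urls (
    short_code TEXT PRIMARY KEY,
    url        TEXT NOT NULL,
    expires_at INTEGER,
    is_active  INTEGER NOT NULL DEFAULT 1
)
"""


def expire_urls(conn: sqlite3.Connection, now: int) -> int:
    """Body of the periodic job: mark every expired, still-active row inactive."""
    cur = conn.execute(
        "UPDATE urls SET is_active = 0 "
        "WHERE is_active = 1 AND expires_at IS NOT NULL AND expires_at <= ?",
        (now,),
    )
    conn.commit()
    return cur.rowcount  # number of rows deactivated in this run


def lookup(conn: sqlite3.Connection, short_code: str, now: int) -> Optional[str]:
    """GET path: also checks expires_at, so links expire even between job runs."""
    row = conn.execute(
        "SELECT url FROM urls WHERE short_code = ? AND is_active = 1 "
        "AND (expires_at IS NULL OR expires_at > ?)",
        (short_code, now),
    ).fetchone()
    return row[0] if row else None
```

A scheduler would call `expire_urls(conn, int(time.time()))` every few minutes; the extra predicate in `lookup` keeps reads correct however often the job runs.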
Key points summary:
- URLs can be shortened using base62 encoding.
- A Bloom filter can check, during creation, whether a URL already exists for a given short code.
- A plain PostgreSQL DB can store all the data, including each URL's expiry time.
- A CRON job (or a PostgreSQL extension) takes care of expiring URLs.
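The Bloom filter check can be sketched as follows. This is a minimal sketch: the class name, the SHA-256 double-hashing scheme, and the 1% target false-positive rate are illustrative assumptions, while the sizing uses the standard formulas m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions.

```python
import hashlib
import math


class BloomFilter:
    """Probabilistic set: False means definitely absent, True means maybe present."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.size = math.ceil(
            -expected_items * math.log(false_positive_rate) / math.log(2) ** 2
        )
        self.num_hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

In the POST flow, `might_contain(code) == False` lets the service skip the DB check, while `True` still needs a DB lookup (or reliance on the unique constraint) because it may be a false positive. A standard Bloom filter also cannot delete entries, so expired short codes keep answering "maybe present" until the filter is rebuilt.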
Detailed Component Design
Deep dive into 2-3 key components. Explain how they work, how they scale, discuss tradeoffs, capacity, and any relevant algorithms or data structures.