Requirements
Functional Requirements:
- Create a short URL for a given long URL.
- Return the long URL associated with a given short URL.
Non-Functional Requirements:
- Low latency
- High consistency
- Scalable to 1 lakh (100,000) requests/s
API Design
GET /urls/?url=
Status code: 301 redirect
{
"redirected_url":
}
POST /urls/
{
"url": "long format url",
"short_code": "custom short code (optional)",
"expiry": "expiry time"
}
return {
"url":
}
High-Level Design
"""
• The design lacks a detailed component design. It does not deep dive into key components, their workings, scalability, tradeoffs, capacity, or relevant algorithms or data structures
• The design does not address high availability, low redirect latency, and horizontal scalability in the requirements
• The design does not provide a clear explanation of how it will handle many concurrent creates to avoid collisions, how it will make IDs hard to guess / enumerate, and how it will handle a generator outage or split-brain
"""
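APIs flow:
For a POST request,
we can put a unique constraint on short_code in the DB.
To create short codes across a distributed system, we can use a bloom filter to check for existing codes, or a snowflake mechanism to generate them.
The request first lands on the API gateway and is then routed by the LB to an application server.
The application layer can then either check the bloom filter to see whether the short code already exists, or use snowflake ID generation, which does not collide across nodes as long as each node has a unique worker ID. A bloom filter can return false positives but never false negatives, so a "not present" answer is safe, while a "maybe present" answer still has to be confirmed against the DB. The application then inserts a row in the DB with the short code if one was provided; otherwise it creates one using base62.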
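A minimal sketch of the snowflake plus base62 path described above. The bit layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence), the custom epoch, and the names SnowflakeGenerator and base62_encode are assumptions for illustration, not part of the original design.

```python
import threading
import time

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(n: int) -> str:
    """Encode a non-negative integer using the 62-character alphabet above."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


class SnowflakeGenerator:
    """Assumed layout: timestamp (ms) | 10-bit worker id | 12-bit sequence."""

    EPOCH_MS = 1_700_000_000_000  # assumed custom epoch (hypothetical value)
    WORKER_BITS = 10
    SEQ_BITS = 12

    def __init__(self, worker_id: int):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self) -> int:
        with self._lock:
            now = int(time.time() * 1000)
            if now < self._last_ms:
                # Clock moved backwards: keep using the last timestamp.
                now = self._last_ms
            if now == self._last_ms:
                self._seq = (self._seq + 1) & ((1 << self.SEQ_BITS) - 1)
                if self._seq == 0:
                    # Sequence exhausted for this millisecond: wait for the next.
                    while now <= self._last_ms:
                        now = int(time.time() * 1000)
            else:
                self._seq = 0
            self._last_ms = now
            return (
                ((now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQ_BITS))
                | (self.worker_id << self.SEQ_BITS)
                | self._seq
            )


gen = SnowflakeGenerator(worker_id=1)
short_code = base62_encode(gen.next_id())
```

Uniqueness here depends on every node getting a distinct worker_id, so the unique constraint on short_code is still worth keeping as a safety net. IDs built this way are time-ordered, which makes them easy to enumerate.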
For GET, since there will be millions of records, we can have a Redis layer that keeps the data created in the last x hours;
this would be highly accessible and fast.
A request with a short code is looked up in Redis first; if it is not present there, it is looked up in the DB by short code, which is indexed.
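The read path above is a cache-aside lookup, which can be sketched as follows. TTLCache is an in-memory stand-in for Redis and db_lookup is a stand-in for the indexed DB query; both names are hypothetical and only illustrate the order of the lookups.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """In-memory stand-in for Redis: key -> (value, expires_at)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._data = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry is older than the TTL: drop it and report a miss.
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._data[key] = (value, self._clock() + self._ttl)


def resolve(short_code: str, cache: TTLCache,
            db_lookup: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the long URL for short_code, checking the cache before the DB."""
    long_url = cache.get(short_code)
    if long_url is not None:
        return long_url                    # cache hit
    long_url = db_lookup(short_code)       # cache miss: fall back to the DB
    if long_url is not None:
        cache.set(short_code, long_url)    # populate the cache for later reads
    return long_url
```

With a real Redis deployment the TTL would be set on the key itself, so the "recent x hours" window is enforced by the cache rather than by application code.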
Availability point of view:
This should favor consistency over availability, to avoid conflicting short codes.
From the availability perspective, the load balancer and the service deployed on EC2 instances are horizontally scalable, so they can be scaled out.
URL expiration:
For expiry of URLs, we can install a PostgreSQL extension that marks a row as inactive once its TTL has passed.
In addition to the PostgreSQL extension, we can set up a CRON job to expire rows as well.
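A rough sketch of what the periodic expiry job could run, plus a read-time check so a row that has expired but not yet been swept is never served. sqlite3 is used only so the example runs on its own; the table layout and the names expire_urls and lookup_active are assumptions, and the same SQL would need adapting for PostgreSQL.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Assumed table layout; expires_at is a Unix timestamp, NULL means no expiry.
    conn.execute(
        """CREATE TABLE urls (
               short_code TEXT PRIMARY KEY,
               long_url   TEXT NOT NULL,
               expires_at INTEGER,
               is_active  INTEGER NOT NULL DEFAULT 1
           )"""
    )


def expire_urls(conn: sqlite3.Connection, now: int) -> int:
    """Mark rows whose TTL has passed as inactive; return how many changed."""
    cur = conn.execute(
        "UPDATE urls SET is_active = 0 "
        "WHERE is_active = 1 AND expires_at IS NOT NULL AND expires_at <= ?",
        (now,),
    )
    conn.commit()
    return cur.rowcount


def lookup_active(conn: sqlite3.Connection, short_code: str, now: int):
    """Return the long URL only if the row is active and not past its expiry."""
    row = conn.execute(
        "SELECT long_url FROM urls "
        "WHERE short_code = ? AND is_active = 1 "
        "AND (expires_at IS NULL OR expires_at > ?)",
        (short_code, now),
    ).fetchone()
    return row[0] if row else None
```

The read-time filter on expires_at means correctness does not depend on how often the CRON job runs; the job only keeps the is_active flag tidy.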
Key Points summary-
URLs can be shortened using base 62 encoding
We can have a bloom filter to check, at creation time, whether a given short code already exists
We can use a simple PostgreSQL DB to store all the data along with their expiry times
CRON will take care of expiring them
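To make the bloom filter point concrete, here is a minimal sketch. The sizes (about one million bits, 7 hash positions) and the double-hashing scheme built on SHA-256 are illustrative assumptions, not tuned values.

```python
import hashlib


class BloomFilter:
    """Small bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from two 64-bit halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, item: str) -> bool:
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7))
            for pos in self._positions(item)
        )


codes = BloomFilter()
codes.add("abc123")
if not codes.might_contain("xyz789"):
    pass  # definitely unused: safe to try the insert
else:
    pass  # possibly used: confirm against the DB before rejecting
```

A "no" from the filter is definitive, while a "maybe" still needs a DB check, so the unique constraint on short_code remains the final arbiter.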
Detailed Component Design
Availability perspective:
We can introduce a CDN to provide low latency across the globe.
The DB can be partitioned at the geographic level so that each request is served from the nearest partition. DB replicas can be eventually consistent across the global nodes.
The tradeoffs are that expiration relies only on CRON, and that a user may try to fetch data from a different node across the globe.
Such a request needs to be handled via the CDN's routing, and eventual consistency can help.
So we need a mix of both eventual consistency and strong consistency:
Eventual consistency for storing data across the globe, that is, for the master DB
But strong consistency within each geographically partitioned DB
It's possible that short codes conflict at the global level, but not within a partitioned DB.
The partition key can be location-based.
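One possible way to avoid global conflicts with a location-based partition key is to put a region prefix in the short code, so any node can tell which partition owns it. This is an assumption layered on the notes above, not something they specify; the region names, prefixes, and the RegionalStore class are hypothetical, and plain dicts stand in for the per-region databases.

```python
# Hypothetical region -> one-character prefix mapping.
REGION_PREFIX = {"us": "u", "eu": "e", "ap": "a"}
PREFIX_REGION = {prefix: region for region, prefix in REGION_PREFIX.items()}


def make_short_code(region: str, code_body: str) -> str:
    """Prefix the locally generated code with its home region."""
    return REGION_PREFIX[region] + code_body


def owning_region(short_code: str) -> str:
    """Recover the home region from the first character of the code."""
    return PREFIX_REGION[short_code[0]]


class RegionalStore:
    """Dicts standing in for one strongly consistent DB per region."""

    def __init__(self):
        self.partitions = {region: {} for region in REGION_PREFIX}

    def create(self, region: str, code_body: str, long_url: str) -> str:
        short_code = make_short_code(region, code_body)
        partition = self.partitions[region]
        if short_code in partition:
            # Uniqueness is enforced only inside the owning partition.
            raise KeyError("short code already exists: " + short_code)
        partition[short_code] = long_url
        return short_code

    def resolve(self, short_code: str):
        # A request arriving at any node is routed to the owning partition.
        return self.partitions[owning_region(short_code)].get(short_code)


store = RegionalStore()
code = store.create("eu", "Ab3", "https://example.com/page")
long_url = store.resolve(code)
```

Because the prefix differs per region, two regions can generate the same code body without producing the same short code, so each partition only has to enforce uniqueness locally.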