Requirements


Functional Requirements:


  • Create a short URL for a given long URL.
  • Return the long URL associated with a given short URL.



Non-Functional Requirements:


  • Low latency
  • High consistency
  • Scalable to 1 lakh (100,000) requests per second


API Design

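GET /urls/?url=<short URL>

Status code: 301 redirect (the long URL is also sent in the Location header, which is what the browser follows)

{
  "redirected_url": <long URL>
}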


POST /urls/

{
  "url": "long format url",
  "short_code": <custom short code, if provided>,
  "expiry": <expiry time>
}

Returns:

{
  "url": <short URL>
}



High-Level Design


API flow:


For a POST request:

The request first lands on the API gateway and is then routed by the load balancer to an application server.

The short_code column has a unique constraint, so the DB is the final guard against duplicate short codes.

If the caller provides a short code, the application layer can first check a bloom filter to see whether that code may already exist. A bloom filter can return false positives but never false negatives, so "not present" is definitive, while "possibly present" has to be confirmed against the DB.

If no short code is provided, the application generates one. A snowflake-style ID is unique across the distributed system without coordination between nodes, and encoding it in base62 gives the short code. The row is then inserted into the DB with that short code.
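
The generation step above can be sketched as follows. This is a minimal illustration, not a production generator: the bit layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence) follows the commonly described Snowflake layout, and the names SnowflakeGenerator, base62_encode and the EPOCH_MS value are assumptions made for this example.

```python
import threading
import time

BASE62_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(n: int) -> str:
    """Encode a non-negative integer using the 62-character alphabet above."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return BASE62_ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(BASE62_ALPHABET[rem])
    return "".join(reversed(digits))


class SnowflakeGenerator:
    """Snowflake-style IDs: timestamp (ms) | 10-bit worker ID | 12-bit sequence."""

    EPOCH_MS = 1_600_000_000_000  # hypothetical custom epoch, chosen for this sketch

    def __init__(self, worker_id: int):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self) -> int:
        with self._lock:
            now = int(time.time() * 1000)
            if now < self._last_ms:
                # Clock went backwards (or we borrowed a future millisecond):
                # keep using the last timestamp so IDs stay increasing.
                now = self._last_ms
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    # 4096 IDs already issued for this millisecond:
                    # advance the logical timestamp instead of spinning.
                    now = self._last_ms + 1
            else:
                self._sequence = 0
            self._last_ms = now
            return ((now - self.EPOCH_MS) << 22) | (self.worker_id << 12) | self._sequence


def new_short_code(generator: SnowflakeGenerator) -> str:
    """Generate a short code by base62-encoding a fresh Snowflake-style ID."""
    return base62_encode(generator.next_id())
```

Each application server would be given its own worker_id, which is what keeps IDs from different servers apart. The resulting codes are around 10 to 11 characters long; a shorter code would need a different scheme, such as a counter.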




For a GET request, since there will be millions of rows, we can keep a Redis layer holding the URLs created in the last x hours, which keeps the most recently created links fast to access.

A lookup by short code goes to Redis first; if the code is not present there, it falls back to the DB, where short_code is indexed.

The response is a 301, which browsers cache locally, so repeat visits from the same browser are redirected without another call to the service.
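
The read path above can be sketched as a cache-first lookup. To keep the example self-contained, a small in-memory TTLCache class stands in for Redis and a plain dict stands in for the indexed DB table; both, along with the resolve function name, are assumptions for illustration. The sketch also writes an entry back to the cache on a miss, which is a common addition rather than something the design above requires.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for Redis: entries disappear after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry is past its TTL: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (value, self._clock() + self._ttl)


def resolve(short_code: str, cache: TTLCache, db: Dict[str, str]) -> Optional[str]:
    """Return the long URL for short_code, or None if it is unknown."""
    long_url = cache.get(short_code)
    if long_url is not None:
        return long_url  # cache hit: the DB is not touched
    long_url = db.get(short_code)  # stand-in for an indexed lookup by short_code
    if long_url is not None:
        cache.set(short_code, long_url)  # warm the cache for later requests
    return long_url
```

With a real Redis deployment the same shape applies: get on the short code, fall back to the DB on a miss, then set the key with an expiry.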



Availability point of view:

The system should favor consistency over availability, to avoid conflicting short codes.

For availability, the load balancer and the service instances deployed on EC2 are horizontally scalable, so capacity can be added as load grows.



URL expiration:

For URL expiry, we can install a PostgreSQL extension that marks a row as inactive once its TTL has passed.

pg_ttl_index can be used to expire short codes.
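
Whatever performs the sweep, its effect on the table can be sketched as a periodic UPDATE. The example below does not use pg_ttl_index or its API; it uses Python's built-in sqlite3 module as a stand-in database, and the table layout (expires_at, is_active) and function names are assumptions for illustration.

```python
import sqlite3
from typing import Optional


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        """
        CREATE TABLE urls (
            short_code TEXT PRIMARY KEY,
            long_url   TEXT NOT NULL,
            expires_at INTEGER,                    -- unix seconds; NULL means no expiry
            is_active  INTEGER NOT NULL DEFAULT 1
        )
        """
    )


def deactivate_expired(conn: sqlite3.Connection, now: int) -> int:
    """Mark rows whose expiry time has passed as inactive; return how many changed."""
    cur = conn.execute(
        "UPDATE urls SET is_active = 0 "
        "WHERE is_active = 1 AND expires_at IS NOT NULL AND expires_at <= ?",
        (now,),
    )
    conn.commit()
    return cur.rowcount


def lookup_active(conn: sqlite3.Connection, short_code: str) -> Optional[str]:
    """Return the long URL only if the row is still active."""
    row = conn.execute(
        "SELECT long_url FROM urls WHERE short_code = ? AND is_active = 1",
        (short_code,),
    ).fetchone()
    return row[0] if row else None
```

If the sweep runs only periodically, a link can outlive its expiry by up to one sweep interval; checking expires_at in the read query as well would close that gap.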


Key points summary:

URLs can be shortened using base62 encoding.

A bloom filter can check, at creation time, whether a given short code is already taken.

A plain PostgreSQL database can store all the mappings along with their expiry times.
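
The bloom filter check from the key points can be sketched as below. This is a minimal single-process version for illustration; the BloomFilter class, its default sizes, and the double-hashing scheme built on SHA-256 are assumptions of this example, and a shared deployment would need the filter held somewhere all application servers can reach.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Probabilistic set: no false negatives, a small chance of false positives."""

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 5):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not added"; True means "possibly added".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

When might_contain returns False the short code is free and the DB lookup can be skipped; when it returns True the code may or may not be taken, so the DB (or its unique constraint) still has to decide.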





Detailed Component Design



Availability perspective:

Since we have a Redis layer in front of the DB, the read path can scale to 1M req/s.

We also cannot introduce a CDN, since we want strong consistency.



Tradeoffs:

No tradeoffs here, since an extension is in place to handle expiration.

A Snowflake package handles unique short code generation.


Concurrency handling:

Concurrent calls will not usually collide, since packages like snowflake build each ID from multiple parameters (a timestamp, a worker ID, and a per-worker sequence number) rather than from a hash of the input.
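
For caller-provided short codes, two concurrent requests can still ask for the same code, and the unique constraint on short_code is what settles the race. The sketch below shows that pattern with Python's built-in sqlite3 module as a stand-in for PostgreSQL; the table layout and the function names try_claim and create_short_url are assumptions for illustration.

```python
import sqlite3
from typing import Callable


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE urls (short_code TEXT PRIMARY KEY, long_url TEXT NOT NULL)"
    )


def try_claim(conn: sqlite3.Connection, short_code: str, long_url: str) -> bool:
    """Insert the mapping; return False if the short code is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO urls (short_code, long_url) VALUES (?, ?)",
                (short_code, long_url),
            )
        return True
    except sqlite3.IntegrityError:
        # The uniqueness constraint rejected the row: another request won the race.
        return False


def create_short_url(
    conn: sqlite3.Connection,
    long_url: str,
    generate_code: Callable[[], str],
    max_attempts: int = 3,
) -> str:
    """Try freshly generated codes until one is accepted by the DB."""
    for _ in range(max_attempts):
        code = generate_code()
        if try_claim(conn, code, long_url):
            return code
    raise RuntimeError("could not allocate a unique short code")
```

Because the insert itself is the check, there is no window between "check if free" and "write" in which a second request could slip in.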