Requirements
Functional Requirements:
- Create a short URL for a given long URL.
- Return the long URL associated with a given short URL.
Non-Functional Requirements:
- Latency for resolving a short URL to its long URL should be very low, so that the user does not feel any lag after clicking a short URL.
- Short URLs are immutable by design: once a short URL is created, nothing about it can be edited, such as its expiry time or the long URL it points to.
API Design
GET(short-url)-> long-url
POST(long-url)-> short-url
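A minimal sketch of how these two calls could look as handlers, assuming plain HTTP semantics. The status codes (302 redirect, 201 on creation, 401 without a user), the `short.example` domain, the function names, and the in-memory `MAPPINGS` dict are all illustrative assumptions, not part of the design. The short identifier is passed in by the caller to keep the sketch small.

```python
from http import HTTPStatus

BASE_URL = "https://short.example/"  # hypothetical domain
MAPPINGS = {}                        # short identifier -> long URL (in-memory stand-in)


def handle_get(short_id):
    """GET(short-url) -> long-url. No authentication required."""
    long_url = MAPPINGS.get(short_id)
    if long_url is None:
        return HTTPStatus.NOT_FOUND, {}
    # Assumption: the long URL is returned as a redirect target.
    return HTTPStatus.FOUND, {"Location": long_url}


def handle_post(long_url, short_id, user=None):
    """POST(long-url) -> short-url. Requires a logged-in user."""
    if user is None:
        return HTTPStatus.UNAUTHORIZED, {}
    MAPPINGS[short_id] = long_url
    return HTTPStatus.CREATED, {"short_url": BASE_URL + short_id}
```

For example, `handle_post("https://example.com/x", "abc1234", user="alice")` registers the mapping, after which `handle_get("abc1234")` returns a redirect to the long URL.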
High-Level Design
Two Major APIs
- GET API: the request carries the short URL, which contains the short identifier. This API is not authenticated.
- POST API: this API is authenticated (the user must log in before creating a short URL). The request provides the user data, the date and time, and the long URL; the response returns the short URL.
Detailed Component Design
Creation Layer (for creating a short URL for the provided long URL)
When we receive a long URL, we generate an alphanumeric string of at most 7 characters and check it against a Bloom filter. A Bloom filter has no false negatives, so if it reports that the short identifier is not present, that answer is certain. If it reports that the identifier is present, that may be a false positive, so we check ScyllaDB directly. Once the system confirms that the identifier is not present, we try to take a lock on it. If we get the lock, we register the mapping and then release the lock. If we fail to get the lock, or the identifier is already taken, we generate a new identifier and repeat the process.
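The creation flow can be sketched roughly as follows. This is only an illustration under stated assumptions: a 62-character alphabet, identifiers of exactly 7 characters (the design allows shorter ones), a plain dict standing in for ScyllaDB, and a set guarded by a mutex standing in for a distributed lock. The class and method names are hypothetical.

```python
import hashlib
import secrets
import string
import threading

ALPHABET = string.ascii_letters + string.digits  # assumed 62-symbol alphabet
ID_LENGTH = 7


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1 << 20, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class Shortener:
    def __init__(self):
        self.bloom = BloomFilter()
        self.db = {}        # stand-in for the ScyllaDB table
        self.locks = set()  # stand-in for a distributed lock service
        self._mutex = threading.Lock()

    def _try_lock(self, short_id):
        with self._mutex:
            if short_id in self.locks:
                return False
            self.locks.add(short_id)
            return True

    def _unlock(self, short_id):
        with self._mutex:
            self.locks.discard(short_id)

    def create(self, long_url, max_attempts=10):
        for _ in range(max_attempts):
            short_id = "".join(secrets.choice(ALPHABET) for _ in range(ID_LENGTH))
            # "Absent" from the Bloom filter is definitive; "present" may be a
            # false positive, so it is confirmed against the database.
            if self.bloom.might_contain(short_id) and short_id in self.db:
                continue
            if not self._try_lock(short_id):
                continue  # someone else holds the lock: try a new identifier
            try:
                # Extra re-check under the lock (an addition, not in the text).
                if short_id in self.db:
                    continue
                self.db[short_id] = long_url
                self.bloom.add(short_id)
                return short_id
            finally:
                self._unlock(short_id)
        raise RuntimeError("could not allocate a short identifier")
```

Calling `Shortener().create("https://example.com/a")` returns a fresh 7-character identifier and records it in both the dict and the Bloom filter. The `max_attempts` bound is also an assumption; the text only says to repeat the process.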
Serving Layer (for resolving a short URL to its long URL)
First we check whether the short identifier is present in the Bloom filter. If it is not present, we return 404; otherwise we look it up in the storage layers.
Data is stored in 3 layers:
- In-memory cache: uses an LFU-based eviction policy so that only hot keys are cached here.
- Redis/Dragonfly: stores key-value pairs mapping each short URL to its long URL. Whenever a key is accessed we refresh its TTL, so a key that has not been accessed for a full TTL period is no longer present in the cache.
- ScyllaDB: if the key is not found in Redis either, we look it up here. If found, we cache it first in Redis, then in memory, and then return it. If it is not found here either, we return 404.
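The lookup order across the three layers might look like the sketch below. Everything here is an in-process stand-in: `LFUCache` plays the in-memory cache, `TTLCache` imitates Redis/Dragonfly with a TTL that is refreshed on access, a dict plays ScyllaDB, and a plain set stands in for the Bloom filter. All names are hypothetical, and promoting a Redis hit into the in-memory cache is an assumption the text does not spell out.

```python
import time
from collections import Counter


class LFUCache:
    """Small in-process cache that evicts the least frequently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.values = {}
        self.hits = Counter()

    def get(self, key):
        if key in self.values:
            self.hits[key] += 1
            return self.values[key]
        return None

    def put(self, key, value):
        if key not in self.values and len(self.values) >= self.capacity:
            coldest = min(self.values, key=lambda k: self.hits[k])
            del self.values[coldest]
            del self.hits[coldest]
        self.values[key] = value
        self.hits[key] += 1


class TTLCache:
    """Stand-in for Redis/Dragonfly: every read pushes the expiry forward."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self.entries[key]
            return None
        self.entries[key] = (value, self.clock() + self.ttl)  # refresh TTL
        return value

    def put(self, key, value):
        self.entries[key] = (value, self.clock() + self.ttl)


def resolve(short_id, known_ids, local, shared, database):
    """Return the long URL, or None when the caller should answer 404."""
    if short_id not in known_ids:  # plays the role of the Bloom filter check
        return None
    long_url = local.get(short_id)
    if long_url is not None:
        return long_url
    long_url = shared.get(short_id)
    if long_url is None:
        long_url = database.get(short_id)
        if long_url is None:
            return None
        shared.put(short_id, long_url)  # cache in the shared layer first
    local.put(short_id, long_url)       # then in memory
    return long_url
```

The `clock` parameter exists only so that expiry can be exercised without sleeping; a real deployment would rely on the cache server's own TTL handling.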