Out of Scope features:
Questions:
What to do when user requests the same long url again and again should we give the same generated short url or create a new short url.
REST
/url
[POST]
{ url:
alias?:
expiration_time?:
}
200 - url:{}
/url
[GET]
302 - url:{}
So the client sends the request of long url to the server and the server then stores it in DB with parameters:
1.Shorturl
2.Longurl
3.Expiration_time
4.Alias
5.Creation_time
So these values are stored in DB and when asked for the shorturl it will get fetched form the DB.
Here for fetching of the data from DB it may take more time so we can either add an index.We can add an index as this system is read heavy but not write heavy.Even though if we add the index it will still take more time to fetch the url in <100MS as the fetch time from memory is less than that of SSD.So we can place a cache in between so that the reads will be faster.But the main problem if we place a cache is we will be adding extra network hop but anyways the time we save from fetching the result from cache is more so the network hop addition time become negligible.So there is also another option to add CDN so that the POP will get updated and the data can be fetched directly from CDN but the problem with the CDN is that cost and complexity.CDN helps faster fetching by placing the data near to user location but the main drawback comes in the cost and complexity.So here we can use the write around cache which will check if the url is present in the cache if not then hit the DB and update the cache.
When performing the redirect we can also use 301 which stands for permanent redirection which mean browser will store this url and next time it directly reroutes to permanent url.So in this way it will be dfficult to track and debug so this will not be used mostly by many systems.
1.Client
2.Server
3.Url
So now we will deep down into the generation of short url .So we can use the generation of unique url by just taking some random texts and create the url but the collision may be there so along with the QPS addiitonal overhead regarding the redirects may happen too increasing the load .So we have to use some algorithm which will reduce the collision to near zero.So we can use base 62 encode algorithm which is 62^7 combinations can be used which will lead to less collision for which we can add a salted value too.So to avoid the complexities we can just use a redis counter 62 encoded algo which will be atomic and collision free so by this algorithm the collision count will be near zero where the counter will keep on increasing every time a url is requested.We can also infer one thing that this system is skewed read heavy compared to write heavy as more users will re-click the same url rather than creating more.So functional requirements are satisfied and this system if we see it should be highly available rather than consistent if we see according to the CAP theorem.So the activity we can see is around 100B * 100DAU = 10MB storage is required.So currently one server can handle the load to enable the partition tolerance is ahieved we can create the segregation of the db as read replica and write replica where the reads will hit the read replica and the write will go to write replica's. We can scale our system horizontally so that the load is not on one system.But there is a problem with scaling as synchronization will no longer work so a centralized redis instance to be used to store the counter.