Description - This API takes the redirection URL and an optional short URL. If the short URL is not provided, the system allocates one. The API returns the short URL.
/GET - getURL
Request - shortURL
Response - redirectionURL
Description - This API returns the URL to redirect to.
/GET - redirectURL
Request - redirectURL
Response - http status
Description - This API redirects to the redirection URL and returns only an HTTP status code.
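The three APIs above can be sketched as plain functions. This is a minimal sketch, not a definitive implementation: the function names, the in-memory dict standing in for the database, the 7-character random identifier, and the choice of 302 with a Location header are all assumptions made for illustration.

```python
import secrets
import string
from typing import Dict, Optional, Tuple

ALPHABET = string.ascii_letters + string.digits
SHORT_URL_LENGTH = 7  # assumed length, not specified in the design

# Hypothetical in-memory store standing in for the database:
# short URL -> redirection URL.
_urls: Dict[str, str] = {}


def _allocate() -> str:
    """Pick a random identifier that is not already taken (assumed scheme)."""
    while True:
        candidate = "".join(
            secrets.choice(ALPHABET) for _ in range(SHORT_URL_LENGTH)
        )
        if candidate not in _urls:
            return candidate


def create_url(redirection_url: str, short_url: Optional[str] = None) -> str:
    """POST: store the mapping and return the short URL."""
    if short_url is None:
        # No short URL supplied, so the system allocates one.
        short_url = _allocate()
    elif short_url in _urls:
        raise ValueError("short URL already in use: " + short_url)
    _urls[short_url] = redirection_url
    return short_url


def get_url(short_url: str) -> Optional[str]:
    """GET getURL: return the redirection URL, or None if unknown."""
    return _urls.get(short_url)


def redirect_url(redirection_url: str) -> Tuple[int, Dict[str, str]]:
    """GET redirectURL: return only a status code plus the Location header.

    302 is an assumption; a permanent redirect (301) is the other common choice.
    """
    return 302, {"Location": redirection_url}
```

A caller would typically chain the two GET calls: resolve the short URL with get_url, then pass the result to redirect_url.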
High-Level Design
In the high-level design, a client calls the API with a POST or GET request. Each request passes through a load balancer.
The load balancer routes requests across multiple servers, which helps the system scale.
POST requests all go to the DB.
For GET requests, servers first check the cache. On a cache miss, they look up the DB.
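The GET path described above can be sketched as a cache-first lookup. This is a rough sketch under stated assumptions: UrlReader is a hypothetical name, plain dicts stand in for the cache and the DB, and writing the value back into the cache after a miss is an assumption that the text does not state.

```python
from typing import Dict, Optional


class UrlReader:
    """Hypothetical read path: check the cache first, then the DB."""

    def __init__(self, cache: Dict[str, str], db: Dict[str, str]) -> None:
        self.cache = cache
        self.db = db

    def lookup(self, short_url: str) -> Optional[str]:
        # 1. Cache hit: return immediately without touching the DB.
        target = self.cache.get(short_url)
        if target is not None:
            return target
        # 2. Cache miss: fall back to the DB.
        target = self.db.get(short_url)
        if target is not None:
            # Assumption: populate the cache so the next read is a hit.
            self.cache[short_url] = target
        return target
```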
I will discuss each of these further in the detailed component design.
The database is a SQL database since we need strong consistency.
Identifier Generator - A service that pre-generates unique identifiers and keeps them in the cache for immediate use.
Reconciliation Service - This service reconciles URLs that have not been used for a long time and marks them as unused in the database so they can be used later.
Cache - Stores pre-generated identifiers and also the frequently accessed redirection URLs, keyed by short URL (on the assumption that 80% of traffic is generated by 20% of the URLs).
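Under the 80/20 assumption, a bounded cache that keeps recently accessed entries should hold most of the hot URLs. The sketch below uses least-recently-used eviction, which is an assumption here since the design does not name an eviction policy; LruCache and its capacity parameter are hypothetical.

```python
from collections import OrderedDict
from typing import Optional


class LruCache:
    """Hypothetical bounded cache: short URL -> redirection URL."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, short_url: str) -> Optional[str]:
        if short_url not in self._items:
            return None
        # Mark the entry as most recently used.
        self._items.move_to_end(short_url)
        return self._items[short_url]

    def put(self, short_url: str, redirection_url: str) -> None:
        self._items[short_url] = redirection_url
        self._items.move_to_end(short_url)
        if len(self._items) > self.capacity:
            # Evict the least recently used entry (front of the ordering).
            self._items.popitem(last=False)
```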
Detailed Component Design
Identifier Generator
This service pre-generates short URL identifiers and saves them in the database as unused.
The unused pre-generated identifiers are divided across multiple servers; each batch is marked as used in the database and stored in that server's cache.
As soon as a request to generate a URL arrives, a transaction is started to fetch a short URL from the cache and save the redirection URL against it in the database. This avoids an extra DB request on every call just to fetch a short URL first.
Reconciliation Service
Pre-generated URLs remain in the cache for a TTL. Once that TTL passes, the entries are evicted from the cache.
This service marks the unused short URLs from the cache (those no longer present in the cache because they expired) as unused in the database so that they are picked up by the identifier service again.
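One way to read the reconciliation step is as a set difference: identifiers marked used in the database, minus those that already have a redirection URL, minus those still sitting in a cache. The sketch below assumes that reading; the reconcile function and its three inputs are hypothetical stand-ins for queries against the database and the caches.

```python
from typing import Dict, Set


def reconcile(
    used_flags: Dict[str, bool], assigned: Set[str], cached: Set[str]
) -> Set[str]:
    """Release identifiers that expired from the cache without being assigned.

    used_flags: identifier -> used flag, as stored in the database.
    assigned:   identifiers that already map to a redirection URL.
    cached:     identifiers still present in some server cache.
    """
    released = {
        ident
        for ident, used in used_flags.items()
        if used and ident not in assigned and ident not in cached
    }
    for ident in released:
        # Mark as unused so the identifier service can hand it out again.
        used_flags[ident] = False
    return released
```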
Database
The database can further be replicated into read replicas and write replicas.
Read replicas can be used for redirection only.
Write replicas can be used to create the pre-generated identifiers and also to save the redirection URLs.
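The read/write split can be sketched as a small router that picks a replica by operation type. This is only an illustration: ReplicaRouter, the replica names, and round-robin selection are assumptions, not part of the design above.

```python
import itertools
from typing import List


class ReplicaRouter:
    """Hypothetical router: redirects go to readers, everything else to writers."""

    def __init__(self, writers: List[str], readers: List[str]) -> None:
        if not writers or not readers:
            raise ValueError("need at least one writer and one reader")
        # Round-robin over each group (assumed selection policy).
        self._writers = itertools.cycle(writers)
        self._readers = itertools.cycle(readers)

    def for_redirect(self) -> str:
        """Replica to use for a redirection lookup."""
        return next(self._readers)

    def for_write(self) -> str:
        """Replica to use for identifier generation or saving a URL."""
        return next(self._writers)
```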