System requirements


Functional Requirements:

-> Given a long URL, we want to return a short URL

-> Redirect a short URL to the correct long URL

-> Unpredictable Short URL

-> Edit / Delete the Created Short URLs

-> Expiration for the Created Short URLs


Non-Functional:

-> Highly Scalable

-> Highly Available

-> Read-to-write ratio of 100 : 1

-> High Consistency




Resource Estimations:


Server Requirements:

Estimations:

-> 200M long URLs received every month

-> Redirects = 200M * 100 = 20 * 10 ^ 9 per month

-> DAU = 100M

Servers Needed = QPS / Typical Server QPS (taking the DAU as the QPS, and 64,000 QPS per server)

= 100 * 10 ^ 6 / 64000 = 100 * 10 ^ 3 / 64 = 1562.5

~= 1600 Servers
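
The server estimate above can be reproduced with a few lines of arithmetic. This is a rough sketch that follows the notes in treating the DAU figure as the request rate; the variable names and the 30-day month used for the average redirect rate are assumptions made for illustration.

```python
import math

DAU = 100 * 10**6        # daily active users, used in the notes as the request rate
SERVER_QPS = 64_000      # typical QPS per server (figure from the notes)

servers_exact = DAU / SERVER_QPS
servers_needed = math.ceil(servers_exact)

# Average redirect rate, assuming a 30-day month (assumption, not from the notes)
redirects_per_month = 200 * 10**6 * 100
avg_redirect_qps = redirects_per_month / (30 * 24 * 3600)

print(servers_exact)            # 1562.5
print(servers_needed)           # 1563, which the notes round up to ~1600
print(round(avg_redirect_qps))  # roughly 7,700 redirects per second on average
```

Note that the average redirect rate is far below the DAU-based figure, so the ~1600 servers should be read as a worst-case bound under the stated assumption.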


Storage Requirements:

-> 1 URL is 500 Bytes in the DB

-> 2400 M URLs Every Year

-> 2400 * 10 ^ 6 * 500 = 120 * 10 ^ 10 Bytes = 1.2 TB / Year, so ~ 6 TB / 5 Years
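
The storage arithmetic can be checked directly. A minimal sketch, assuming decimal terabytes (10^12 bytes); the constant names are made up for this example.

```python
URL_RECORD_BYTES = 500          # size of one URL record in the DB (from the notes)
URLS_PER_MONTH = 200 * 10**6    # new long URLs per month (from the notes)

urls_per_year = URLS_PER_MONTH * 12
bytes_per_year = urls_per_year * URL_RECORD_BYTES

# Assumption: 1 TB = 10**12 bytes (decimal terabytes)
tb_per_year = bytes_per_year / 10**12
tb_five_years = tb_per_year * 5

print(urls_per_year)    # 2400000000
print(bytes_per_year)   # 1200000000000
print(tb_per_year)      # 1.2
```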




API Design


1) get_short_url (userID, long_URL, expiry)

-> Returns a JSON response containing the mapped short URL


2) redirect_long_url (userID, shortURL)

-> Returns the mapped long URL


3) edit_short_url (userID, old_longURL, newLongURL)

-> Does not return anything; it just changes the mapping in the database from the old long URL to the new long URL


4) delete_mapping (userID, longURL)

-> Deletes the mapping in the Database
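
To make the four calls concrete, here is a minimal in-memory sketch. The function names and parameters follow the list above; everything else (the `UrlService` class, the `Mapping` record, the dict used in place of a database, the placeholder `s1`, `s2`, ... short codes, and treating `expiry` as a Unix timestamp) is an assumption made for illustration.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Mapping:
    user_id: str
    long_url: str
    expiry: Optional[float]  # assumed to be a Unix timestamp; None means no expiry


class UrlService:
    """Hypothetical in-memory stand-in for the real service and database."""

    def __init__(self) -> None:
        self._by_short: Dict[str, Mapping] = {}
        self._counter = itertools.count(1)

    def get_short_url(self, userID: str, long_URL: str,
                      expiry: Optional[float] = None) -> dict:
        short = f"s{next(self._counter)}"  # placeholder short code
        self._by_short[short] = Mapping(userID, long_URL, expiry)
        return {"short_url": short}

    def redirect_long_url(self, userID: str, shortURL: str) -> Optional[str]:
        m = self._by_short.get(shortURL)
        if m is None:
            return None
        if m.expiry is not None and time.time() >= m.expiry:
            return None  # mapping has expired
        return m.long_url

    def edit_short_url(self, userID: str, old_longURL: str,
                       newLongURL: str) -> None:
        for m in self._by_short.values():
            if m.user_id == userID and m.long_url == old_longURL:
                m.long_url = newLongURL

    def delete_mapping(self, userID: str, longURL: str) -> None:
        doomed = [s for s, m in self._by_short.items()
                  if m.user_id == userID and m.long_url == longURL]
        for s in doomed:
            del self._by_short[s]
```

One design point this sketch surfaces: edit and delete are keyed by the long URL, so they need a scan (or a secondary index) to find the matching short URL.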



Basic Building Blocks that we would need, and a look at a basic System Design


-> Load Balancers

-> Rate Limiter

-> Databases

-> Caching

-> Sequencer

-> Encoder




Work Flows:


store/edit/delete:

-> The load balancer receives the request from the clients

-> The load balancer forwards the request to the web servers

-> Check whether the cache already has the data; if yes, return it. If not, the application servers talk to the database to retrieve or store the data





DB Selection:


-> Since the data is structured and the system is read-heavy, a SQL database such as MySQL or PostgreSQL would be a good selection here.


-> For caching, we need to store the most-used longURL-shortURL mappings. An in-memory store such as Redis or Memcached would work well here.


-> To reduce latency and improve the performance of our system, we should consider using CDNs so that we are as close as possible to our customers.


Detailed Design Workflows:


-> Clients talk to the service through the load balancers.

-> We make use of Rate Limiters to make sure that our system is not abused.

-> For creating a short URL from the given long URL, the flow is as follows:

-> A sequencer generates a unique 64-bit integer

-> The encoder uses Base58 encoding to turn this number into a short string that is not human-readable and is harder to guess. We use Base58 instead of Base64 to drop the characters that are easy to confuse (0, O, I, l) and the symbols + and /

-> We store this mapping in our MySQL database and return the created short URL to the customer.
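
The encoding step can be sketched as below. This is an illustrative version, not a definitive implementation: the alphabet is the commonly used Base58 ordering (digits, then uppercase, then lowercase, with 0, O, I and l left out), and the function names `base58_encode` and `base58_decode` are assumptions for this example.

```python
# 58 characters: no 0, O, I or l
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(n: int) -> str:
    """Encode a non-negative integer (e.g. a sequencer ID) as a Base58 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return BASE58_ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 58)
        digits.append(BASE58_ALPHABET[rem])
    # Remainders come out least-significant first, so reverse them
    return "".join(reversed(digits))


def base58_decode(s: str) -> int:
    """Inverse of base58_encode."""
    n = 0
    for ch in s:
        n = n * 58 + BASE58_ALPHABET.index(ch)
    return n


print(base58_encode(58))                # 21
print(len(base58_encode(2**64 - 1)))    # 11
```

Since 58^11 is larger than 2^64, any 64-bit sequencer value fits in at most 11 Base58 characters.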


-> For redirection:

-> The service receives a request from the customer and checks the cache; on a miss, it gets the data from the DB. The application server also stores this data in the cache for future use, so that we do not have to go to the DB every time.
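
The redirection flow above is a cache-aside lookup. A minimal sketch, assuming plain dicts stand in for both the cache and the database; the `RedirectService` class, the `resolve` method and the `db_reads` counter are hypothetical names used only for this example.

```python
from typing import Dict, Optional


class RedirectService:
    """Cache-aside lookup: check the cache first, fall back to the DB on a miss."""

    def __init__(self, db: Dict[str, str]) -> None:
        self.db = db                      # stand-in for the SQL database
        self.cache: Dict[str, str] = {}   # stand-in for the cache
        self.db_reads = 0                 # counts how often we hit the DB

    def resolve(self, short_url: str) -> Optional[str]:
        long_url = self.cache.get(short_url)
        if long_url is not None:
            return long_url               # cache hit

        self.db_reads += 1                # cache miss: go to the DB
        long_url = self.db.get(short_url)
        if long_url is not None:
            self.cache[short_url] = long_url  # populate the cache for next time
        return long_url


svc = RedirectService({"abc": "https://example.com/page"})
svc.resolve("abc")      # miss, read from the DB
svc.resolve("abc")      # hit, served from the cache
print(svc.db_reads)     # 1
```

A real cache would also need an eviction policy and a way to invalidate entries when a mapping is edited, deleted or expires.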