System requirements


Functional:

Url should be unique

Given a long url, system should give a short url

system should redirect to the original url given the short url




Non-Functional:

Highly available

Looks like an AP system than consistency.

count the number of times the url is accessed




Capacity estimation

Per Url - 200BYTES and user metadata 100 bytes

replicas - 3


total metadata = ~1000 bytes per day


400000 bytes per year


a million urls


10^6*4*10^5 = 4*10^11


5 years storage = 20*10^11 = 2 *10^12 = 2TB




API design

Define what APIs are expected from the system...


shortenUrl(longurl, userID, timestamp)

getUrl(shortUrl)





Database design

Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...


its definitely a read heavy state,

Need a key value store

to optimize storage, we can store the short url top domain in another column


The issue with noSQL though is that they might not be consistent and might cause conflicts so we can either go with sql assuming the shortening algorithm will give a unique url always



High-level design

You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design...



Database - Use mongodb

Short URL generator - hashing algorithm - base 68 encoder,

loadbalancer

cache

replication

rate limiting



Request flows

Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...


Overall, the request first comes to the shorturlgenerator that shortens the url using a sequencerid and encodes that to a base 58 ecncoding.


When the redirection request comes, the request first goes to the loadbalancer which routes the request to the api server which searches the cache first and then the db if not found to get the long url and redirects to the long url


Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...


So


Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...






Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.






Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?