System requirements
Functional:
Url should be unique
Given a long url, system should give a short url
system should redirect to the original url given the short url
Non-Functional:
Highly available
Looks like an AP system than consistency.
count the number of times the url is accessed
Capacity estimation
Per Url - 200BYTES and user metadata 100 bytes
replicas - 3
total metadata = ~1000 bytes per day
400000 bytes per year
a million urls
10^6*4*10^5 = 4*10^11
5 years storage = 20*10^11 = 2 *10^12 = 2TB
API design
Define what APIs are expected from the system...
shortenUrl(longurl, userID, timestamp)
getUrl(shortUrl)
Database design
Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...
its definitely a read heavy state,
Need a key value store
to optimize storage, we can store the short url top domain in another column
The issue with noSQL though is that they might not be consistent and might cause conflicts so we can either go with sql assuming the shortening algorithm will give a unique url always
High-level design
You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design...
Database - Use mongodb
Short URL generator - hashing algorithm - base 68 encoder,
loadbalancer
cache
replication
rate limiting
Request flows
Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...
Overall, the request first comes to the shorturlgenerator that shortens the url using a sequencerid and encodes that to a base 58 ecncoding.
When the redirection request comes, the request first goes to the loadbalancer which routes the request to the api server which searches the cache first and then the db if not found to get the long url and redirects to the long url
Detailed component design
Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...
the main component to be focused here is the urlshortner
When the request for url shortner comes, it first hashed and then encodes to base 58. Base 58 encoding is for readability. But this can cause collisions.
So we need a very strong hashing algorithm like murmur, google to avodi collisions.
Also, to avoid collisions, we can check the db if the hash already exists but again this might slo us down so we shard the db with a range function and also can use bloom filters
Scaling the db and caching are important here
Trade offs/Tech choices
Explain any trade offs you have made and why you made certain tech choices...
used mongo as its nosql and gives high throughput and locking emchanisms to avoid collisions
Failure scenarios/bottlenecks
Try to discuss as many failure scenarios/bottlenecks as possible.
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?