System requirements
Functional:
List functional requirements for the system (Ask interviewer if stuck)...
1.Given a longer URL system should return corresponding short URL
2.Given short URL system should redirect to corresponding longer URL
3.System should not have tiny URL collision.
4.Possible characters to be included in the Tiny/Short URL
Non-Functional:
List non-functional requirements for the system...
1.How many URL's expected to be stored in system.= 1 billion
2.Average URL shorting request per second.=100
3.Average URL redirect request per second.=1000
4.Latency and throughput of write request.=Latency 100ms, Throughput 100
5.Latency and throughput of redirect request. =Latency 10ms, Throughput 1000
6.Average Long URL length= 300 byte
7. Average short URL length = 15 byte
Capacity estimation
Estimate the scale of the system you are going to design...
Storage capacity short url= 1billion * 15 byte(average size) =15 gb
Storage capacity of long url= 300 * 1billion = 300 Gb
API design
Define what APIs are expected from the system...
API 1
HTTP verb: POST
HTTP payload: {url :""}
HTTP header: userInfo:"", content-type, accept-type, access_token:""
Res:
{tinyURL:""}
API2
HTTP verb: DELETE
HTTP payload: {tinyURL :""}
HTTP header: userInfo:"", content-type, accept-type, access_token:""
Res:
API3:
HTTP verb: GET (TinyURL)
response:
redirect to Long(URL)
Database design
Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...
Table1
USER {ID(Primary key), user_name}
URLMAPPING{ID(primary key),tiny_url, long_url,userid(Forgine key)}
High-level design
You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design...
I need following component
- Client
- API service
- Data access layer
- Auth and authorization
- Database
- Cache
Request flows
Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...
Detailed component design
Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...
- Data base can have more read replicas to handle the read request. But writes can go through primary which replicates to other read replicas.
- All the services except DB is stateless so can be scaled horizontally.
- In Write are faster than the processing capacity of ProcessingService we have bring in a Message queue (kafka) to handle back pressure.
- ProcessingService can use Base58 algorithm to hash the request which result in 2^ 48 possible URL values.
Trade offs/Tech choices
Explain any trade offs you have made and why you made certain tech choices...
- I could have used Base64 system but it would result in a-z, A-Z,0-9,_,/,= to keep it readable used only a-z, A-Z,0-9 AND SOME repeating char line 0 and o etc
- Writes are less compared to read so read replicas are added to scale reads.
- Cache used to cache frequent redirected URL. Caching strategy will be to cache on second hit. Cache will be distributed.
Failure scenarios/bottlenecks
Try to discuss as many failure scenarios/bottlenecks as possible.
- If writes are enormus it will cause high latency and reduced throughput in write API's
- If number of URL cross 2^58 then we need to come up with new hash approach to achive scale in new URL request.
- Primary DB failure and switchover may cause latency in writes.
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?
- Come up with new approach which can handle higher writes