System requirements
Functional:
1 Users can add text to the document
2 Users can share the document that they have created
Non-Functional:
System should be highly scalable
System should have low latency
Capacity estimation
Since writes are far less frequent than reads, we calculate the QPS for read and write operations separately, assuming 10^4 daily users.
Let's assume each user performs 1000 reads per day.
Then reads per day = 10^4 * 1000 = 10^7.
Rounding the 86,400 seconds in a day up to 10^5, read QPS = 10^7 / 10^5 = 100.
Similarly, let's assume each user performs 100 write operations per day.
Then writes per day = 10^4 * 100 = 10^6.
Again taking 10^5 as the number of seconds in a day,
write QPS = 10^6 / 10^5 = 10.
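The arithmetic in this estimate can be checked with a short script. This is a minimal sketch, assuming 10^4 daily users (the factor used in the calculation) and a day rounded to 10^5 seconds; the names `qps`, `DAILY_USERS`, and `SECONDS_PER_DAY` are illustrative only.

```python
SECONDS_PER_DAY = 10**5  # 86,400 rounded up, as in the estimate
DAILY_USERS = 10**4      # assumed user count implied by the 10^4 factor


def qps(users: int, ops_per_user_per_day: int,
        seconds_per_day: int = SECONDS_PER_DAY) -> float:
    """Average operations per second for a given per-user daily count."""
    return users * ops_per_user_per_day / seconds_per_day


read_qps = qps(DAILY_USERS, 1000)   # 1000 reads per user per day
write_qps = qps(DAILY_USERS, 100)   # 100 writes per user per day

print(f"read QPS ~ {read_qps:.0f}, write QPS ~ {write_qps:.0f}")
```

These are averages over the day; peak traffic would likely be higher and is not modelled here.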
API design
v1/document-url - This will be a POST call, since it creates a new document resource for the user
v1/update-document - This call updates the user's existing document
Database design
As the system is read heavy, we will use a relational database.
We will have one user table with the following attributes:
UserId, Username, documentId
Then we will have a document table with the following attributes:
documentId, targetUrlId
Finally, we will have a targetUrl table that stores the target URL, with the following attributes:
targetUrlId, targetUrl
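The three tables can be sketched as follows. This uses Python's built-in `sqlite3` as a stand-in for the relational database so the example runs anywhere; the pluralised table names, column types, foreign keys, and sample rows are assumptions, and only the column names come from the design.

```python
import sqlite3

SCHEMA = """
CREATE TABLE target_urls (
    targetUrlId INTEGER PRIMARY KEY,
    targetUrl   TEXT NOT NULL
);
CREATE TABLE documents (
    documentId  INTEGER PRIMARY KEY,
    targetUrlId INTEGER NOT NULL REFERENCES target_urls(targetUrlId)
);
CREATE TABLE users (
    userId     INTEGER PRIMARY KEY,
    username   TEXT NOT NULL,
    documentId INTEGER REFERENCES documents(documentId)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Sample rows (hypothetical data) to show how the tables link together.
conn.execute("INSERT INTO target_urls VALUES (1, 's3://example-bucket/doc-1')")
conn.execute("INSERT INTO documents VALUES (1, 1)")
conn.execute("INSERT INTO users VALUES (1, 'alice', 1)")

# Resolve a user's document to its stored location.
row = conn.execute(
    """
    SELECT u.username, t.targetUrl
    FROM users u
    JOIN documents d ON d.documentId = u.documentId
    JOIN target_urls t ON t.targetUrlId = d.targetUrlId
    WHERE u.userId = ?
    """,
    (1,),
).fetchone()
print(row)
```

Note that keeping `documentId` on the user table, as the design lists it, allows only one document per user row; a design that needs many documents per user would likely move the user reference onto the document table instead.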
High-level design
A user makes a call to the pastebin service. The call is forwarded to one of the load balancers, which directs it to one of the API servers. The API server in turn publishes the call to a message queue. From the message queue, the hash function is used to calculate the hash of the URL, and after the hash is calculated an entry is made in the database. Finally, the file is pushed to S3.
Request flows
1 The user makes a call to the pastebin service.
2 The call is forwarded to one of the load balancers, which directs it to one of the API servers.
3 The API server publishes the call to a message queue.
4 From the message queue, the hash function calculates the hash of the URL.
5 After the hash is calculated, an entry is made in the database.
6 Finally, the file is pushed to S3.
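The flow from API server to storage can be sketched in a single process. This is only an illustration under stated assumptions: `queue.Queue` stands in for the message queue, plain dictionaries stand in for the database and S3, the load balancer is omitted, and the sketch hashes the paste content rather than a URL because no URL exists yet at this point. All function and variable names are hypothetical.

```python
import hashlib
import queue

request_queue = queue.Queue()  # stand-in for the message queue
database = {}                  # stand-in for the database: hash -> object key
object_store = {}              # stand-in for S3: object key -> file contents


def api_server_handle(text: str) -> None:
    """API server: publish the incoming request to the queue."""
    request_queue.put(text)


def worker_process_one() -> str:
    """Worker: hash the request, record it in the database, then store the file."""
    text = request_queue.get()
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    object_key = f"pastes/{digest}"
    database[digest] = object_key    # entry is made in the database
    object_store[object_key] = text  # file is pushed to the object store
    request_queue.task_done()
    return digest


api_server_handle("hello pastebin")
paste_hash = worker_process_one()
```

Because the API server only enqueues the request, it can return quickly while the hashing and storage steps run at their own pace.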
Detailed component design
We will use a message queue such as RabbitMQ so that the API server, hash function, and database can work asynchronously and scale independently. For the database we will use MySQL with single-leader replication, so that all writes go to the leader and all reads are served by the replicas. The problem with this approach is consistency, since a replica can lag behind the leader; this can be addressed with a read quorum. To scale the database we will shard it on the unique key, so that each request is served by the shard that owns that key.
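Routing by shard key and by operation type can be sketched as below. This is a minimal illustration that assumes simple hash-modulo sharding with a fixed, hypothetical shard count; the function names and the `shard-N-role` naming are invented for the example, and a production setup might prefer consistent hashing to ease resharding.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a unique key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(key: str, is_write: bool) -> str:
    """Writes go to the shard's leader, reads go to one of its replicas."""
    role = "leader" if is_write else "replica"
    return f"shard-{shard_for(key)}-{role}"


write_target = route("doc-42", is_write=True)
read_target = route("doc-42", is_write=False)
```

A stable hash such as SHA-256 is used instead of Python's built-in `hash`, which is randomised per process for strings and would send the same key to different shards on different servers.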
Trade offs/Tech choices
1 As we are using MySQL for the database, writes will be a little slower.
2 We will use SHA-256 instead of MD5, as the chance of collisions is lower.
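The difference in digest size behind the second point can be seen with Python's standard `hashlib`. A small sketch with an arbitrary sample input:

```python
import hashlib

data = b"hello pastebin"  # arbitrary sample input

md5_hex = hashlib.md5(data).hexdigest()
sha256_hex = hashlib.sha256(data).hexdigest()

# Each hex character encodes 4 bits.
md5_bits = len(md5_hex) * 4
sha256_bits = len(sha256_hex) * 4

print(f"MD5: {md5_bits} bits, SHA-256: {sha256_bits} bits")
```

The larger output space makes accidental collisions far less likely, at the cost of longer identifiers and slightly more computation per hash.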
Failure scenarios/bottlenecks
If the database leader goes down, the most consistent (most up-to-date) read replica will be promoted to leader and reads will be served from that leader. However, there is a chance that the old leader recovers while the elected leader is still accepting reads, so two nodes may both act as leader.
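The promotion step can be sketched as picking the healthy replica that has applied the most of the leader's log. This is an illustration only: the `Replica` class, the `replicated_position` field, and the sample values are hypothetical, and the sketch does not handle the case where the old leader comes back.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    replicated_position: int  # how much of the leader's log this replica has applied
    healthy: bool = True


def promote_new_leader(replicas: list) -> Replica:
    """Choose the healthy replica that is furthest along in replication."""
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy replica available for promotion")
    return max(candidates, key=lambda r: r.replicated_position)


replicas = [
    Replica("replica-a", replicated_position=120),
    Replica("replica-b", replicated_position=150),
    Replica("replica-c", replicated_position=200, healthy=False),
]
new_leader = promote_new_leader(replicas)
```

Here `replica-c` is the furthest along but is skipped because it is unhealthy, so `replica-b` is promoted.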
For other failure scenarios we will use Grafana with Prometheus to monitor the system and detect anomalies.
Future improvements
The maximum file size supported by the pastebin can be increased.