Design Pastebin - System Design

System requirements

Functional:

1 Users can add text to the document

2 Users can share the document that they have created

Non-Functional:

System should be highly scalable

System should have low latency

Capacity estimation

Since writes are far fewer than reads, we will calculate the QPS for read and write operations separately.

Let's assume 10^4 daily users, each performing 1000 reads per day.

Then reads per day = 10^4 * 1000 = 10^7

Let's round the 86,400 seconds in a day up to 10^5, so read QPS = 10^7 / 10^5 = 100

Similarly, let's assume each user performs 100 write operations per day.

Then writes per day = 10^4 * 100 = 10^6

Again taking 10^5 as the number of seconds in a day,

write QPS = 10^6 / 10^5 = 10
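
The arithmetic above can be checked with a short script. This is only a sketch: the 10^4 user count is taken from the multiplications in the estimate, and the constant and function names are illustrative.

```python
# Back-of-the-envelope QPS estimate using the figures assumed above.
DAILY_USERS = 10**4             # implied by the 10^4 factor in the estimate
READS_PER_USER_PER_DAY = 1000
WRITES_PER_USER_PER_DAY = 100
SECONDS_PER_DAY = 10**5         # 86,400 rounded up for easy arithmetic


def qps(users: int, ops_per_user_per_day: int,
        seconds_per_day: int = SECONDS_PER_DAY) -> float:
    """Average operations per second for a given daily load."""
    return users * ops_per_user_per_day / seconds_per_day


read_qps = qps(DAILY_USERS, READS_PER_USER_PER_DAY)
write_qps = qps(DAILY_USERS, WRITES_PER_USER_PER_DAY)
print(f"read QPS  ~ {read_qps:.0f}")
print(f"write QPS ~ {write_qps:.0f}")
```

Using the exact 86,400 seconds instead of 10^5 gives roughly 116 reads and 12 writes per second, so the rounding does not change the order of magnitude.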

API design

v1/document-url - This will be a POST call, as it creates a new resource for the user

v1/update-document - This call will update the existing document for the user

Database design

As the system is read heavy, we will use a relational database.

We will have a single user table with the following attributes:

UserId, Username, documentId

Then we will have a document table with the following attributes:

documentId, targetUrlId

Finally, we will have a targetUrl table that stores the target URL, with the following attributes:

targetUrlId, targetUrl
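
The three tables can be sketched as a schema. SQLite is used here only so the example runs without a server; the column types, foreign keys, table spellings (`users`, `target_url`) and sample rows are assumptions, not part of the design above.

```python
import sqlite3

# SQLite stands in for the relational database so the sketch runs anywhere.
SCHEMA = """
CREATE TABLE target_url (
    targetUrlId INTEGER PRIMARY KEY,
    targetUrl   TEXT NOT NULL
);
CREATE TABLE document (
    documentId  INTEGER PRIMARY KEY,
    targetUrlId INTEGER NOT NULL REFERENCES target_url(targetUrlId)
);
CREATE TABLE users (
    UserId     INTEGER PRIMARY KEY,
    Username   TEXT NOT NULL,
    documentId INTEGER REFERENCES document(documentId)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical sample rows.
conn.execute("INSERT INTO target_url VALUES (1, 's3://example-bucket/abc123')")
conn.execute("INSERT INTO document VALUES (10, 1)")
conn.execute("INSERT INTO users VALUES (100, 'alice', 10)")

# Resolve a user's document to the location where its content is stored.
row = conn.execute(
    """
    SELECT u.Username, t.targetUrl
    FROM users u
    JOIN document d ON d.documentId = u.documentId
    JOIN target_url t ON t.targetUrlId = d.targetUrlId
    WHERE u.UserId = ?
    """,
    (100,),
).fetchone()
print(row)
```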

High-level design

The user makes a call to the Pastebin service. The call is forwarded to one of the load balancers, which directs it to one of the API servers. The API server in turn publishes the call to a message queue. From the message queue, the hash function is used to calculate the hash of the URL. After the hash is calculated, an entry is made in the database. Finally, the file is pushed to S3.
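
The write path described above can be sketched with in-process stand-ins. Everything here is an assumption for illustration: a `queue.Queue` replaces the message queue, plain dicts replace the database and S3, and the hash is taken over the paste content because the design does not pin down exactly what is hashed.

```python
import hashlib
import queue

paste_queue = queue.Queue()           # stand-in for the message queue
database: dict[str, str] = {}         # stand-in for the database: hash -> object key
object_store: dict[str, bytes] = {}   # stand-in for S3: object key -> content


def api_server_handle(text: str) -> None:
    """API server: publish the incoming paste to the queue and return."""
    paste_queue.put(text)


def worker_process_one() -> str:
    """Worker: take one message, hash it, record it, then store the content."""
    text = paste_queue.get()
    content = text.encode("utf-8")
    digest = hashlib.sha256(content).hexdigest()
    object_key = f"pastes/{digest}"
    database[digest] = object_key       # entry made in the database
    object_store[object_key] = content  # file pushed to the object store
    paste_queue.task_done()
    return digest


api_server_handle("hello world")
paste_id = worker_process_one()
print(paste_id, database[paste_id])
```

Because the API server only enqueues the request, it can respond before the hash, database write and upload have finished, which is what lets the stages scale separately.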

Request flows

Detailed component design

We will use a message queue such as RabbitMQ so that the API server, hash function and database can work asynchronously and scale independently. For the database we will use MySQL with single-leader replication, so that all writes go to the leader and all reads are served by the replicas. The problem with this approach is consistency, since a replica can lag behind the leader; this can be addressed with a read quorum. To scale the database we will shard it on the unique key, so that each request is served by the shard that owns that key.
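
A rough sketch of how a request could be routed under this scheme: the unique key picks a shard, writes go to that shard's leader, and reads rotate across its replicas. The shard count, node names and modulo placement are assumptions; quorum reads are not modelled here.

```python
import hashlib
import itertools

NUM_SHARDS = 4  # assumed shard count


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a unique key to a shard using a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class Shard:
    """One shard: a single leader for writes and replicas for reads."""

    def __init__(self, leader: str, replicas: list[str]) -> None:
        self.leader = leader
        self._replica_cycle = itertools.cycle(replicas)

    def node_for(self, is_write: bool) -> str:
        # Writes always go to the leader; reads rotate across the replicas.
        return self.leader if is_write else next(self._replica_cycle)


shards = [
    Shard(f"shard{i}-leader", [f"shard{i}-replica{j}" for j in range(2)])
    for i in range(NUM_SHARDS)
]


def route(key: str, is_write: bool) -> str:
    """Return the name of the node that should serve this request."""
    return shards[shard_for(key)].node_for(is_write)


print(route("abc123", is_write=True))
print(route("abc123", is_write=False))
```

Modulo placement moves most keys when the shard count changes; consistent hashing is a common alternative if resharding is expected.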

Trade offs/Tech choices

1 As we are using MySQL for the database, writes will be a little slower.

2 We will use SHA-256 instead of MD5, as the chance of collision will be lower.
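
The difference between the two hash functions is easy to see with Python's standard `hashlib`. The input text is a made-up example.

```python
import hashlib

text = b"example paste"  # hypothetical input

md5_hex = hashlib.md5(text).hexdigest()
sha256_hex = hashlib.sha256(text).hexdigest()

# Each hex character encodes 4 bits of the digest.
md5_bits = len(md5_hex) * 4
sha256_bits = len(sha256_hex) * 4
print(md5_bits, "bit MD5 digest:    ", md5_hex)
print(sha256_bits, "bit SHA-256 digest:", sha256_hex)
```

SHA-256 produces a 256-bit digest against MD5's 128 bits, and practical collision attacks are known for MD5. If the digest is truncated to make a short URL, the collision chance depends on the truncated length, not on the full digest size.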

Failure scenarios/bottlenecks

If the database leader goes down, the most up-to-date read replica will be promoted to leader and writes will be served by the new leader. However, there is a chance that the old leader recovers while the newly elected leader is still accepting writes, which leaves two nodes acting as leader at the same time (split brain).
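
One way to pick the replica to promote is to choose the healthy replica that has applied the most of the old leader's log. The sketch below assumes each replica reports a numeric position; the `Replica` type, its field names and the sample values are hypothetical, and the sketch does not deal with a recovered old leader.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    applied_position: int  # how far this replica has replayed the leader's log
    healthy: bool = True


def pick_new_leader(replicas: list[Replica]) -> Replica:
    """Choose the healthy replica that has applied the most of the leader's log."""
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy replica available for promotion")
    return max(candidates, key=lambda r: r.applied_position)


replicas = [
    Replica("replica-a", applied_position=1040),
    Replica("replica-b", applied_position=1057),
    Replica("replica-c", applied_position=1100, healthy=False),
]
new_leader = pick_new_leader(replicas)
print(new_leader.name)
```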

For other failure scenarios we will use Grafana with Prometheus to detect any changes.

Future improvements

The maximum file size supported by Pastebin can be increased.