System requirements
Functional:
- A user can create a paste
- A user can edit a paste
- A user can retrieve their pastes
- A user can delete a paste
Non-Functional:
- Highly available: users should be able to access their pastes at any given time
- Reliable: pastes should be stored for as long as the user specified
- Low latency when retrieving the text
Capacity estimation
1M total users
100K DAU
100K writes per day -> ~1 write per second
~10 reads per second
10:1 read-to-write ratio
Max text size: 500 KB
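The figures above can be checked with a few lines of arithmetic. The sketch below also derives a worst-case daily storage number and read bandwidth, which are not stated in the estimates; both assume that every paste is at the 500 KB maximum and that 1 KB = 1,000 bytes.

```python
# Back-of-envelope numbers taken from the estimates above.
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400
WRITES_PER_DAY = 100_000
READ_WRITE_RATIO = 10
MAX_PASTE_BYTES = 500 * 1000          # 500 KB, assuming 1 KB = 1,000 bytes

writes_per_second = WRITES_PER_DAY / SECONDS_PER_DAY
reads_per_second = writes_per_second * READ_WRITE_RATIO

# Worst case (assumption): every write is a maximum-size paste.
max_storage_per_day_gb = WRITES_PER_DAY * MAX_PASTE_BYTES / 1e9
max_read_bandwidth_mb_s = reads_per_second * MAX_PASTE_BYTES / 1e6

print(f"writes/s: {writes_per_second:.2f}")
print(f"reads/s: {reads_per_second:.2f}")
print(f"worst-case storage/day: {max_storage_per_day_gb:.0f} GB")
print(f"worst-case read bandwidth: {max_read_bandwidth_mb_s:.2f} MB/s")
```

Real pastes are likely much smaller than the maximum, so these are upper bounds rather than expected values.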
API design
GET /pastes?userId
This will retrieve the list of paste names for a given user
GET /pastes/:pasteId
This will retrieve the paste data for a given paste id including its text
POST /pastes
{ userId, pasteName, text, expiration }
This will create the new paste and store it in the DB
PUT /pastes/:pasteId
This will update a given paste
DELETE /pastes/:pasteId
This will delete a specified paste
Database design
Paste
pasteId
name
filePath
expiration
userId
creationTimestamp
User
userId
password (stored as a salted hash, not plaintext)
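As a rough illustration, the two tables could be declared as follows. This is a sketch using SQLite from the Python standard library only because it runs without setup; the snake_case column names, the column types, and the convention that a NULL expiration means "never expires" are assumptions, not part of the design above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces references when enabled
conn.executescript("""
CREATE TABLE user (
    user_id  TEXT PRIMARY KEY,
    password TEXT NOT NULL               -- assumed to hold a hash, not plaintext
);
CREATE TABLE paste (
    paste_id           TEXT PRIMARY KEY,
    name               TEXT NOT NULL,
    file_path          TEXT NOT NULL,    -- key of the content in the object store
    expiration         INTEGER,          -- Unix seconds; NULL = never (assumption)
    user_id            TEXT NOT NULL REFERENCES user(user_id),
    creation_timestamp INTEGER NOT NULL
);
""")

# Insert one user and one paste, then read the paste metadata back.
conn.execute("INSERT INTO user VALUES (?, ?)", ("alice", "<hash>"))
conn.execute(
    "INSERT INTO paste VALUES (?, ?, ?, ?, ?, ?)",
    ("p1", "notes", "pastes/p1", None, "alice", 1700000000),
)
conn.commit()

row = conn.execute(
    "SELECT name, file_path FROM paste WHERE paste_id = ?", ("p1",)
).fetchone()
print(row)
```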
High-level design
- API Gateway - We can leverage this to handle user authentication. It identifies the user and authorizes any changes they try to make to their pastes.
- Paste Retrieval Service - This will be responsible for retrieving paste metadata from the database and paste content from the object store.
- Paste Management Service - This will be responsible for creating, updating, and deleting pastes.
- Expiration Service - This will be a batch process that fetches expired pastes from the relational database and deletes them.
- Object Store - This will store the content of each paste.
- Relational Database - This will store the metadata of the paste. Strong consistency is important for this use case: a user should never open their dashboard and fail to see their latest pastes. We will therefore use a relational DB, which provides ACID properties.
Request flows
GET /pastes?userId
This will first go to the API gateway to authenticate the user, then route to the Paste Retrieval Service. If the user is authorized, the service will pull the paste metadata from the relational database and return it to the user. The user will be able to view the list of paste names and their respective metadata.
GET /pastes/:pasteId
This will first go to the API gateway to authenticate the user, then route to the Paste Retrieval Service. If the user is authorized, the service will pull the paste metadata from the relational database and then retrieve the paste content from the object store using the filePath property in the database. The user will be able to view the paste metadata and the content.
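The single-paste read path can be sketched as below. The two dictionaries are in-memory stand-ins for the relational database and the object store, and treating "authorized" as "the requester owns the paste" is an assumption; the function name `get_paste` is hypothetical.

```python
# In-memory stand-ins (assumptions) for the relational database and object store.
metadata_db = {
    "p1": {"pasteId": "p1", "name": "notes", "filePath": "pastes/p1", "userId": "alice"},
}
object_store = {"pastes/p1": "hello world"}


def get_paste(requester_id, paste_id):
    """Return metadata plus content for one paste, enforcing ownership."""
    meta = metadata_db.get(paste_id)
    if meta is None:
        raise KeyError(f"paste {paste_id} not found")
    if meta["userId"] != requester_id:
        # Assumed authorization rule: only the owner may read the paste.
        raise PermissionError("requester does not own this paste")
    # Metadata first, then the content via the stored filePath.
    content = object_store[meta["filePath"]]
    return {**meta, "text": content}


print(get_paste("alice", "p1")["text"])  # prints "hello world"
```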
POST /pastes
{ userId, pasteName, text, expiration }
This will first go to the API gateway to authenticate the user, then route to the Paste Management Service. The service will write the paste content to the object store, then write the paste metadata to the relational database. Once it's done, it will return an acknowledgement of the successful paste to the user.
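A minimal sketch of that write ordering follows, again with dictionaries standing in for the object store and the database. The cleanup of the stored content when the metadata write fails is an assumption added here to avoid orphaned objects; the function name `create_paste` and the `pastes/<id>` key layout are hypothetical.

```python
import time
import uuid

# In-memory stand-ins (assumptions) for the object store and relational database.
object_store = {}
metadata_db = {}


def create_paste(user_id, paste_name, text, expiration=None):
    """Write content first, then metadata, and return the new paste id."""
    paste_id = uuid.uuid4().hex
    file_path = f"pastes/{paste_id}"

    # Step 1: write the content to the object store.
    object_store[file_path] = text

    # Step 2: write the metadata. A real database write can fail, so the
    # content is removed again in that case (assumed cleanup behaviour).
    try:
        metadata_db[paste_id] = {
            "pasteId": paste_id,
            "name": paste_name,
            "filePath": file_path,
            "expiration": expiration,
            "userId": user_id,
            "creationTimestamp": int(time.time()),
        }
    except Exception:
        object_store.pop(file_path, None)
        raise
    return paste_id


new_id = create_paste("alice", "notes", "hello world")
print(metadata_db[new_id]["filePath"] in object_store)  # prints True
```

Writing the content before the metadata means a reader never finds a metadata row whose filePath points at nothing.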
PUT /pastes/:pasteId
This will first go to the API gateway to authenticate the user, then route to the Paste Management Service. If the user is authorized, the service will update the paste metadata and paste content. Once it's done, it will return an acknowledgement of the successful update to the user.
DELETE /pastes/:pasteId
This will first go to the API gateway to authenticate the user, then route to the Paste Management Service. If the user is authorized, the service will delete the paste metadata and paste content. Once it's done, it will return an acknowledgement of the successful delete to the user.
Detailed component design
For the paste retrieval service, we can create an index on the paste table by userId to quickly find and retrieve the pastes for a given user. When we query for all pastes of a user, we select all rows with the given userId. When we query for a specific paste, we look it up by pasteId, which should also be indexed (as the primary key it typically already is), so that query stays fast even when a user has many pastes.
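To illustrate the userId index, the sketch below builds a cut-down paste table in SQLite and asks the query planner how it would run the per-user query. SQLite, the snake_case column names, and the index name `idx_paste_user_id` are assumptions chosen so the example runs on its own; the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE paste (paste_id TEXT PRIMARY KEY, name TEXT, user_id TEXT)"
)
# Secondary index so "all pastes for a user" does not scan the whole table.
conn.execute("CREATE INDEX idx_paste_user_id ON paste (user_id)")
conn.executemany(
    "INSERT INTO paste VALUES (?, ?, ?)",
    [("p1", "notes", "alice"), ("p2", "todo", "alice"), ("p3", "misc", "bob")],
)

# Ask SQLite how it plans to execute the per-user query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM paste WHERE user_id = ?", ("alice",)
).fetchall()
plan_detail = plan[0][3]  # fourth column holds the human-readable plan step
print(plan_detail)

names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM paste WHERE user_id = ? ORDER BY name", ("alice",)
    )
]
print(names)  # prints ['notes', 'todo']
```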
For the paste management service, the text content may be very large, and we do not want to transfer that much data over the network several times. We can possibly add compression logic to the paste management service before writing to the object store. This may help performance, as there will be a smaller payload to write and retrieve.
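One possible shape for that compression step uses `zlib` from the Python standard library. The choice of zlib, the compression level, and the helper names `compress_paste` and `decompress_paste` are assumptions for illustration; the design above does not commit to a specific algorithm.

```python
import zlib


def compress_paste(text):
    """Encode the paste as UTF-8 and compress it before the object store write."""
    return zlib.compress(text.encode("utf-8"), level=6)


def decompress_paste(blob):
    """Reverse compress_paste when the paste is read back."""
    return zlib.decompress(blob).decode("utf-8")


# Repetitive text (logs, code, config) tends to compress well.
sample = "SELECT * FROM paste WHERE user_id = ?;\n" * 1000
blob = compress_paste(sample)

assert decompress_paste(blob) == sample
print(len(sample.encode("utf-8")), "bytes ->", len(blob), "bytes")
```

The gain depends heavily on the content: already-compressed or random data will not shrink, so the benefit should be measured on real pastes.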
For the expiration service, this can be a nightly job that queries the database for the pastes expiring that day. We can index on the expiration timestamp to retrieve expiring pastes quickly. The job will then delete those pastes from the relational database and then from the object store.
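The nightly job could look roughly like the sketch below. SQLite and a dictionary stand in for the relational database and the object store, expiration is assumed to be stored as Unix seconds with NULL meaning "never expires", and the function name `purge_expired` is hypothetical.

```python
import sqlite3


def purge_expired(conn, object_store, now):
    """Delete every paste whose expiration is at or before `now`; return the count."""
    expired = conn.execute(
        "SELECT paste_id, file_path FROM paste "
        "WHERE expiration IS NOT NULL AND expiration <= ?",
        (now,),
    ).fetchall()
    # Delete the metadata rows first, as described above...
    for paste_id, _ in expired:
        conn.execute("DELETE FROM paste WHERE paste_id = ?", (paste_id,))
    conn.commit()
    # ...then remove the content from the object store.
    for _, file_path in expired:
        object_store.pop(file_path, None)
    return len(expired)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE paste (paste_id TEXT PRIMARY KEY, file_path TEXT, expiration INTEGER)"
)
conn.execute("CREATE INDEX idx_paste_expiration ON paste (expiration)")
conn.executemany(
    "INSERT INTO paste VALUES (?, ?, ?)",
    [("old", "pastes/old", 100), ("new", "pastes/new", 900), ("forever", "pastes/forever", None)],
)
object_store = {"pastes/old": "a", "pastes/new": "b", "pastes/forever": "c"}

removed = purge_expired(conn, object_store, now=500)
print(removed, sorted(object_store))  # prints 1 ['pastes/forever', 'pastes/new']
```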
Trade offs/Tech choices
By choosing a relational database, we are essentially prioritizing strong consistency: any replicas need to be fully updated before a write is considered complete. This can add to write latency, but it ensures that the pastes users see are always up to date.
Failure scenarios/bottlenecks
The API Gateway may be a single point of failure here.
The relational database can become a single point of failure.
Future improvements
We can add redundant gateways to ensure high availability.
We can create a master-slave replication pattern with one primary that handles writes and several read replicas. This way the database will be highly available for reads. We will ensure strong consistency by updating the replicas before a write is acknowledged, so that subsequent reads see the latest data.
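The idea of acknowledging a write only after the replicas are updated can be shown with a toy model. The `Primary` and `Replica` classes below are hypothetical in-process objects, not a real database API; an actual deployment would rely on the database's own synchronous replication feature.

```python
class Replica:
    """Read replica holding a full copy of the data."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value
        return True  # acknowledge that the change was applied

    def read(self, key):
        return self.data.get(key)


class Primary:
    """Single writer that replicates synchronously before acknowledging."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def write(self, key, value):
        self.data[key] = value
        # Wait for every replica before telling the client the write succeeded.
        acks = [replica.apply(key, value) for replica in self.replicas]
        if not all(acks):
            raise RuntimeError("replication failed; write not acknowledged")
        return "ack"


replicas = [Replica(), Replica()]
primary = Primary(replicas)
primary.write("p1", "hello world")

# Any replica can now serve the read and returns the latest value.
print([replica.read("p1") for replica in replicas])  # prints ['hello world', 'hello world']
```

The cost is visible in `write`: its latency is bounded by the slowest replica, which is the write-latency trade-off noted earlier.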