System requirements
Functional:
Users can access the service through:
- Web Interface
- API Access
Users (mostly casual users) can paste text snippets to share for a short period of time
Character limit per paste: 20K characters
Maximum paste size per user: 200 MB
Pastes are automatically deleted after 30 days
Non-Functional:
Total users - 100,000
Daily active users - 10,000
Availability - 99.9%
Latency - under 500 ms per request
Key business KPIs:
- Number of shared snippets created
- Number of Daily Active Users
Capacity estimation
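Let's first calculate the request rate (QPS), assuming each daily active user creates 5 pastes per day:
Write QPS = 10,000 DAU * 5 pastes / 86,400 seconds ≈ 0.6 pastes/second
Peak write QPS = 2 * QPS ≈ 1.2 pastes/second
Read-to-write ratio is 5:1
So read QPS = 5 * 0.6 ≈ 3 reads/second (≈ 6 reads/second at peak)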
Storage
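Upper bound, assuming every paste uses the full 200 MB (a paste capped at 20K characters is only tens of kilobytes, so actual usage will be far lower):
10,000 DAU * 5 pastes * 200 * 10^6 bytes = 10 TB / day
For 30 days of retention: 10 TB * 30 = 300 TB
Cache size (20% of one day's data, following the 80/20 rule) = 10 TB * 0.2 = 2 TB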
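A small calculator makes the estimate reproducible from its inputs (10,000 DAU, 5 pastes per user per day, 2x peak factor, 5:1 read/write ratio, 200 MB per paste as an upper bound, 30-day retention, 20% cache), dividing daily volume by 86,400 seconds per day. It is a sketch for sanity-checking the arithmetic; the function name and parameter names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def capacity(dau=10_000, pastes_per_user_per_day=5, peak_factor=2,
             read_write_ratio=5, paste_size_bytes=200 * 10**6,
             retention_days=30, cache_fraction=0.2):
    """Back-of-the-envelope QPS and storage estimate (decimal TB = 10^12 bytes)."""
    writes_per_day = dau * pastes_per_user_per_day
    write_qps = writes_per_day / SECONDS_PER_DAY
    daily_bytes = writes_per_day * paste_size_bytes  # upper bound: every paste at max size
    return {
        "write_qps": write_qps,
        "peak_write_qps": write_qps * peak_factor,
        "read_qps": write_qps * read_write_ratio,
        "peak_read_qps": write_qps * read_write_ratio * peak_factor,
        "storage_per_day_tb": daily_bytes / 10**12,
        "storage_total_tb": daily_bytes * retention_days / 10**12,
        "cache_tb": daily_bytes * cache_fraction / 10**12,
    }


if __name__ == "__main__":
    for name, value in capacity().items():
        print(f"{name}: {value:.2f}")
```

Changing a single input (for example `paste_size_bytes=20_000` for a 20K-character ASCII paste) shows how sensitive the storage numbers are to the per-paste size assumption.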
API design
Create paste
POST /api/v1/create_paste - returns 201 Created on success
Request body:
{
  user_id:
  paste_name:
  expiry:
  content:
}
Get paste
GET /api/v1/{paste_id} - returns 200 OK with the paste if found, 404 Not Found otherwise
Path parameter: paste_id
Delete paste
DELETE /api/v1/{paste_id}
Path parameter: paste_id
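The create_paste endpoint is where the functional limits get enforced. A minimal validation sketch, assuming the 20K-character limit and the 30-day maximum lifetime apply per paste; the function name `validate_create_paste`, expressing `expiry` in days, and returning 400 for invalid input are hypothetical choices not specified above.

```python
MAX_CHARS = 20_000        # character limit from the requirements
MAX_EXPIRY_DAYS = 30      # pastes are deleted after 30 days at most
REQUIRED_FIELDS = ("user_id", "paste_name", "content")


def validate_create_paste(payload: dict) -> tuple[int, str]:
    """Return (HTTP status, message) for a create_paste request body.

    Hypothetical convention: expiry is an integer number of days and
    defaults to the maximum lifetime when omitted.
    """
    missing = [f for f in REQUIRED_FIELDS if not payload.get(f)]
    if missing:
        return 400, "missing fields: " + ", ".join(missing)
    if len(payload["content"]) > MAX_CHARS:
        return 400, f"content exceeds {MAX_CHARS} characters"
    expiry = payload.get("expiry", MAX_EXPIRY_DAYS)
    if not isinstance(expiry, int) or not 1 <= expiry <= MAX_EXPIRY_DAYS:
        return 400, f"expiry must be between 1 and {MAX_EXPIRY_DAYS} days"
    return 201, "created"
```

Rejecting oversized or malformed requests at the API layer keeps invalid data out of both the metadata DB and the blob store.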
Database design
We will need two tables: one for user data and one for paste metadata.
user table
user_id, name, email, createdAt
paste table
paste_id, user_id, paste_name, expiryDate
The data is well suited to a NoSQL database, mainly because we do not need strong ACID guarantees and the data is not very relational. A NoSQL store is also easy to scale horizontally as we expand to new countries.
The paste content itself will be stored in a blob store such as S3, which is well suited to storing large, unstructured objects.
High-level design
Request flows
Use case 1: Creating a paste
The user visits the website, pastes the content and gives it a name. The request goes to the load balancer (LB), which routes it to an app server hosting the API. The app server handles the create paste call: it stores the metadata in the NoSQL DB and the paste content in S3.
Use case 2: Reading a paste
When the user opens the paste's short URL, we call the GET API, which fetches the metadata from the DB and the content from S3 and returns both. If the paste is not found, the API returns 404 and the UI shows an error.
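Both flows can be sketched end to end in Python. This is an illustration under stated assumptions: shared dicts stand in for the NoSQL DB and S3, `secrets.token_urlsafe` stands in for the paste key, and round-robin is assumed as the load-balancing policy. `AppServer` and `LoadBalancer` are hypothetical names.

```python
import itertools
import secrets


class AppServer:
    """API server; metadata_db and blob_store are dict stand-ins for NoSQL and S3."""

    def __init__(self, metadata_db: dict, blob_store: dict):
        self.metadata_db = metadata_db
        self.blob_store = blob_store

    def create_paste(self, user_id: str, paste_name: str, content: str):
        paste_id = secrets.token_urlsafe(6)  # stand-in for the key generator
        self.metadata_db[paste_id] = {"user_id": user_id, "paste_name": paste_name}
        self.blob_store[paste_id] = content
        return 201, {"paste_id": paste_id}

    def get_paste(self, paste_id: str):
        meta = self.metadata_db.get(paste_id)
        if meta is None:
            return 404, {"error": "paste not found"}
        return 200, {**meta, "content": self.blob_store[paste_id]}


class LoadBalancer:
    """Hands each request to the next app server in round-robin order."""

    def __init__(self, servers):
        self._next = itertools.cycle(servers)

    def route(self) -> AppServer:
        return next(self._next)


if __name__ == "__main__":
    db, s3 = {}, {}
    lb = LoadBalancer([AppServer(db, s3), AppServer(db, s3)])
    status, body = lb.route().create_paste("u1", "notes", "hello")
    # The read lands on the other server but sees the same shared stores.
    print(lb.route().get_paste(body["paste_id"]))
```

Because the app servers hold no state of their own, any server can serve any read, which is what lets the LB distribute requests freely.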
Detailed component design
We generate the alias for each paste with a key generation service (KGS). Keys are pregenerated and stored in the key DB. Every time a create request reaches an app server, the server calls the KGS, which fetches an unused key and marks it as used.
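A minimal in-memory sketch of the KGS, assuming 8-character base62 keys drawn with Python's `secrets` module; the key length, the refill batch size, and the two sets standing in for the key DB's unused/used storage are all assumptions. Across multiple app servers the fetch-and-mark step must be atomic so no key is issued twice (for example, Redis `SPOP` removes and returns a random set member in one command).

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # base62


class KeyGenerationService:
    """Pregenerates unique keys and hands each one out exactly once."""

    def __init__(self, key_length: int = 8, pregenerate: int = 1000):
        self.key_length = key_length
        self.unused: set[str] = set()  # stand-in for the "unused keys" table
        self.used: set[str] = set()    # stand-in for the "used keys" table
        self.refill(pregenerate)

    def _random_key(self) -> str:
        return "".join(secrets.choice(ALPHABET) for _ in range(self.key_length))

    def refill(self, count: int) -> None:
        """Add `count` fresh keys that have never been issued."""
        target = len(self.unused) + count
        while len(self.unused) < target:
            key = self._random_key()
            if key not in self.used:  # never recycle an issued key
                self.unused.add(key)

    def get_key(self) -> str:
        """Fetch an unused key and mark it as used."""
        if not self.unused:
            self.refill(1000)
        key = self.unused.pop()
        self.used.add(key)
        return key
```

Pregenerating keys moves collision checks off the request path: the create call only pops a ready key instead of generating one and checking the DB for duplicates.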
Trade-offs/Tech choices
I chose S3 for paste storage because its features help the system scale; the main choices and trade-offs are:
- As the system becomes popular we will need more storage, and S3 offers virtually unlimited capacity. Lifecycle rules can automatically delete expired pastes to reclaim space, S3 provides high durability, and object versioning could support collaborative editing.
- A NoSQL DB lets us evolve the schema without expensive migrations that may cause downtime, and it scales horizontally with little effort. The trade-off is that we give up strong ACID guarantees.
- To keep latency low, we accept eventual consistency in the NoSQL database. (S3 itself has provided strong read-after-write consistency since December 2020.)
- We will use Redis as a cache to speed up reads of frequently accessed pastes; it also offers rich data structures such as sorted sets.
- Since S3 and the NoSQL DB cannot share an atomic transaction, we will implement background processes that reconcile the two stores, and make each write roll back (for example, by deleting an already uploaded S3 object) if a later step fails.
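The cache and consistency points above can be sketched together as a small storage layer. Assumptions: an in-process LRU cache stands in for Redis, dicts stand in for S3 and the NoSQL DB, and the write is made safe with a compensating delete (a saga-style rollback, not a true distributed transaction). Uploading the blob first means a failure leaves at most an orphaned object, never metadata pointing at missing content. All class names are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Small in-process LRU cache, standing in for Redis in this sketch."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


class PasteStore:
    """Coordinates the blob store, metadata DB and cache (dict-like stand-ins)."""

    def __init__(self, blob_store, metadata_db, cache: LRUCache):
        self.blob_store = blob_store
        self.metadata_db = metadata_db
        self.cache = cache

    def write(self, paste_id: str, metadata: dict, content: str) -> None:
        self.blob_store[paste_id] = content  # step 1: upload the content
        try:
            self.metadata_db[paste_id] = metadata  # step 2: record the metadata
        except Exception:
            # Compensating action: delete the orphaned blob, then surface the error.
            self.blob_store.pop(paste_id, None)
            raise

    def read(self, paste_id: str):
        """Cache-aside read: try the cache, fall back to the DB and blob store."""
        content = self.cache.get(paste_id)
        if content is not None:
            return content
        if paste_id not in self.metadata_db:
            return None  # the API layer would map this to 404
        content = self.blob_store[paste_id]
        self.cache.set(paste_id, content)
        return content
```

If the process crashes between the two steps, the compensating delete never runs; that residual case is what the background reconciliation job would clean up.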
Failure scenarios/bottlenecks
- A single-node NoSQL (Cassandra) database is a single point of failure.
- The key generation DB is also a single point of failure.
- The key generation service and the application server, if each runs as a single instance, are also single points of failure.
- The API will need to be rate-limited to prevent abuse by developers and other API clients.
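The document does not name a rate-limiting algorithm; a token bucket per API key is one common choice and is assumed here. Each bucket refills at `rate` tokens per second up to `capacity`, and each request spends one token; over-limit requests would typically be answered with HTTP 429 Too Many Requests. The limits are hypothetical, the clock is injectable so the behavior is testable, and in a multi-server deployment the bucket state would have to live in a shared store (such as the Redis cache) rather than in process memory.

```python
import time


class TokenBucket:
    """Refills `rate` tokens per second up to `capacity`; each request costs one."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full so short bursts are allowed
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """Keeps one token bucket per API key (hypothetical client identifier)."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets = {}

    def allow(self, api_key: str) -> bool:
        bucket = self.buckets.get(api_key)
        if bucket is None:
            bucket = self.buckets[api_key] = TokenBucket(self.rate, self.capacity, self.clock)
        return bucket.allow()
```

The capacity controls how large a burst a client may send, while the rate bounds its sustained throughput.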
Future improvements
- We will need a secondary replica of the NoSQL DB and the key generation DB so that if a primary fails, its secondary can be promoted to primary.
- We will implement autoscaling to handle increased load.
- We will need monitoring and alerting in place to track the health of the service.
- We will store blobs in multiple Availability Zones to increase data durability.
- We will use Prometheus for monitoring.
- We will also put a cache in front of the key generation DB to make key fetching more efficient.
- We will replicate asynchronously from the primary to the secondary DB to keep them in sync without adding write latency; the trade-off is that the secondary can lag behind, so the most recent writes may be lost on failover.
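The primary/secondary plan with asynchronous replication can be sketched as a toy model. It assumes a monitor that reports heartbeats and promotes the secondary after a hypothetical number of consecutive misses; writes are acknowledged on the primary before they are shipped. It applies most directly to the key generation DB, since Cassandra itself replicates data across nodes without a single primary. The sketch makes the replication-lag trade-off visible: anything still unshipped at promotion time is lost.

```python
class ReplicatedDB:
    """Primary/secondary pair with asynchronous replication (simplified model)."""

    def __init__(self, max_missed_heartbeats: int = 3):
        self.primary: dict = {}
        self.secondary: dict = {}
        self.pending = []  # writes acknowledged but not yet shipped
        self.missed = 0
        self.max_missed = max_missed_heartbeats

    def write(self, key, value) -> None:
        self.primary[key] = value  # acknowledged before replication
        self.pending.append((key, value))

    def replicate(self) -> None:
        """Background step: ship pending writes to the secondary."""
        for key, value in self.pending:
            self.secondary[key] = value
        self.pending.clear()

    def heartbeat(self, primary_ok: bool) -> bool:
        """Called by the monitor; returns True if a failover happened."""
        if primary_ok:
            self.missed = 0
            return False
        self.missed += 1
        if self.missed >= self.max_missed:
            self.promote_secondary()
            return True
        return False

    def promote_secondary(self) -> None:
        # Writes still in `pending` never reached the secondary and are lost.
        # A fresh secondary would then be rebuilt from the new primary.
        self.primary, self.secondary = self.secondary, {}
        self.pending.clear()
        self.missed = 0
```

Requiring several missed heartbeats before promoting avoids failing over on a single dropped health check, at the cost of a slightly longer outage when the primary really is down.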