System requirements
Functional:
- Create a user account; log in and log out
- Create a paste, mark it public or private, and set its expiration
- Create a folder
- Users should be able to view their pastes
- Public pastes should be visible to everyone with the link
- Display number of views per paste
- Users should be able to search pastes based on content, title or tags
- Detect and remove inappropriate content in pastes
- Delete pastes once they expire
- Users should be able to collaborate on pastes, such as allowing multiple users to edit a paste.
- Users should be able to customize their account settings, such as profile information and notification preferences.
Non-Functional:
- Availability: Paste links should be highly available
- Security: Links should not be sequential, so that nobody can guess the next link
- Rate limiting: Limits depend on the profile type - guest, logged-in user, or PRO user
- Latency: Pastes should load with minimal latency
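One way to meet the security requirement above is to draw each paste ID from a cryptographically secure random source instead of a counter. The sketch below is illustrative only: the base62 alphabet, the 8-character length, and the function names `new_paste_id` and `allocate_paste_id` are assumptions, not part of the design.

```python
import secrets
import string

# Base62 alphabet. The 8-character length is an assumption, not from the design.
ALPHABET = string.ascii_letters + string.digits
ID_LENGTH = 8


def new_paste_id(length: int = ID_LENGTH) -> str:
    """Return a random ID drawn from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def allocate_paste_id(existing_ids: set[str]) -> str:
    """Generate IDs until one is unused, then reserve and return it."""
    while True:
        candidate = new_paste_id()
        if candidate not in existing_ids:
            existing_ids.add(candidate)
            return candidate
```

In a real deployment the `existing_ids` set would be replaced by a uniqueness check against the metadata store; the in-memory set only stands in for it here.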
Capacity estimation
Let's assume 1 million total users,
with daily active users (DAU) at 10% of the total - 100,000
An average of 5 pastes per day per active user - 500,000 pastes per day
A total of 100 pastes per user across 1 million users - 100 million pastes
Storage
Let's assume the average paste size is 100 KB.
Total storage required - 100 KB * 100 million pastes = 10 TB total
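The storage estimate can be reproduced with a few lines of arithmetic. This is a sketch using the figures assumed above and decimal units (1 TB = 10^9 KB); the daily ingest figure is derived from those same assumptions rather than stated in the estimate.

```python
TOTAL_USERS = 1_000_000
DAU_PERCENT = 10                    # daily active users as a share of total users
PASTES_PER_ACTIVE_USER_PER_DAY = 5
PASTES_PER_USER_TOTAL = 100
AVG_PASTE_SIZE_KB = 100

daily_active_users = TOTAL_USERS * DAU_PERCENT // 100
daily_pastes = daily_active_users * PASTES_PER_ACTIVE_USER_PER_DAY
total_pastes = TOTAL_USERS * PASTES_PER_USER_TOTAL

# Decimal units: 1 GB = 10**6 KB, 1 TB = 10**9 KB.
total_storage_tb = total_pastes * AVG_PASTE_SIZE_KB / 10**9
daily_ingest_gb = daily_pastes * AVG_PASTE_SIZE_KB / 10**6

print(f"DAU: {daily_active_users:,}")
print(f"Pastes per day: {daily_pastes:,}")
print(f"Total pastes: {total_pastes:,}")
print(f"Total storage: {total_storage_tb:g} TB")
print(f"Daily ingest: {daily_ingest_gb:g} GB")
```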
Throughput
100,000 DAU, with 20,000 users simultaneously active at peak
Max of 5 requests per minute per user
100,000 requests per minute at peak
Considering a read-to-write ratio of roughly 10:1 (approximated here as 10% writes and 90% reads)
10,000 write requests per minute - about 167 TPS
90,000 read requests per minute - 1,500 TPS
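The throughput numbers follow from the peak concurrency and the per-user request cap. The sketch below redoes that arithmetic; note that the 10%/90% split used in the estimate is strictly a 9:1 ratio, taken here as an approximation of 10:1.

```python
CONCURRENT_USERS = 20_000
REQUESTS_PER_USER_PER_MINUTE = 5
WRITE_PERCENT = 10  # 10% writes, 90% reads (approximates a 10:1 read-to-write ratio)

peak_requests_per_minute = CONCURRENT_USERS * REQUESTS_PER_USER_PER_MINUTE
write_requests_per_minute = peak_requests_per_minute * WRITE_PERCENT // 100
read_requests_per_minute = peak_requests_per_minute - write_requests_per_minute

write_tps = write_requests_per_minute / 60
read_tps = read_requests_per_minute / 60

print(f"Peak: {peak_requests_per_minute:,} requests/minute")
print(f"Writes: {write_requests_per_minute:,}/minute, about {write_tps:.0f} TPS")
print(f"Reads: {read_requests_per_minute:,}/minute, about {read_tps:.0f} TPS")
```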
API design
1. Create user - POST /user
Takes a request with username, email address, password, and userType (guest / logged in / PRO)
Returns a response confirming successful user creation
2. Read user - GET /user/userId
Takes a request and returns a response with the user details: username, email address, userType
3. Update user - PUT /user/userId
Updates the user's username, email address, or userType
4. Delete user - DELETE /user/userId
Lets a user delete their account
5. Create a folder - POST /folder
Takes a request with userId, folder name, public/private, and an array of pasteIds, and returns a response confirming folder creation along with the folderId.
6. Read a folder - GET /folder/folderId
Returns a response with the following details: userId, folder name, public/private, array of pasteIds
7. Create a paste - POST /paste
Takes a request with userId, paste content, public/private, expiration date, and tags, and returns a response confirming paste creation
8. Read a paste - GET /paste/pasteId
Returns a response with the following details: userId, paste content, public/private, expiration date, tags
9. Update paste - PUT /paste/pasteId
Updates the paste content, public/private setting, tags, and expiration date
10. Delete a paste - DELETE /paste/pasteId
Deletes a paste; after deletion it should not be visible to anyone
11. Search pastes - GET /paste
The request can contain search terms such as paste name, tags, or content filters
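As an illustration of the create-paste request described above, here is a minimal sketch of a request model with basic validation. The class name `CreatePasteRequest`, the snake_case field names, the "no expiration means never expires" default, and the specific validation rules are all assumptions for the example, not part of the API as specified.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class CreatePasteRequest:
    """Hypothetical body of POST /paste, mirroring the fields listed above."""
    user_id: str
    content: str
    is_public: bool = True
    expires_at: Optional[datetime] = None  # assumption: None means "never expires"
    tags: list[str] = field(default_factory=list)


def validate_create_paste(req: CreatePasteRequest,
                          now: Optional[datetime] = None) -> list[str]:
    """Return a list of validation errors; an empty list means the request is valid.

    expires_at is expected to be timezone-aware so it can be compared with `now`.
    """
    now = now or datetime.now(timezone.utc)
    errors = []
    if not req.user_id:
        errors.append("userId is required")
    if not req.content:
        errors.append("paste content must not be empty")
    if req.expires_at is not None and req.expires_at <= now:
        errors.append("expiration date must be in the future")
    return errors
```

An API server could run a check like this before writing anything to storage, returning the error list in a 400 response.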
Database design
A NoSQL database like Cassandra or MongoDB for storing users, folders, and pastes, since read requests outnumber write requests roughly 10:1
Blob storage like Amazon S3 for storing paste contents
A document-oriented search engine like Elasticsearch for indexing paste contents, with indexes on fields like tags and paste name for easy search
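The split between the three stores can be sketched as a metadata record, a blob key, and a search document. Everything below is a hypothetical shape for illustration: the field names, the `pastes/<userId>/<pasteId>.txt` key layout, and the helper names are assumptions rather than a schema taken from the design.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PasteMetadata:
    """Row kept in the NoSQL store; the content itself lives in blob storage."""
    paste_id: str
    user_id: str
    title: str
    is_public: bool
    content_key: str               # key of the object holding the paste content
    tags: list[str]
    expires_at: Optional[datetime] = None
    view_count: int = 0


def blob_key(user_id: str, paste_id: str) -> str:
    """Build the object key under which the paste content is stored (assumed layout)."""
    return f"pastes/{user_id}/{paste_id}.txt"


def search_document(meta: PasteMetadata, content: str) -> dict:
    """Build the document sent to the search index for one paste."""
    return {
        "paste_id": meta.paste_id,
        "title": meta.title,
        "tags": meta.tags,
        "is_public": meta.is_public,  # lets search exclude private pastes
        "content": content,
    }
```

Keeping only the key in the metadata row keeps rows small, while the search document duplicates the content so that full-text queries never have to touch blob storage.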
High-level design
- Load balancer - distributes requests across servers for high availability
- Rate limiter - mitigates DDoS attacks and limits request rates based on user type
- API servers - support CRUD operations on users, folders, and pastes
- Inappropriate content remover - an async service that deletes pastes based on their content; it uses Elasticsearch to find offending pastes
- Delete paste - a scheduled script that deletes pastes once their expiration date has passed
- DB servers - separate servers for each store described under Database design
- Cache - caches user and folder info and the pastes of a particular user
- CDN - content delivery network that serves requests from the nearest geography or region
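The rate limiter component could be implemented in several ways; the sketch below uses a token bucket per client, with a different budget for each user type. The per-minute limits, the tier keys, and the class names are illustrative assumptions, and a production limiter would keep its buckets in a shared store rather than in process memory.

```python
import time
from dataclasses import dataclass

# Requests per minute for each user type. These values are assumptions.
TIER_LIMITS_PER_MINUTE = {"guest": 5, "logged_in": 20, "pro": 60}


@dataclass
class TokenBucket:
    capacity: float
    refill_per_second: float
    tokens: float
    updated_at: float


class RateLimiter:
    """Token-bucket limiter keyed by client ID, with limits chosen by tier."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._buckets: dict[str, TokenBucket] = {}

    def allow(self, client_id: str, tier: str) -> bool:
        limit = TIER_LIMITS_PER_MINUTE[tier]
        now = self._clock()
        bucket = self._buckets.get(client_id)
        if bucket is None:
            # New clients start with a full bucket.
            bucket = TokenBucket(limit, limit / 60.0, float(limit), now)
            self._buckets[client_id] = bucket
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        elapsed = now - bucket.updated_at
        bucket.tokens = min(bucket.capacity,
                            bucket.tokens + elapsed * bucket.refill_per_second)
        bucket.updated_at = now
        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True
        return False
```

A token bucket allows short bursts up to the tier's capacity while still enforcing the average rate, which suits interactive use better than a hard fixed-window counter.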
Request flows
1. A request to create a paste goes through the load balancer, which picks an API server, and through the rate limiter, which makes sure the request is within the user's limit. When a paste is created, the pasteId is stored in Cassandra and the content is stored in Amazon S3; the link to the content is stored in Cassandra against the pasteId.
2. The cache holds the most recently used user, folder, and paste information, with LRU eviction. Data can also be cached on the user's system so it loads quickly. The CDN manages cached content by region.
3. Search requests by paste content, name, or tags are served by Elasticsearch, which filters the matching pastes.
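The create and read flows above can be sketched end to end with in-memory stand-ins. In this example plain dictionaries replace Cassandra and Amazon S3, an `OrderedDict` provides the LRU cache, and all class, method, and key names are hypothetical; it shows the order of operations, not a real storage integration.

```python
import secrets
from collections import OrderedDict
from datetime import datetime, timezone
from typing import Optional


class LRUCache:
    """Small LRU cache: the least recently used entry is evicted first."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the oldest entry


class PasteService:
    def __init__(self, cache_capacity: int = 1000):
        self.metadata: dict = {}  # stand-in for the Cassandra table
        self.blobs: dict = {}     # stand-in for the S3 bucket
        self.cache = LRUCache(cache_capacity)

    def create_paste(self, user_id: str, content: str,
                     expires_at: Optional[datetime] = None) -> str:
        paste_id = secrets.token_urlsafe(6)  # random, non-sequential ID
        content_key = f"pastes/{paste_id}"
        self.blobs[content_key] = content    # content goes to blob storage
        self.metadata[paste_id] = {          # link is stored against the pasteId
            "user_id": user_id,
            "content_key": content_key,
            "expires_at": expires_at,
            "views": 0,
        }
        return paste_id

    def read_paste(self, paste_id: str,
                   now: Optional[datetime] = None) -> Optional[str]:
        now = now or datetime.now(timezone.utc)
        meta = self.metadata.get(paste_id)
        if meta is None:
            return None
        if meta["expires_at"] is not None and meta["expires_at"] <= now:
            return None  # expired pastes are treated as gone
        meta["views"] += 1
        content = self.cache.get(paste_id)
        if content is None:
            content = self.blobs[meta["content_key"]]  # cache miss: fetch the blob
            self.cache.put(paste_id, content)
        return content
```

The read path checks metadata first so that expired or deleted pastes are never served from the cache.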
Detailed component design
Trade offs/Tech choices
Although the data is structured and could be stored in a SQL database, NoSQL was chosen because of the read-to-write ratio.
Failure scenarios/bottlenecks
Future improvements