System requirements


Functional:

  1. Create a user account; log in and log out
  2. Create a paste (public or private) and set its expiration
  3. Create a folder
  4. Users should be able to view their pastes
  5. Public pastes should be visible to everyone with the link
  6. Display the number of views per paste
  7. Users should be able to search pastes based on content, title or tags
  8. Handle inappropriate content in pastes
  9. Delete pastes once they expire
  10. Users should be able to collaborate on pastes, such as allowing multiple users to edit a paste.
  11. Users should be able to customize their account settings, such as profile information and notification preferences.
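
Requirement 9 (deleting pastes once they expire) can be sketched as a periodic sweep. This is a minimal sketch only: the `Paste` record, the `expires_at` field name and the in-memory dict standing in for the real database are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class Paste:
    paste_id: str
    expires_at: Optional[datetime]  # None means the paste never expires (assumed convention)


def is_expired(paste: Paste, now: datetime) -> bool:
    """A paste is expired once its expiration time has been reached."""
    return paste.expires_at is not None and paste.expires_at <= now


def sweep_expired(pastes: Dict[str, Paste], now: datetime) -> List[str]:
    """Remove expired pastes from the store and return their IDs."""
    expired = [pid for pid, paste in pastes.items() if is_expired(paste, now)]
    for pid in expired:
        del pastes[pid]
    return expired


now = datetime(2024, 1, 2, tzinfo=timezone.utc)
store = {
    "a": Paste("a", now - timedelta(hours=1)),  # already expired
    "b": Paste("b", now + timedelta(hours=1)),  # still valid
    "c": Paste("c", None),                      # never expires
}
removed = sweep_expired(store, now)
```

Passing `now` in explicitly keeps the check easy to test; a real job would pass the current UTC time on each run.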




Non-Functional:

  1. Availability: Paste links should be highly available
  2. Security: Links should not be sequential, so that the next link cannot be guessed
  3. Rate limiting: Limits depend on the profile (guest, logged-in user or PRO user)
  4. Latency: Latency should be minimised so that users can access pastes quickly
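
One way to meet the security requirement above is to draw each link key from a cryptographically secure random source rather than a counter. The sketch below assumes an 8-character key over letters and digits; the function name `new_paste_key` and the key length are illustrative choices, not part of the design above.

```python
import secrets
import string

# 62 URL-safe characters; 62**8 is roughly 2.18e14 possible keys at length 8.
ALPHABET = string.ascii_letters + string.digits


def new_paste_key(length: int = 8) -> str:
    """Return a random, non-sequential key suitable for a paste link."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


key = new_paste_key()
```

Random keys can collide, so the write path would still need a uniqueness check against the metadata store before accepting a key.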




Capacity estimation


Let's assume 1 million total users

and daily active users (DAU) at 10% of the total users: 100,000


An average of 5 pastes per day per active user: 500,000 pastes per day


A total of 100 pastes per user across 1 million users: 100 million pastes


Storage

Let's assume the average paste size is 100 KB.

Total storage required: 100 KB * 100 million = 10 TB
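
The storage arithmetic can be checked with a few lines of Python. This sketch uses decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes), which is an assumption; the daily ingest figure is derived from the 500,000 pastes per day estimated earlier.

```python
TOTAL_USERS = 1_000_000
PASTES_PER_USER = 100
AVG_PASTE_BYTES = 100 * 1000          # 100 KB in decimal units (assumed)
DAILY_ACTIVE_USERS = 100_000
PASTES_PER_ACTIVE_USER_PER_DAY = 5

total_pastes = TOTAL_USERS * PASTES_PER_USER
total_tb = total_pastes * AVG_PASTE_BYTES / 1000**4

daily_pastes = DAILY_ACTIVE_USERS * PASTES_PER_ACTIVE_USER_PER_DAY
daily_gb = daily_pastes * AVG_PASTE_BYTES / 1000**3

print(total_pastes, total_tb)   # 100000000 10.0
print(daily_pastes, daily_gb)   # 500000 50.0
```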


Throughput


100,000 DAU with 20,000 users simultaneously active

Max of 5 requests per minute per user


100,000 requests per minute at peak

Considering a read-to-write ratio of roughly 10:1, approximated here as a 90/10 split of peak traffic


10,000 write requests per minute: about 167 TPS

90,000 read requests per minute: 1,500 TPS
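
The throughput figures can be reproduced as follows. Note that the 10,000 / 90,000 figures correspond to a 90/10 split of peak traffic; a strict 10:1 ratio would give about 9,091 writes and 90,909 reads per minute. The constant names are illustrative.

```python
CONCURRENT_USERS = 20_000
REQUESTS_PER_USER_PER_MINUTE = 5
READ_PARTS, WRITE_PARTS = 9, 1  # 90/10 split behind the figures above

peak_rpm = CONCURRENT_USERS * REQUESTS_PER_USER_PER_MINUTE
write_rpm = peak_rpm * WRITE_PARTS // (READ_PARTS + WRITE_PARTS)
read_rpm = peak_rpm - write_rpm

# Convert per-minute rates to per-second rates (TPS).
write_tps = write_rpm / 60
read_tps = read_rpm / 60

print(peak_rpm, write_rpm, read_rpm)            # 100000 10000 90000
print(round(write_tps, 1), round(read_tps, 1))  # 166.7 1500.0
```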




API design


1. Create user - POST /user

Takes a request with username, email address, password, userType (guest/logged in/PRO)

Returns a response confirming successful user creation


2. Read user - GET /user/{userId}

Takes a request and returns a response with the user details: username, email address, userType


3. Update user - PUT /user/{userId}

Updates the user's username, email address or userType


4. Delete user - DELETE /user/{userId}

Lets a user delete their account


5. Create a folder - POST /folder

Takes a request with userId, folder name, public/private, array of pasteIds and returns a response confirming successful folder creation, along with the folderId.


6. Read a folder - GET /folder/{folderId}


Returns a response with the following details: userId, folder name, public/private, array of pasteIds


7. Create a paste - POST /paste

Takes a request with userId, paste content, public/private, expiration date, tags and returns a response confirming successful paste creation
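
The create-paste request could be modelled and validated along these lines. This is a sketch under assumptions: the field names (`user_id`, `is_public`, `expires_at`) and the two validation rules are illustrative and not fixed by the API description above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class CreatePasteRequest:
    user_id: str
    content: str
    is_public: bool = True
    expires_at: Optional[datetime] = None  # None means no expiration (assumed)
    tags: List[str] = field(default_factory=list)


def validate(req: CreatePasteRequest, now: datetime) -> List[str]:
    """Return a list of validation errors; an empty list means the request is acceptable."""
    errors = []
    if not req.content:
        errors.append("content must not be empty")
    if req.expires_at is not None and req.expires_at <= now:
        errors.append("expires_at must be in the future")
    return errors


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
good = CreatePasteRequest("u1", "hello", expires_at=now + timedelta(days=1), tags=["demo"])
bad = CreatePasteRequest("u1", "", expires_at=now - timedelta(days=1))
```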


8. Read a paste - GET /paste/{pasteId}

Returns a response with the following details: userId, paste content, public/private, expiration date, tags


9. Update a paste - PUT /paste/{pasteId}


Updates the paste content, public/private setting, tags and expiration date


10. Delete a paste - DELETE /paste/{pasteId}


Deletes a paste. After deletion it should no longer be visible to anyone.


11. Search pastes - GET /paste

The request can carry search terms as query parameters, such as paste name, tags or content filters




Database design


A NoSQL database such as Cassandra or MongoDB for storing users, folders and paste metadata, since read requests outnumber write requests by roughly 10:1


Blob storage such as Amazon S3 for storing paste contents


A document-oriented search engine such as Elasticsearch for indexing paste contents, with indexes on fields like tags and paste name for fast search



High-level design


  1. Load balancer - distributes requests across servers for high availability
  2. Rate limiter - mitigates DDoS attacks and enforces request limits based on user type
  3. API servers - support CRUD operations on users, folders and pastes
  4. Inappropriate content remover - an async service that deletes pastes based on their content; it uses Elasticsearch to find matching pastes
  5. Paste deletion - a scheduled script that deletes pastes once their expiration date has passed
  6. DB servers - separate servers for each data store described in the database design above
  7. Cache - caches user, folder and paste information for the particular user
  8. CDN - content delivery network that serves requests according to geography or region
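
The rate limiter in item 2 could apply a per-minute quota that depends on the user type. The sketch below uses a simple fixed-window counter; the quota values and the class name are assumptions chosen for illustration, and a production limiter would typically keep its counters in a shared store and evict old windows.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical per-minute quotas per user type.
LIMITS_PER_MINUTE = {"guest": 5, "logged_in": 20, "pro": 100}


class FixedWindowRateLimiter:
    def __init__(self, limits: Dict[str, int]) -> None:
        self.limits = limits
        # (client_id, minute window) -> number of requests seen in that window
        self.counts: Dict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, client_id: str, user_type: str, now_seconds: float) -> bool:
        """Return True and count the request if the client is under its quota."""
        window = int(now_seconds // 60)
        key = (client_id, window)
        if self.counts[key] >= self.limits[user_type]:
            return False
        self.counts[key] += 1
        return True


limiter = FixedWindowRateLimiter(LIMITS_PER_MINUTE)
results = [limiter.allow("10.0.0.1", "guest", now_seconds=30.0) for _ in range(6)]
next_window = limiter.allow("10.0.0.1", "guest", now_seconds=61.0)
```

With the assumed guest quota of 5, the first five calls in the same minute are allowed, the sixth is rejected, and the counter resets in the next minute.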





Request flows


1. A request to create a paste passes through the load balancer, which picks an API server, and through the rate limiter, which checks that the request is within the user's limit. When a paste is created, the pasteId is stored in Cassandra and the content is stored in Amazon S3; the link to the stored content is then saved in Cassandra against the pasteId.
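
The write path in this flow can be sketched with in-memory stand-ins for the two stores. The classes below are hypothetical placeholders for Amazon S3 and Cassandra, not their real client APIs, and the `blob://` link format is invented for the example.

```python
import secrets
from typing import Dict


class InMemoryBlobStore:
    """Stand-in for blob storage such as Amazon S3."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> str:
        self.objects[key] = data
        return f"blob://{key}"  # hypothetical link format

    def get(self, link: str) -> bytes:
        return self.objects[link[len("blob://"):]]


class InMemoryMetadataStore:
    """Stand-in for the metadata table (Cassandra in the flow above)."""

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}


def create_paste(user_id: str, content: str,
                 blobs: InMemoryBlobStore, metadata: InMemoryMetadataStore) -> str:
    paste_id = secrets.token_urlsafe(8)                   # non-sequential ID
    link = blobs.put(paste_id, content.encode("utf-8"))   # content goes to blob storage
    metadata.rows[paste_id] = {"user_id": user_id, "content_link": link}
    return paste_id


def read_paste(paste_id: str,
               blobs: InMemoryBlobStore, metadata: InMemoryMetadataStore) -> str:
    link = metadata.rows[paste_id]["content_link"]        # look up the link first
    return blobs.get(link).decode("utf-8")


blobs, metadata = InMemoryBlobStore(), InMemoryMetadataStore()
pid = create_paste("u1", "print('hello')", blobs, metadata)
```

Writing the content before the metadata row means a reader never finds a row whose link points at missing content, at the cost of possible orphaned blobs if the second write fails.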


2. The cache holds recently used user, folder and paste information, evicting the least recently used (LRU) entries first. Data can also be cached on the user's device so that it loads quickly, and the CDN manages cached content per region.
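
LRU eviction as used by this cache can be illustrated with a small in-process class. This is a sketch only; the capacity and keys are made up, and a deployed cache would normally be a shared service rather than a Python object.

```python
from collections import OrderedDict
from typing import Any, Optional


class LRUCache:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key: str, value: Any) -> None:
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry


cache = LRUCache(capacity=2)
cache.put("paste:1", "first")
cache.put("paste:2", "second")
cache.get("paste:1")            # touch paste:1 so paste:2 becomes the oldest
cache.put("paste:3", "third")   # evicts paste:2
```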


3. For read requests that search by paste content, name or tags, we can use Elasticsearch to filter the results.
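
The kind of lookup a search engine performs here can be illustrated with a tiny inverted index. This is an in-memory stand-in written for illustration, not the Elasticsearch API; tokenising on whitespace and requiring all query terms to match are simplifying assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set


class PasteSearchIndex:
    def __init__(self) -> None:
        # term -> set of paste IDs containing that term
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, paste_id: str, title: str, tags: List[str], content: str) -> None:
        text = " ".join([title, content, *tags])
        for term in text.lower().split():
            self.postings[term].add(paste_id)

    def search(self, query: str) -> Set[str]:
        """Return IDs of pastes that contain every term in the query."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = PasteSearchIndex()
index.add("p1", "Deploy script", ["bash", "ops"], "rsync files to server")
index.add("p2", "Backup script", ["bash"], "tar files nightly")
```

Searching for `"bash files"` matches both pastes, while `"ops"` matches only `p1`.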



Detailed component design






Trade offs/Tech choices


Although the data is structured and could be stored in a SQL database, we chose NoSQL because of the read-heavy read-to-write ratio.




Failure scenarios/bottlenecks



Future improvements
