System requirements


Functional:

  • store text until a user-defined expiration time
  • share text
  • manage access
  • sign up, log in



Non-Functional:

Availability : shared text stays readable even when individual servers fail

Security : only users with permission can access a text

Performance : low latency when opening a text from its link




Capacity estimation

Write

DAU : 100M

Texts created per day : 1M


Read

Peak reads per hour : 10M

RPS : 10M / (60*60) = 10,000K / 3,600 ≈ 2.8K/sec


Storage

Maximum size of 1 text = 50 KB

Storage added per day (upper bound) = 50 KB * 1M = 50 GB
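
The arithmetic can be rechecked from the stated inputs with a few lines of Python. This is only a sketch: the average write rate and the yearly storage figure are derived here and are not part of the original estimate, and 50 KB is taken as 50,000 bytes (decimal units).

```python
# Inputs taken from the estimates in this section.
PEAK_READS_PER_HOUR = 10_000_000
PASTES_PER_DAY = 1_000_000
MAX_PASTE_BYTES = 50 * 1000  # 50 KB, assuming decimal units

SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR

# Peak read rate: reads in the busiest hour spread over that hour.
peak_read_rps = PEAK_READS_PER_HOUR / SECONDS_PER_HOUR

# Average write rate (derived, not stated in the notes).
avg_write_rps = PASTES_PER_DAY / SECONDS_PER_DAY

# Worst case: every paste is at the maximum size.
daily_storage_gb = PASTES_PER_DAY * MAX_PASTE_BYTES / 1e9
yearly_storage_tb = daily_storage_gb * 365 / 1000

print(f"peak read RPS  : {peak_read_rps:,.0f}")
print(f"avg write RPS  : {avg_write_rps:.1f}")
print(f"storage per day: {daily_storage_gb:.0f} GB")
print(f"storage per yr : {yearly_storage_tb:.2f} TB")
```

The read rate is what sizes the cache and CDN; the write rate is small by comparison, so storage growth is the main cost on the write side.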



API design

POST /api/text/create

  • input : user id, text, expiration time
  • output : success or fail

GET /api/text/shareableLink

  • input : user id, text id, permission
  • output : shareable link

POST /api/text/modifyPermission

  • input : user id, text id, permission
  • output : success or fail
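
A minimal sketch of how the shareableLink and modifyPermission calls could behave, kept in memory for illustration. Everything here is an assumption: the permission levels ("read", "write", "none"), the class name PasteAccess, the host in BASE_URL, and the choice of an unguessable random token as the link.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional

BASE_URL = "https://paste.example.com/p/"  # hypothetical host


@dataclass
class PasteAccess:
    """Access state for one text (all names are illustrative)."""
    owner_id: str
    grants: Dict[str, str] = field(default_factory=dict)  # user id -> level
    link_token: Optional[str] = None

    def modify_permission(self, caller_id: str, user_id: str,
                          permission: str) -> bool:
        """Only the owner may change grants; returns success or fail."""
        if caller_id != self.owner_id:
            return False
        if permission not in ("read", "write", "none"):
            return False
        if permission == "none":
            self.grants.pop(user_id, None)
        else:
            self.grants[user_id] = permission
        return True

    def shareable_link(self, caller_id: str) -> Optional[str]:
        """Create (or rotate) the link token; only the owner may do this."""
        if caller_id != self.owner_id:
            return None
        self.link_token = secrets.token_urlsafe(16)
        return BASE_URL + self.link_token

    def can_read(self, user_id: str, token: Optional[str] = None) -> bool:
        """Owner and granted users can read; so can anyone with the token."""
        if user_id == self.owner_id or user_id in self.grants:
            return True
        if token is None or self.link_token is None:
            return False
        # Constant-time comparison avoids leaking the token via timing.
        return secrets.compare_digest(token, self.link_token)
```

Rotating the token on each shareableLink call means an old link stops working once a new one is issued; whether that is desirable is a product decision the notes leave open.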

POST /api/user/register

  • input : user id, password, user detail
  • output : success or fail

POST /api/user/login

  • input : user id, password
  • output : success or fail


Database design

Text

  • we can use blob storage, an RDBMS, or NoSQL, depending on how much scalability we need. For scalability we would choose a NoSQL store such as MongoDB

Customer, link

  • we can choose an RDBMS, since this data is structured







High-level design

The server stores each paste in the DB and in Redis; the Redis entry gets a TTL derived from the user-defined expiredTime.
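
The write path can be sketched as follows. This is not a Redis client: TTLCache is a tiny in-memory stand-in for a cache with per-key expiry, the db dict stands in for the database, and the names create_paste and expires_at are hypothetical.

```python
import time
import uuid
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for a cache that expires keys after a TTL."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[str, float]] = {}

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._items[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # evict lazily once the TTL has passed
            return None
        return value


def create_paste(db: Dict[str, dict], cache: TTLCache, user_id: str,
                 text: str, expires_at: float, now: float) -> str:
    """Persist the paste, then cache it for its remaining lifetime."""
    if expires_at <= now:
        raise ValueError("expiration time must be in the future")
    text_id = uuid.uuid4().hex
    db[text_id] = {"owner": user_id, "text": text, "expires_at": expires_at}
    cache.set(text_id, text, ttl_seconds=expires_at - now)
    return text_id
```

Writing the DB first keeps it the source of truth: if the cache write fails, the paste is still stored and a later read can repopulate the cache.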


For fault tolerance and to distribute requests efficiently, we introduce a CDN. If a celebrity creates a link for a paste, the service stores the content in the CDN, so people can access it even without reaching our servers.
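
One common way to let a CDN serve a paste is through the Cache-Control response header. The sketch below assumes that only public pastes are cacheable at the edge and that the edge lifetime is capped so expiry and permission changes propagate; the function name cache_headers and the one-hour cap are illustrative choices, not part of the design above.

```python
from typing import Dict


def cache_headers(is_public: bool, expires_at: float, now: float,
                  max_edge_ttl: int = 3600) -> Dict[str, str]:
    """Return response headers telling the CDN whether it may cache a paste."""
    remaining = int(expires_at - now)
    if not is_public or remaining <= 0:
        # Private or already expired: never store at the edge.
        return {"Cache-Control": "private, no-store"}
    # Public: cacheable, but never beyond the paste's own lifetime.
    return {"Cache-Control": f"public, max-age={min(remaining, max_edge_ttl)}"}
```

Capping max-age trades some extra origin traffic for a bound on how long a stale or revoked paste can linger in the CDN.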


If people reach the server with a link whose content is not in the CDN, the server checks their permission and returns the result from Redis. This reduces the load on the DB.
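
A sketch of this read path, using plain dicts in place of Redis and the DB. The fallback to the DB on a cache miss is an assumption (the notes only mention the Redis hit), and read_paste, Forbidden, and the can_read callback are hypothetical names.

```python
from typing import Callable, Dict, Optional


class Forbidden(Exception):
    """Raised when the caller has no permission on the text."""


def read_paste(text_id: str, user_id: str,
               can_read: Callable[[str, str], bool],
               cache: Dict[str, str],
               db: Dict[str, str]) -> Optional[str]:
    # 1. Check permission first so cached text is never leaked.
    if not can_read(user_id, text_id):
        raise Forbidden(text_id)
    # 2. Cache hit: the DB is not touched.
    text = cache.get(text_id)
    if text is not None:
        return text
    # 3. Cache miss (assumed behavior): read the DB and repopulate the cache.
    text = db.get(text_id)
    if text is not None:
        cache[text_id] = text  # a real cache would set the remaining TTL here
    return text
```

Returning None for an unknown or expired id lets the API layer map it to a not-found response.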


We don't need to manage expiration for the cache, since its entries have a TTL, but we still need to manage it for the DB. A cron job regularly fetches the text ids whose expiredTime has passed and deletes them.
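
The cleanup job could look like the sketch below. It uses sqlite3 from the standard library as a stand-in for the real database, and the table and column names (paste, text_id, expires_at) are assumptions. Deleting in batches is a design choice made here to keep each transaction short.

```python
import sqlite3

# Hypothetical schema; the index lets the job find expired rows cheaply.
SCHEMA = """
CREATE TABLE paste (
    text_id    TEXT PRIMARY KEY,
    body       TEXT NOT NULL,
    expires_at INTEGER NOT NULL
);
CREATE INDEX idx_paste_expires_at ON paste (expires_at);
"""


def delete_expired(conn: sqlite3.Connection, now: int,
                   batch_size: int = 1000) -> int:
    """Delete pastes whose expiry has passed; returns the rows removed."""
    removed = 0
    while True:
        # Fetch one batch of expired ids.
        rows = conn.execute(
            "SELECT text_id FROM paste WHERE expires_at <= ? LIMIT ?",
            (now, batch_size),
        ).fetchall()
        if not rows:
            return removed
        # Each row is a one-element tuple, which executemany accepts as-is.
        conn.executemany("DELETE FROM paste WHERE text_id = ?", rows)
        conn.commit()
        removed += len(rows)
```

Because a paste can outlive its expiry until the next run, the read path should still compare expires_at against the current time before serving a row from the DB.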



Request flows






Detailed component design







Trade offs/Tech choices

Delete expired pastes vs. archive them

Deleting expired pastes saves storage cost, but it loses the opportunity to analyze them for future improvements and to support new user requirements later.

Archiving expired pastes increases storage cost, but we would choose it to keep those future options open.


MySQL vs MongoDB vs S3

If we chose to delete expired pastes, MySQL alone would be enough for this service. But we will choose a hybrid: MySQL for pastes that have not expired, and S3 for expired pastes after they are archived.
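
A sketch of the archive step in the hybrid setup. The active store is a dict standing in for MySQL, and put_object is a hypothetical callback standing in for an object-store upload; the key layout archive/<text id>.json is also an assumption.

```python
import json
from typing import Callable, Dict


def archive_expired(active: Dict[str, dict],
                    put_object: Callable[[str, bytes], None],
                    now: float) -> int:
    """Move expired pastes from the active store to the archive."""
    expired_ids = [tid for tid, row in active.items()
                   if row["expires_at"] <= now]
    for text_id in expired_ids:
        body = json.dumps(active[text_id]).encode("utf-8")
        # Write to the archive first and delete only afterwards, so a crash
        # in between leaves a duplicate rather than a lost paste.
        put_object(f"archive/{text_id}.json", body)
        del active[text_id]
    return len(expired_ids)
```

Since the upload is keyed by text id, rerunning the job after a partial failure overwrites the same object instead of creating a second copy.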




Failure scenarios/bottlenecks

We would replicate the server, since it is a stateless service.


DB

Multi-AZ setup with incremental backups


redis

point-in-time recovery


cron job



Future improvements

  • edit after sharing
  • parallel editing with other people