System requirements
Functional:
- store text until a user-defined expiry time
- share text
- manage access
- sign up, log in
Non-Functional:
Availability : stored text can be accessed whenever it is requested
Security : only authorized users can access a text
Performance : low latency when accessing text through a link
Capacity estimation
Write
DAU : 100M
texts created per day : 1M
Read
Peak reads per hour : 10M
RPS : 10M/(60*60) = 10,000K/3,600 ≈ 2.8K/sec
Storage
Maximum size of 1 text = 50 KB
storage per day = 50 KB * 1M = 50 GB (upper bound, if every text is at maximum size)
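A quick way to check these estimates is to compute them directly. The following is a minimal sketch, assuming decimal units (1 KB = 1,000 bytes) and that every text is at the 50 KB maximum; the variable names and the yearly projection are illustrative additions, not part of the original estimate.

```python
# Back-of-the-envelope capacity check using the figures from the estimate.
# Assumption: decimal units (1 KB = 1_000 bytes, 1 GB = 1_000_000_000 bytes).

PEAK_READS_PER_HOUR = 10_000_000   # 10M reads in the peak hour
PASTES_PER_DAY = 1_000_000         # 1M texts created per day
MAX_PASTE_BYTES = 50_000           # 50 KB upper bound per text

SECONDS_PER_HOUR = 60 * 60

# Reads per second during the peak hour.
peak_read_rps = PEAK_READS_PER_HOUR / SECONDS_PER_HOUR

# Worst-case storage growth, assuming every text is at the maximum size.
daily_storage_gb = PASTES_PER_DAY * MAX_PASTE_BYTES / 1_000_000_000
yearly_storage_tb = daily_storage_gb * 365 / 1_000

print(f"peak read RPS   : {peak_read_rps:,.0f}")
print(f"storage per day : {daily_storage_gb:.0f} GB")
print(f"storage per year: {yearly_storage_tb:.2f} TB")
```

10M reads per hour comes to roughly 2.8K reads per second, and 50 GB per day adds up to about 18 TB per year if nothing is ever deleted.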
API design
POST /api/text/create
- input : user id, text, expiry time
- output : success or fail
GET /api/text/shareableLink
- input : user id, text id, permission
- output : shareable link
POST /api/text/modifyPermission
- input : user id, text id, permission
- output : success or fail
POST /api/user/register
- input : user id, password, user detail
- output : success or fail
POST /api/user/login
- input : user id, password
- output : success or fail
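The shareableLink and modifyPermission endpoints above can be sketched as a small in-memory service. This is only an illustration under stated assumptions: the host `https://paste.example`, the `/t/` path, the class and method names, and the permission strings are all hypothetical, and a real service would keep this state in the database rather than in dictionaries.

```python
import secrets

BASE_URL = "https://paste.example"  # hypothetical host, not from the design


class LinkService:
    """In-memory sketch of the shareableLink and modifyPermission endpoints."""

    def __init__(self):
        self._owners = {}  # text id -> owner's user id
        self._links = {}   # link token -> (text id, permission)

    def register_text(self, user_id, text_id):
        """Record who owns a text (stands in for /api/text/create)."""
        self._owners[text_id] = user_id

    def shareable_link(self, user_id, text_id, permission):
        """Return a link with an unguessable token, or None if not the owner."""
        if self._owners.get(text_id) != user_id:
            return None
        token = secrets.token_urlsafe(16)  # 16 random bytes, URL-safe encoding
        self._links[token] = (text_id, permission)
        return f"{BASE_URL}/t/{token}"

    def modify_permission(self, user_id, text_id, permission):
        """Change the permission on every link of a text; True on success."""
        if self._owners.get(text_id) != user_id:
            return False
        for token, (linked_id, _) in list(self._links.items()):
            if linked_id == text_id:
                self._links[token] = (text_id, permission)
        return True

    def resolve(self, token):
        """Look up (text id, permission) for a token; None if unknown."""
        return self._links.get(token)
```

Using a random token instead of the raw text id means a link cannot be guessed by enumerating ids, which supports the security requirement.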
Database design
Text
- we can choose blob storage, an RDBMS, or NoSQL depending on scalability needs. For scalability we would choose a NoSQL store such as MongoDB
Customer, link
- we can choose an RDBMS since this data is structured
High-level design
The server stores the paste in the DB and in Redis, with a TTL derived from the user-defined expiredTime.
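The write path can be sketched as follows. This is a minimal illustration, not the actual implementation: `TTLCache` is an in-memory stand-in for Redis (where a `SET` with the `EX` option gives the same expiry behavior), the DB is a plain dictionary, and the function and field names are assumptions.

```python
import time


class TTLCache:
    """In-memory stand-in for Redis: entries disappear after ttl seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._data = {}      # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict the expired entry
            return None
        return value


def create_paste(db, cache, text_id, text, ttl_seconds, now):
    """Persist the paste with an absolute expiry, then cache it with the same TTL."""
    db[text_id] = {"text": text, "expires_at": now + ttl_seconds}
    cache.set(text_id, text, ttl_seconds)
```

The DB row keeps an absolute `expires_at` so a cleanup job can find expired rows later, while the cache only needs the relative TTL.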
For fault tolerance and efficient request distribution we introduce a CDN. If a celebrity creates a link (so heavy read traffic is expected), the paste service stores the paste in the CDN, and people can access the content even without reaching our server.
If people reach the server with a link that is not in the CDN, the server checks their permission and returns the result from Redis. This reduces the load on the DB.
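The read path above might look like the sketch below. It assumes a cache-aside pattern, meaning the server falls back to the DB on a cache miss and repopulates the cache; that fallback is an assumption, since the notes only mention reading from Redis. The ACL, cache, and DB are plain dictionaries standing in for the real stores, and all names are hypothetical.

```python
def read_paste(user_id, text_id, acl, cache, db):
    """Check permission, then serve from cache, falling back to the DB.

    acl   : text id -> set of user ids allowed to read (hypothetical model)
    cache : dict standing in for Redis
    db    : dict standing in for the database
    Returns the text, or None if the paste does not exist.
    """
    if user_id not in acl.get(text_id, set()):
        raise PermissionError(f"{user_id} may not read {text_id}")

    text = cache.get(text_id)
    if text is not None:
        return text  # cache hit: the DB is not touched

    text = db.get(text_id)
    if text is not None:
        # Cache miss: repopulate so the next read skips the DB.
        # With real Redis this write would carry the paste's remaining TTL.
        cache[text_id] = text
    return text
```

Checking permission before touching the cache keeps unauthorized requests from ever seeing cached content.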
We don't need to manage expiry for the cache since it has a TTL, but we still need to manage it for the DB. A cron job regularly fetches the text ids whose expiredTime has passed and deletes them.
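The cleanup job described above could be as small as one indexed delete. The sketch below uses SQLite from the Python standard library purely so it runs anywhere; the table name `pastes`, its columns, and the function name are assumptions, and the production DB would be whichever store the design settles on.

```python
import sqlite3


def delete_expired(conn, now):
    """Delete pastes whose expiry time has passed; return how many were removed."""
    cur = conn.execute("DELETE FROM pastes WHERE expires_at <= ?", (now,))
    conn.commit()
    return cur.rowcount


# Demo with an in-memory database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pastes (text_id TEXT PRIMARY KEY, body TEXT, expires_at INTEGER)"
)
# An index on expires_at lets the job find expired rows without a full scan.
conn.execute("CREATE INDEX idx_pastes_expires_at ON pastes (expires_at)")
conn.executemany(
    "INSERT INTO pastes VALUES (?, ?, ?)",
    [("a", "old paste", 100), ("b", "fresh paste", 500)],
)

removed = delete_expired(conn, now=200)
remaining = conn.execute("SELECT text_id FROM pastes").fetchall()
```

A scheduler such as cron would call `delete_expired` at a fixed interval, passing the current timestamp.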
Request flows
Detailed component design
Trade offs/Tech choices
Delete expired pastes vs archive them
Deleting expired pastes saves storage cost, but it loses the opportunity to analyze the data for future improvements and to support new user requirements later.
Archiving expired pastes increases storage cost, but we would choose it for those future benefits.
MySQL vs MongoDB vs S3
If we chose to delete expired pastes, MySQL alone would be enough for this service. But we will choose a hybrid: MySQL for pastes that have not expired, and S3 for expired pastes after archiving.
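The hybrid approach amounts to moving expired rows from the active store to the archive instead of dropping them. A minimal sketch, assuming dictionaries as stand-ins for MySQL and S3 and a hypothetical `archive/<text id>.txt` key layout:

```python
def archive_expired(active, archive, now):
    """Move expired pastes from the active store to the archive store.

    active  : dict standing in for MySQL, text id -> {"text", "expires_at"}
    archive : dict standing in for an S3 bucket, object key -> text
    Returns the ids that were archived.
    """
    expired_ids = [
        text_id for text_id, row in active.items() if row["expires_at"] <= now
    ]
    for text_id in expired_ids:
        # Copy to the archive first, delete second, so a crash between the
        # two steps leaves a duplicate rather than a lost paste.
        archive[f"archive/{text_id}.txt"] = active[text_id]["text"]
        del active[text_id]
    return expired_ids
```

Because the copy happens before the delete, rerunning the job after a failure is safe: it overwrites the same archive key and then removes the row.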
Failure scenarios/bottlenecks
We would replicate the server, since it is a stateless service.
DB
Multi-AZ setup and incremental backups
Redis
point-in-time recovery
cron job
Future improvements
- edit after sharing
- concurrent editing with other people