System requirements
Functional:
- Upload a text file and receive a link
- Store the text file at the link for 30 days
Non-Functional:
- Scale to millions of users / texts per day
- Consistency across horizontally scaled nodes
- Durability / Fail over preventing data loss
- Links are unique
Capacity estimation
**assuming 1 million DAU, 1:1 W/R ration, Peak QPS = 2 * QPS **
2 million pastes / day = 2*10^9 / 10^5 = 2*10^4
= 20,000 QPS (writes)
2 mill * 30 days = 60 million / month
**assuming 1KB file limit**
60 * 10^9 * 1KB = 60 * 10^12 = 60TB storage requirement / month
**assuming average of 5 paste reads / day**
5 million reads / day = 5 * 10^9 / 10^5 = 50,000 QPS
API design
POST /api/v1/uploadText
content-type: application/json
args: body
ret: UUID
GET /api/v1/getText?uuid=UUID
content-type: plain/text
args: UUID
ret: uploaded text (browser will render into the page)
Database design
Database is a key value store where the key is unique to the paste and the value is the text stored at that link.
60TB storage requirement isn't out of the realm of possibility for a single machine, so single leader replication could be a good fit. 20,000QPS = 1KB * 2 * 10^4 = 20MB is almost too high for one write node but is theoretically possible on high-end servers. If we needed to scale we could use a weakly consistent sharding scheme, partitioned on user_id or even grographical location if user_id doesn't exist. we can use read replicas or hash indexes to speed up reads. LSM or B-tree based indexes would be a waste as we don't need to do range queries.
That being said let's choose a document based data base, the data won't benefit from relations. Also noSQL are easier to set up sharding when needed. Let's pick cassandra for it's feature rich offerings including a consistent hashing type algorithm to distribute load evenly among commodity hardware (through the use of virtual nodes and other methods) this provides support for heterogeneity which reduces costs and mitigates the hot shard / celebrity problem.
High-level design
Write:
We need to allow the customer to upload their content. Besides uploading the file using HTTP, we also need a unique identifier to serve as the key. There are multiple ways of going about this: we can use a hash function + linear probing or buckets for collisions, unsuring uniqueness of the keys. We can also use the snowflake UUID method, where we combine information about the request to form a composite key of sorts which together uniquely identify any paste. This often includes an offset to a predetermined start time (since we are only storing for 30 days this can be T-30 days), as well as either the user_id or perhaps a checksum / hash of the content. If we use the hash of the content we can stop users from duplicating texts which would be good.
Read:
Mostly just checks if the content is caches, returns it if so, checks if the UUID is present in the DB, returns the value or an error.
Request flows
Write:
- User's browser makes a POST request to our upload endpoint with the body of the request containing the text.
- The load balancer will direct the request to a stateless web server, which can be easily scaled and recovered from failures since there's no state. It can also serve requests for any shard since we don't need any information about the customer from our database yet.
- Proxy asks for a UUID, which will fail if the UUID already exists due to the customer duplicating a paste, and the upload service will return a 4xx to the customer, or perhaps a redirect to the page that already exists, which would be more graceful and is the better choice.
- The proxy / web server that receives the request will determine which shard the user / their geographical area should be on. One downside to using geographical area is that there's a chance we need cross-shard reads if someone moves outside their area which would increase latency.
- The proxy makes a call to the appropriate DB node and that node ACKs back to the upload service when the node was successfully written (we should retry a few times if not or use a queue that doesn't pop a value until it is read or exhausts the retry mechanism). When the status of the write is determined the proxy could respond with the text, but it may save on network to just return the UUID, and have the customer redirected to the "read page" which just makes a get request with the UUID to return the text which it renders. This way we only send the text over the network if the client actually wants to access it after writing, beyond just a successful confirmation.
Read:
- Client GETs paste.com/UUID
- Load balancer directs requests to the appropriate read service node, which now that I think of it can be combined with the upload service web tier. The combined stateless web tier can determine what calls to make for write / read just fine as a single horizontally scaled unit.
- The web tier will determine this is a read request, and will check the cache then the database if the cache misses.
- If the UUID key is found in the databse (not expired), we return the body of text associated with it, which the browswer will render into a page for the customer to view.
Detailed component design
Above
Trade offs/Tech choices
Above
Failure scenarios/bottlenecks
Above
Future improvements
Above