System requirements
Functional:
Users must be able to store text online for a set period of time.
Other users must be able to access this data through a shared link.
Non-Functional:
Data must not be lost before the set period of time expires.
Latency should not exceed 500 ms.
Capacity estimation
The system is designed for 100k DAU.
Each user averages 10 requests per day, of which 80% are read requests.
That is 1M requests per day, or about 11.6 rps on average (~10 rps as an order of magnitude).
Of every 10 requests, 8 are read requests and 2 are write requests.
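The arithmetic behind these figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch; the constant and function names are illustrative and not part of the design.

```python
# Back-of-the-envelope traffic estimate using the figures stated above.
DAU = 100_000                   # daily active users
REQUESTS_PER_USER = 10          # average requests per user per day
READ_RATIO = 0.8                # 80% of requests are reads
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate_rps(dau=DAU, per_user=REQUESTS_PER_USER, read_ratio=READ_RATIO):
    """Return (total, read, write) average requests per second."""
    total = dau * per_user / SECONDS_PER_DAY
    read = total * read_ratio
    return total, read, total - read


total_rps, read_rps, write_rps = estimate_rps()
print(f"total={total_rps:.1f} rps, read={read_rps:.1f} rps, write={write_rps:.1f} rps")
# prints: total=11.6 rps, read=9.3 rps, write=2.3 rps
```

These are daily averages; peak traffic is likely to be higher, and the sketch does not try to model it.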
API design
create_paste_bin
input: text, expiration_time
output: paste_bin_id
read_paste_bin
input: paste_bin_id
output: text
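As a rough illustration of these two calls, here is a minimal in-memory sketch. The `PasteBin` dataclass, the `_STORE` dictionary, and the UUID-based placeholder ID are assumptions made for the example; the real service would sit behind the gateway and use the storage and key scheme described later.

```python
import time
import uuid
from dataclasses import dataclass


@dataclass
class PasteBin:
    text: str
    expiration_time: float  # Unix timestamp (seconds) after which the paste expires


# Hypothetical in-memory stand-in for the real storage layer.
_STORE = {}


def create_paste_bin(text: str, expiration_time: float) -> str:
    """Store the text and return the new paste_bin_id."""
    paste_bin_id = uuid.uuid4().hex[:8]  # placeholder ID, not the design's key scheme
    _STORE[paste_bin_id] = PasteBin(text, expiration_time)
    return paste_bin_id


def read_paste_bin(paste_bin_id: str) -> str:
    """Return the stored text; raise KeyError if the ID is unknown."""
    return _STORE[paste_bin_id].text


# Usage: create a paste that expires in one hour, then read it back.
paste_id = create_paste_bin("hello", time.time() + 3600)
print(read_paste_bin(paste_id))  # prints: hello
```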
Database design
paste_bin
text
expiration_time: timestamp
High-level design
Whenever a client creates a new paste bin, the request goes to an API gateway, which forwards it to a load balancer, which distributes it across horizontally scaled servers.
These servers write the paste bin to a replicated, distributed cloud object storage and to a cache, with the paste bin's TTL set on the cache entry.
The distributed cloud object storage should support per-object deletion on expiration, as most cloud providers do.
The caches use an LFRU eviction policy; reads for evicted, less frequently requested objects fall through to the server, which fetches them from the object storage.
Request flows
Write:
A client creates a new paste bin, which the server stores in the object storage and writes to the cache.
Read:
A client reads a paste bin, which is served from the cache or, on a cache miss, from the object storage.
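The two flows can be sketched as follows. This is a simplified illustration in which plain dictionaries stand in for the object storage and the cache; the `PasteService` class and its method names are hypothetical, and TTLs and eviction are left out.

```python
class PasteService:
    """Write-through on create, cache-first on read. Stores are plain dicts here."""

    def __init__(self):
        self.object_store = {}  # stand-in for the replicated object storage
        self.cache = {}         # stand-in for the cache tier (no TTL or eviction)

    def write(self, paste_bin_id, text):
        self.object_store[paste_bin_id] = text  # durable copy first
        self.cache[paste_bin_id] = text         # then warm the cache

    def read(self, paste_bin_id):
        """Return (text, source); text is None when the paste is not found."""
        text = self.cache.get(paste_bin_id)
        if text is not None:
            return text, "cache"
        text = self.object_store.get(paste_bin_id)
        if text is None:
            return None, "not found"
        self.cache[paste_bin_id] = text         # repopulate after a miss
        return text, "object storage"


service = PasteService()
service.write("abc", "hello")
print(service.read("abc"))   # prints: ('hello', 'cache')
service.cache.clear()        # simulate eviction
print(service.read("abc"))   # prints: ('hello', 'object storage')
```

Writing to the object storage before the cache means a failed write never leaves a cached paste bin that has no durable copy.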
Detailed component design
The cache uses an LFRU (least frequent recently used) eviction policy.
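To make the idea of combining frequency and recency concrete, here is a deliberately simplified stand-in: it evicts the entry with the lowest access count and breaks ties by least recent use. It is not a full LFRU implementation, which is typically more elaborate, and the class name `FrequencyRecencyCache` is invented for this sketch.

```python
import itertools


class FrequencyRecencyCache:
    """Evicts the least frequently used key; ties go to the least recently used."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._data = {}                  # key -> value
        self._freq = {}                  # key -> access count
        self._last = {}                  # key -> logical time of last access
        self._clock = itertools.count()  # monotonically increasing logical clock

    def _touch(self, key):
        self._freq[key] = self._freq.get(key, 0) + 1
        self._last[key] = next(self._clock)

    def get(self, key):
        if key not in self._data:
            return None
        self._touch(key)
        return self._data[key]

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.capacity:
            # Linear scan for the victim; fine for a sketch, slow for a large cache.
            victim = min(self._data, key=lambda k: (self._freq[k], self._last[k]))
            for table in (self._data, self._freq, self._last):
                del table[victim]
        self._data[key] = value
        self._touch(key)


cache = FrequencyRecencyCache(capacity=2)
cache.put("a", "first")
cache.put("b", "second")
cache.get("a")            # "a" is now more frequently used than "b"
cache.put("c", "third")   # evicts "b"
print(cache.get("b"))     # prints: None
```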
The object storage replicates data across multiple zones. It also handles the lifecycle and deletion of expired paste bins.
The servers filter out already-expired objects: they do not retrieve them for users and return "not found" whether the object never existed or has expired.
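A minimal sketch of that filter, assuming each stored record is a `(text, expiration_time)` pair with a Unix timestamp. The function name `read_if_live` and the use of `None` to signal "not found" are choices made for this example only.

```python
import time


def read_if_live(store, paste_bin_id, now=None):
    """Return the text, or None ("not found") if the paste is missing or expired."""
    now = time.time() if now is None else now
    record = store.get(paste_bin_id)  # (text, expiration_time) or None
    if record is None:
        return None
    text, expiration_time = record
    if expiration_time <= now:
        return None                   # same answer as a missing ID, by design
    return text


store = {"abc12345": ("hello", 1_000.0)}
print(read_if_live(store, "abc12345", now=500.0))    # prints: hello
print(read_if_live(store, "abc12345", now=2_000.0))  # prints: None
print(read_if_live(store, "unknown", now=500.0))     # prints: None
```

Checking expiry on the server covers the window between a paste bin's expiration time and the moment the storage lifecycle actually deletes it.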
To generate the random keys for the paste bins, we use an eight-character hash. Its alphabet omits characters that are easily confused, while still allowing up to 64 possible values per character.
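One way to produce such keys is sketched below. Note the assumptions: the text speaks of a hash, whereas this example draws random characters with Python's `secrets` module; the excluded set `0O1lI` is a guess at which characters count as confusing; and `generate_unique_key` with its `exists` callback is a hypothetical helper for handling collisions.

```python
import secrets
import string

# Start from a 64-character URL-safe alphabet, then drop look-alike characters.
AMBIGUOUS = set("0O1lI")  # assumed set of easily confused characters
ALPHABET = "".join(
    c for c in string.ascii_letters + string.digits + "-_" if c not in AMBIGUOUS
)
KEY_LENGTH = 8


def generate_key(length=KEY_LENGTH):
    """Return a random key drawn uniformly from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def generate_unique_key(exists, max_attempts=5):
    """Retry until exists(key) is False; exists is a caller-supplied lookup."""
    for _ in range(max_attempts):
        key = generate_key()
        if not exists(key):
            return key
    raise RuntimeError("could not find a free key")


print(len(ALPHABET))   # prints: 59
print(generate_key())  # an 8-character key; the value differs on every run
```

With these 59 characters and 8 positions there are 59^8, roughly 1.5 × 10^14, possible keys; keeping the full 64 characters would give 64^8 = 2^48.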
Trade offs/Tech choices
We choose object storage over any kind of database because of its lifecycle management, reliability, replication, and cross-zone data persistence. Its latency is also low, and it is better suited than other databases to large pieces of text, such as those usually shared through a Pastebin. We sacrifice query capability, which is acceptable because we only retrieve text files by their IDs.
Failure scenarios/bottlenecks
If an upload fails, the client retries. The object storage provides strong consistency, so expired objects that should have been deleted are not served.
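The client-side retry could look like the following sketch. The backoff parameters, the choice of `ConnectionError` as the retryable failure, and the `upload` callable are all assumptions for illustration; the notes do not specify a retry policy.

```python
import random
import time


def upload_with_retry(upload, max_attempts=4, base_delay=0.2, sleep=time.sleep):
    """Call upload() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return upload()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up and surface the last error to the caller
            # Exponential backoff with jitter to avoid synchronized retries.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))


# Usage: a fake upload that fails twice before succeeding.
attempts = {"count": 0}


def flaky_upload():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "paste_bin_id"


print(upload_with_retry(flaky_upload, sleep=lambda seconds: None))  # prints: paste_bin_id
```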
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?