Design Pastebin - System Design

System requirements

Displaying the text from its link must be fast (< 250 ms)
Strong consistency is not required (for instance, the link can become available to all users after some delay)
The system must be highly available, though not to the degree a financial service requires (99.9% uptime, roughly 8.8 hours of downtime per year, is acceptable)

v1/paste

input: text

returns: link

v1/view

input: unique ID

returns: text
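
As a rough illustration of the two endpoints above, here is a minimal in-memory sketch. The function names, the dictionary standing in for the database, and the paste.example.com host are all assumptions made for the example, not part of the design.

```python
import secrets
from typing import Optional

# In-memory stand-in for the PastedText table (assumption for illustration).
_store = {}

# Hypothetical base URL; the real host is not specified in the design.
BASE_URL = "https://paste.example.com/v1/view/"


def paste(text: str) -> str:
    """v1/paste: store the text and return a link to it."""
    unique_id = secrets.token_urlsafe(6)  # 6 random bytes -> 8 URL-safe characters
    _store[unique_id] = text
    return BASE_URL + unique_id


def view(unique_id: str) -> Optional[str]:
    """v1/view: return the text for a unique ID, or None if it is unknown."""
    return _store.get(unique_id)
```

A caller would keep the link returned by paste() and later pass its trailing ID to view().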

PastedText table

The API servers are the main entry point
Writes are made asynchronously through a queue system. This system is in charge of writing the new text to the database and replicating it to the CDN
Purging old pasted text is handled by a dedicated service.
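
The asynchronous write path described above could look roughly like the sketch below. It uses Python's standard queue and a worker thread as a stand-in for a real queue system, and plain dictionaries as stand-ins for the database and the CDN; all names are assumptions for illustration.

```python
import queue
import threading

write_queue = queue.Queue()
database = {}   # stand-in for the database (assumption)
cdn_cache = {}  # stand-in for the CDN (assumption)


def enqueue_paste(unique_id, text):
    """Called by the API server: returns right away without touching the database."""
    write_queue.put((unique_id, text))


def writer_worker():
    """Consume queued pastes, persist them, then replicate them to the CDN."""
    while True:
        item = write_queue.get()
        if item is None:  # sentinel value used to stop the worker
            write_queue.task_done()
            break
        unique_id, text = item
        database[unique_id] = text
        cdn_cache[unique_id] = text
        write_queue.task_done()


def start_worker():
    """Start the background writer and return its thread."""
    worker = threading.Thread(target=writer_worker, daemon=True)
    worker.start()
    return worker
```

Because the API server only enqueues, its response time does not depend on how quickly the database or the CDN accepts the write.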

One key point is the generation of a unique ID for the link sharing.
- We should try to avoid generating an ID that was already used in the past, even if the associated text has been purged.
- One possible solution to generate the unique ID: unique_id = hash(content + timestamp + serverid)
- Another solution would be to generate a random string (using a different seed for each server). In case of collision we could simply generate another random string.
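
Both ID strategies listed above can be sketched as follows. This is only one possible reading: SHA-256, the truncation lengths, and the alphabet are assumptions, and the random variant draws from the operating system's random source through the secrets module instead of a per-server seed. For the retry loop to honor the "never reuse" goal, used_ids would need to include the IDs of purged texts as well.

```python
import hashlib
import secrets
import string
import time

ALPHABET = string.ascii_letters + string.digits


def hashed_id(content, server_id, timestamp=None, length=10):
    """unique_id = hash(content + timestamp + serverid), truncated to `length` hex chars."""
    if timestamp is None:
        timestamp = time.time()
    payload = f"{content}|{timestamp}|{server_id}".encode("utf-8")
    # Truncating the digest shortens the link but raises the collision probability.
    return hashlib.sha256(payload).hexdigest()[:length]


def random_id(used_ids, length=8):
    """Draw random strings until one is not in used_ids, then reserve it."""
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if candidate not in used_ids:
            used_ids.add(candidate)
            return candidate
```

In a distributed deployment the used_ids check would be a lookup against shared storage rather than a local set.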

The purge service runs a scheduled task at regular intervals, outside periods of peak activity. It uses the creationTimestamp of the pasted text to decide whether it must be deleted.
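
A minimal sketch of one purge pass, assuming each stored paste carries a creationTimestamp in seconds and that a retention period (max_age_seconds, a hypothetical parameter not fixed by the design) decides what is expired:

```python
import time


def purge_expired(pastes, max_age_seconds, now=None):
    """Delete pastes older than max_age_seconds and return the purged IDs.

    `pastes` maps a unique ID to a dict holding at least 'creationTimestamp'.
    """
    if now is None:
        now = time.time()
    expired = [
        unique_id
        for unique_id, record in pastes.items()
        if now - record["creationTimestamp"] > max_age_seconds
    ]
    # Delete in a second pass so the dict is not modified while iterating over it.
    for unique_id in expired:
        del pastes[unique_id]
    return expired
```

A scheduler (a cron job, for example) would call this function during off-peak hours.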

A lot of data is written each day, so we need high throughput

Strong consistency is not required in this context

A NoSQL database should be used so we can scale better

There can be peaks of activity when users try to write a lot of data:

If any database node is not available, the text is not lost thanks to the queue

The API server could write the text synchronously as a fallback if the queue system isn't available
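
The fallback described above might be sketched like this. Here a full in-process queue (queue.Full) stands in for "the queue system isn't available"; with a real message broker the failure would more likely surface as a connection error. The function name and return values are assumptions for illustration.

```python
import queue


def submit_paste(unique_id, text, write_queue, database):
    """Try the asynchronous path first; write directly to the database if it fails."""
    try:
        write_queue.put_nowait((unique_id, text))
        return "queued"
    except queue.Full:
        # Fallback: the request is slower, but the text is not lost.
        database[unique_id] = text
        return "written_synchronously"
```

The trade-off is that during a queue outage the database absorbs the write peaks directly, so this path is best kept as a last resort.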

We can improve this design with analytics and logs

Some text could be pasted several times, so we could add a way to store it just once
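
One way to store repeated text only once is to key the content by its hash and let each link point to that hash. The sketch below assumes SHA-256 and an in-memory store; the class and attribute names are hypothetical.

```python
import hashlib


class DedupStore:
    """Keep each distinct text once; many unique IDs may point to the same content."""

    def __init__(self):
        self.blobs = {}  # content hash -> text
        self.links = {}  # unique ID -> content hash

    def save(self, unique_id, text):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        # setdefault stores the text only the first time this content is seen.
        self.blobs.setdefault(digest, text)
        self.links[unique_id] = digest

    def load(self, unique_id):
        digest = self.links.get(unique_id)
        return None if digest is None else self.blobs[digest]
```

With this layout the purge service would have to check that no remaining link references a piece of content before deleting it.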