System requirements


Functional:

  • Users can store texts
  • Texts can be shared via a unique link
  • Texts have a defined lifetime


Non-Functional:

  • Displaying the text using the link must be fast (< 250ms)
  • Strong consistency is not required (for instance, a new link may only become visible to all users after a short delay)
  • The system must be highly available, though not to the degree a financial service requires (99.9% uptime is acceptable)


Capacity estimation

  • 1M pasted texts per day
  • 10KB per text
  • ~10GB of new data each day
  • ~4TB per year
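
The figures above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 GB = 10^9 bytes) and treats 10KB as the size of every text; the average write rate is derived from the daily volume and is not stated in the original estimate.

```python
# Back-of-the-envelope storage estimate (decimal units: 1 GB = 10**9 bytes).
PASTES_PER_DAY = 1_000_000
BYTES_PER_PASTE = 10 * 1000  # 10 KB per text

daily_bytes = PASTES_PER_DAY * BYTES_PER_PASTE
yearly_bytes = daily_bytes * 365

daily_gb = daily_bytes / 10**9
yearly_tb = yearly_bytes / 10**12
# Average write rate, derived from the daily volume (peaks will be higher).
writes_per_second = PASTES_PER_DAY / 86_400

print(f"{daily_gb:.0f} GB/day, {yearly_tb:.2f} TB/year, {writes_per_second:.1f} writes/s")
# prints "10 GB/day, 3.65 TB/year, 11.6 writes/s"
```

The yearly total of 3.65 TB is what the list rounds up to ~4TB.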


API design

v1/paste

input: text

returns: link


v1/view

input: unique ID

returns: text
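
The two endpoints could be sketched as plain functions. This is only an illustration under assumptions: the host name `paste.example`, the link layout, the in-memory dictionary standing in for the database, and the use of `secrets.token_urlsafe` to build the unique ID are all hypothetical choices, not part of the design above.

```python
import secrets
from typing import Dict, Optional

BASE_URL = "https://paste.example"  # hypothetical host, for illustration only
_store: Dict[str, str] = {}         # in-memory stand-in for the database


def paste(text: str) -> str:
    """v1/paste: store the text and return a shareable link."""
    unique_id = secrets.token_urlsafe(8)  # 8 random bytes -> 11 URL-safe characters
    while unique_id in _store:            # regenerate on the (unlikely) collision
        unique_id = secrets.token_urlsafe(8)
    _store[unique_id] = text
    return f"{BASE_URL}/v1/view/{unique_id}"


def view(unique_id: str) -> Optional[str]:
    """v1/view: return the text for a unique ID, or None if it is unknown."""
    return _store.get(unique_id)
```

A random ID keeps links hard to guess, which is one reason to share UniqueID instead of the sequential ID column.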


Database design

PastedText

  • ID INT (index)
  • UniqueID TEXT (index)
  • Content TEXT
  • Timestamp DATETIME



High-level design

The API servers are the main entry point

Writes are handled asynchronously through a queue system

Purging expired texts is handled by a dedicated service
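
The purge step could look like the sketch below. The 30-day lifetime is an assumption (the requirements only say texts have a defined lifetime), and the dictionary of rows is a stand-in for the PastedText table; a real service would run an equivalent delete against the database on a schedule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict

LIFETIME = timedelta(days=30)  # assumed value; the design does not fix the lifetime


@dataclass
class PastedText:
    unique_id: str
    content: str
    timestamp: datetime  # creation time, as in the PastedText table


def purge_expired(rows: Dict[str, PastedText], now: datetime) -> int:
    """Delete every row whose age has reached LIFETIME; return how many were removed."""
    expired = [uid for uid, row in rows.items() if now - row.timestamp >= LIFETIME]
    for uid in expired:
        del rows[uid]
    return len(expired)
```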



Request flows

  • User calls the API to paste a text
  • Application server generates a new unique ID for the shareable link
  • Text is then persisted to the database (asynchronously, via the queue)
  • User wants to view a text
  • Application server first checks whether the text is in the cache
  • If not, it fetches the text from the database
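
The read path in the last three steps is a cache-aside lookup. Here is a minimal sketch, assuming a small in-process LRU cache; the class name `PasteCache`, the `fetch_from_db` callback, and the capacity are hypothetical, and a shared cache server could play the same role in a real deployment.

```python
from collections import OrderedDict
from typing import Callable, Optional


class PasteCache:
    """Cache-aside reads: check the cache first, fall back to the database on a miss."""

    def __init__(self, fetch_from_db: Callable[[str], Optional[str]], capacity: int = 1024):
        self._fetch_from_db = fetch_from_db  # hypothetical database lookup
        self._capacity = capacity
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    def get(self, unique_id: str) -> Optional[str]:
        if unique_id in self._entries:
            self._entries.move_to_end(unique_id)  # mark as most recently used
            return self._entries[unique_id]
        text = self._fetch_from_db(unique_id)     # cache miss: go to the database
        if text is not None:
            self._entries[unique_id] = text
            if len(self._entries) > self._capacity:
                self._entries.popitem(last=False)  # evict the least recently used entry
        return text
```

Unknown IDs are not cached here, so a text that reaches the database late (writes are asynchronous) becomes visible on the next lookup.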


Detailed component design



Trade-offs/Tech choices

A lot of data is written each day, so we need high write throughput

Strong consistency is not required in this context

A NoSQL database can therefore be used, since it makes scaling writes across nodes easier



Failure scenarios/bottlenecks

There can be peaks of activity where users try to write a lot of data

We can handle writes asynchronously to absorb these peaks (using a queue system such as Kafka or a similar technology)

If a database node is unavailable, the text is not lost because it stays in the queue until it can be written

The API server could write the text synchronously as a fallback if the queue system isn't available
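
The fallback described above could be sketched as follows. The `enqueue` and `write_to_db` callbacks and the `QueueUnavailable` exception are hypothetical placeholders for whatever queue client and database driver are chosen; no real Kafka API is shown.

```python
from typing import Callable


class QueueUnavailable(Exception):
    """Raised by the (hypothetical) queue client when the queue cannot accept messages."""


def write_paste(
    unique_id: str,
    text: str,
    enqueue: Callable[[str, str], None],
    write_to_db: Callable[[str, str], None],
) -> str:
    """Try the asynchronous path first; fall back to a synchronous database write."""
    try:
        enqueue(unique_id, text)
        return "queued"
    except QueueUnavailable:
        # Slower for the caller, but the text is not dropped.
        write_to_db(unique_id, text)
        return "written_synchronously"
```

Returning which path was taken makes it easy to count fallbacks and notice a queue outage.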


Future improvements

This design can be extended with analytics and logging