System requirements
Functional:
- Users can store texts
- Texts can be shared via a unique link
- Texts have a defined lifetime
Non-Functional:
- Displaying the text using the link must be fast (< 250ms)
- Strong consistency is not required (for instance, a new link may become available to all users only after a short delay)
- The system must be highly available, though not to the level of a financial service (99.9% uptime, i.e. roughly 8.8 hours of downtime per year, is acceptable)
Capacity estimation
- 1M pasted texts per day
- ~10KB per text on average
- ~10GB of new data each day
- ~4TB per year (10GB × 365 ≈ 3.65TB)
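The estimates above can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch that assumes decimal units (10KB = 10,000 bytes); it also derives the average write rate, which the estimates imply but do not state.

```python
# Back-of-the-envelope check of the capacity estimates.
# Assumption: decimal units, so 10KB = 10,000 bytes and 1GB = 10^9 bytes.
PASTES_PER_DAY = 1_000_000
BYTES_PER_PASTE = 10_000
SECONDS_PER_DAY = 24 * 60 * 60

daily_bytes = PASTES_PER_DAY * BYTES_PER_PASTE            # 10^10 bytes
yearly_bytes = daily_bytes * 365                          # 3.65 * 10^12 bytes
avg_writes_per_second = PASTES_PER_DAY / SECONDS_PER_DAY  # about 11.6

print(f"daily:  {daily_bytes / 1e9:.1f} GB")
print(f"yearly: {yearly_bytes / 1e12:.2f} TB")
print(f"writes: {avg_writes_per_second:.1f} per second on average")
```

The average rate is modest; the peaks discussed under failure scenarios are what the queue is meant to absorb.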
API design
v1/paste
input: text
returns: link
v1/view
input: unique ID
returns: text
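The two endpoints can be sketched as an in-memory service. This is only an illustration of the contract (text in, link out; ID in, text out): the class name `PasteAPI`, the domain in `BASE_URL`, and the use of `secrets.token_urlsafe` for IDs are all hypothetical choices, not part of the design.

```python
import secrets
from typing import Dict, Optional

BASE_URL = "https://paste.example.com/"  # hypothetical domain


class PasteAPI:
    """In-memory stand-in for the v1/paste and v1/view endpoints."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}  # unique ID -> text

    def paste(self, text: str) -> str:
        """v1/paste: store the text and return a shareable link."""
        unique_id = secrets.token_urlsafe(6)  # 6 random bytes -> 8 URL-safe chars
        while unique_id in self._store:       # retry on the rare collision
            unique_id = secrets.token_urlsafe(6)
        self._store[unique_id] = text
        return BASE_URL + unique_id

    def view(self, unique_id: str) -> Optional[str]:
        """v1/view: return the text for an ID, or None if it is unknown."""
        return self._store.get(unique_id)
```

For example, `api.paste("hello")` returns a link ending in an 8-character ID, and passing that ID to `api.view` returns `"hello"`.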
Database design
PastedText
- ID INT (index)
- UniqueID TEXT (index)
- Content TEXT
- Timestamp DATETIME
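The PastedText table can be expressed as SQL DDL. SQLite is used here only because it ships with Python (the trade-offs section leans towards NoSQL for production). SQLite has no native DATETIME type, so this sketch stores the timestamp as ISO-8601 text; the extra index on Timestamp is an assumption meant to help the purge service.

```python
import sqlite3

# The PastedText table from the design, expressed in SQLite.
SCHEMA = """
CREATE TABLE IF NOT EXISTS PastedText (
    ID        INTEGER PRIMARY KEY,   -- numeric key, indexed implicitly
    UniqueID  TEXT NOT NULL UNIQUE,  -- link ID, backed by a unique index
    Content   TEXT NOT NULL,
    Timestamp TEXT NOT NULL          -- creation time as ISO-8601 text
);
-- Assumption: an index on Timestamp helps the purge service find old rows.
CREATE INDEX IF NOT EXISTS idx_pastedtext_timestamp
    ON PastedText (Timestamp);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Inserting a second row with an existing UniqueID raises `sqlite3.IntegrityError`, so the database itself rejects duplicate link IDs.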
High-level design
The API servers are the main entry point
Writes are performed asynchronously through a queue system
Expired texts are purged by a dedicated service
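The purge service can be sketched as a single delete against the Timestamp column. The design only says texts have a "defined lifetime", so the 30-day `TEXT_LIFETIME` below is a hypothetical value, and the function name `purge_expired` is illustrative.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical lifetime: the design does not specify a value.
TEXT_LIFETIME = timedelta(days=30)


def purge_expired(conn: sqlite3.Connection, now: datetime) -> int:
    """Delete texts older than TEXT_LIFETIME; return how many were removed."""
    cutoff = (now - TEXT_LIFETIME).isoformat(timespec="seconds")
    # ISO-8601 strings written in one fixed format sort chronologically,
    # so a plain string comparison works on the Timestamp column.
    with conn:  # commit the delete as one transaction
        cur = conn.execute(
            "DELETE FROM PastedText WHERE Timestamp < ?", (cutoff,)
        )
    return cur.rowcount
```

A dedicated service could call this periodically (for example from a scheduled job). Note that deleting rows also forgets their UniqueIDs, so avoiding ID reuse requires keeping issued IDs somewhere else.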
Request flows
Paste:
- A user calls the API to paste a text
- The application server generates a new unique ID for the shareable link
- The text is then written to the database
View:
- A user requests a text through its link
- The application server first checks whether the text is in the cache
- If not, it fetches it from the database
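The view flow above is a cache-aside read. A minimal sketch, assuming the cache is filled on a miss (the flow does not say so explicitly) and using a small LRU cache and a dict as stand-ins for a real cache tier and the database; `LRUCache` and `view_text` are hypothetical names.

```python
from collections import OrderedDict
from typing import Dict, Optional


class LRUCache:
    """Tiny LRU cache standing in for a real cache tier (e.g. Redis)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: str, value: str) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used


def view_text(unique_id: str, cache: LRUCache,
              db: Dict[str, str]) -> Optional[str]:
    """Read path: cache first, then database; fill the cache on a miss."""
    text = cache.get(unique_id)
    if text is not None:
        return text
    text = db.get(unique_id)  # stand-in for the database query
    if text is not None:
        cache.put(unique_id, text)
    return text
```

Popular texts stay in the cache, which is what keeps most reads under the 250ms target.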
Detailed component design
One key point is generating the unique ID used in the shareable link
We should avoid generating an ID that was already used in the past, even if the associated text has been purged
One possible solution is to hash several concatenated values:
unique_id = hash(content + timestamp + serverid)
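The hash formula can be sketched with SHA-256. A full digest is too long for a link, and truncating it makes collisions possible, so this sketch checks candidates against a set of previously issued IDs and retries with a counter. The separator, the attempt counter, the 8-character length, and `used_ids` (standing in for a persistent record that survives purges) are assumptions beyond the original formula.

```python
import base64
import hashlib
from typing import Set


def make_unique_id(content: str, timestamp: str, server_id: str,
                   used_ids: Set[str], length: int = 8) -> str:
    """hash(content + timestamp + serverid), truncated to a short link ID.

    used_ids stands in for a persistent record of every ID ever issued,
    so IDs of purged texts are never handed out again.
    """
    attempt = 0
    while True:
        # The separator avoids ambiguous concatenations ("ab"+"c" vs "a"+"bc");
        # the attempt counter changes the input after a collision.
        data = "|".join([content, timestamp, server_id, str(attempt)])
        digest = hashlib.sha256(data.encode("utf-8")).digest()
        candidate = base64.urlsafe_b64encode(digest).decode("ascii")[:length]
        if candidate not in used_ids:
            used_ids.add(candidate)
            return candidate
        attempt += 1
```

Pasting the same content at the same timestamp on the same server twice still yields two distinct IDs, because the second call collides and retries with a new counter value.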
Trade-offs/Tech choices
A lot of data is written each day (~10GB), so we need high write throughput
Strong consistency is not required in this context
A NoSQL database can therefore be used, since NoSQL stores typically scale horizontally more easily
Failure scenarios/bottlenecks
There can be activity peaks where users try to write a lot of data
Handling writes asynchronously (using a queue system such as Kafka or another technology) prevents this bottleneck
If a database node is unavailable, the text is not lost because it is still held in the queue
The API server could write the text synchronously as a fallback if the queue system is unavailable
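The asynchronous write path with its synchronous fallback can be sketched with a bounded in-process queue. `queue.Queue` is only a local stand-in for a broker such as Kafka, the dict stands in for the database, and the function names are hypothetical; a real Kafka client would also need to treat connection errors as "queue unavailable".

```python
import queue
import threading
from typing import Dict

# queue.Queue is only a local stand-in for a message broker such as Kafka.
write_queue: "queue.Queue" = queue.Queue(maxsize=1000)
database: Dict[str, str] = {}  # stand-in for the database
db_lock = threading.Lock()


def write_to_db(unique_id: str, text: str) -> None:
    with db_lock:
        database[unique_id] = text


def accept_paste(q: "queue.Queue", unique_id: str, text: str) -> str:
    """API-server side: enqueue the write, falling back to a direct write."""
    try:
        q.put_nowait((unique_id, text))
        return "queued"
    except queue.Full:  # queue saturated: write synchronously instead
        write_to_db(unique_id, text)
        return "written"


def worker(q: "queue.Queue") -> None:
    """Consumer side: drain the queue into the database."""
    while True:
        unique_id, text = q.get()
        write_to_db(unique_id, text)
        q.task_done()
```

Run `worker(write_queue)` in one or more background threads; while the database is down, pending writes simply wait in the queue.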
Future improvements
We could extend this design with analytics and logging
Some texts may be pasted several times, so we could add a way to store each distinct content only once
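One common way to store repeated texts once is content-addressed storage: key each distinct content by its hash and let links point to that hash. A minimal in-memory sketch, assuming reference counting so the purge service only drops content once no link uses it; `DedupStore` and its methods are hypothetical.

```python
import hashlib
from typing import Dict


class DedupStore:
    """Stores each distinct content once, keyed by its SHA-256 digest."""

    def __init__(self) -> None:
        self.contents: Dict[str, str] = {}   # content hash -> text
        self.links: Dict[str, str] = {}      # link ID -> content hash
        self.refcounts: Dict[str, int] = {}  # content hash -> number of links

    def add(self, link_id: str, text: str) -> None:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        self.contents.setdefault(digest, text)  # keep the first copy only
        self.refcounts[digest] = self.refcounts.get(digest, 0) + 1
        self.links[link_id] = digest

    def get(self, link_id: str) -> str:
        return self.contents[self.links[link_id]]

    def remove(self, link_id: str) -> None:
        """Purge one link; drop the content once no link references it."""
        digest = self.links.pop(link_id)
        self.refcounts[digest] -= 1
        if self.refcounts[digest] == 0:
            del self.refcounts[digest]
            del self.contents[digest]
```

Each link keeps its own unique ID and lifetime, while identical contents share a single stored copy.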