Requirements
Functional Requirements:
- A user can create a paste and receives a unique URL to access the content.
- A user can set visibility of the paste to public, private, or unlisted.
- Other users with access can retrieve the paste content using the unique URL.
- Users can set paste expiry and the content will be deleted after the expiration time has elapsed.
Non-Functional Requirements:
- The system has 99.9% availability.
- The system has low latency for paste creation and retrieval.
- Paste content is consistent: after a paste has been edited, the updated paste content is readily available to users.
- System is resilient to unintended usage (spam and bot usage).
API Design
POST /paste
{
content: <content>,
syntax: <syntax>,
expiry: <expiry>,
visibility: <visibility>
}
GET /paste/{paste_id}
{
content: <content>,
syntax: <syntax>,
expiry: <expiry>
}
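The POST /paste body above could be validated along these lines before it reaches storage. This is a minimal sketch, not the actual API contract: the names PasteRequest and parse_paste_request are hypothetical, and it assumes that expiry is sent as a positive number of seconds (or omitted for no expiry), that visibility defaults to "public", and that syntax defaults to "plaintext".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# The three visibility levels named in the functional requirements.
VISIBILITIES = {"public", "private", "unlisted"}


@dataclass(frozen=True)
class PasteRequest:
    content: str
    syntax: str
    expires_at: Optional[datetime]  # None means the paste never expires
    visibility: str


def parse_paste_request(body: dict, now: datetime) -> PasteRequest:
    """Validate a decoded POST /paste body and return a typed request."""
    content = body.get("content")
    if not isinstance(content, str) or not content:
        raise ValueError("content must be a non-empty string")

    # Assumption: visibility defaults to "public" when omitted.
    visibility = body.get("visibility", "public")
    if visibility not in VISIBILITIES:
        raise ValueError(f"visibility must be one of {sorted(VISIBILITIES)}")

    # Assumption: expiry is a number of seconds from now, or omitted.
    expiry = body.get("expiry")
    if expiry is None:
        expires_at = None
    elif isinstance(expiry, int) and not isinstance(expiry, bool) and expiry > 0:
        expires_at = now + timedelta(seconds=expiry)
    else:
        raise ValueError("expiry must be a positive integer number of seconds")

    # Assumption: syntax defaults to "plaintext" when omitted.
    syntax = body.get("syntax", "plaintext")
    return PasteRequest(content, syntax, expires_at, visibility)
```

Passing the current time in as an argument, rather than reading the clock inside the function, keeps the expiry calculation easy to test.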
High-Level Design
Components
- PasteRenderer - Responsible for rendering paste content to the user.
- PasteRenderer fetches paste content for a unique URL from the PasteStore. PasteRenderer is responsible for syntax formatting/highlighting of the paste and returning a response to the API. The client can use the response from the API with minimal translation.
- PasteRenderer can be load-balanced/scaled to handle a large number of requests. Frequent GET /paste/{paste_id} requests can be cached so that transformations do not have to be recalculated on every request. The cache can use an LRU eviction policy.
- PasteStore - Responsible for creating, storing, and retrieving pastes in a database.
- Pastes can have multiple storage options based on paste type.
- In-memory database like Redis/Memcached for temporary pastes.
- PostgreSQL/DynamoDB for main paste use-case.
- S3 for larger paste objects and attachments.
- Content can be cached for faster retrieval with an LRU eviction policy. This cache is distinct from the PasteRenderer cache because the same paste content may be requested but undergo several different transformations.
- TTLExpiry - Responsible for deleting expired pastes.
- Background jobs run against the PasteStore data stores to remove paste content whose expiration time has passed.
- Background jobs can be run hourly or daily depending on the needs of the system. Alternatively, background jobs can be scheduled to run ad hoc for sensitive paste content.
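The TTLExpiry sweep described above could look roughly like the following. This is a sketch under stated assumptions, not the actual job: SQLite stands in for whichever store PasteStore uses, and the table name pastes, its columns (paste_id, content, expires_at as a Unix timestamp, NULL for no expiry), and the function name delete_expired are all hypothetical.

```python
import sqlite3


def delete_expired(conn: sqlite3.Connection, now: float, batch_size: int = 1000) -> int:
    """Delete pastes whose expiry time has passed; return the number removed.

    Rows are deleted in batches so that a large backlog does not hold one
    long-running transaction against the store.
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM pastes WHERE paste_id IN ("
            "  SELECT paste_id FROM pastes"
            "  WHERE expires_at IS NOT NULL AND expires_at <= ?"
            "  LIMIT ?)",
            (now, batch_size),
        )
        conn.commit()
        total += cur.rowcount
        # A short batch means nothing expired is left to delete.
        if cur.rowcount < batch_size:
            return total
```

Because the job runs only hourly or daily, a paste can outlive its expiry until the next sweep. One option is for the read path to also compare expires_at with the current time and treat an expired paste as missing.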
Detailed Component Design
- Abuse Prevention
- Rate Limit based on IP or user id (10 pastes/min, 100 pastes/day)
- CAPTCHA process during paste creation
- Limit on paste size (1MB for text content, 5MB for images)
- Obfuscation of personal information and data.
- Paste ID generation
- Last x digits of a UUID, Base64-encoded. A URL-safe Base64 alphabet is preferable because standard Base64 output can contain "+" and "/", which are awkward inside a URL path.
- Paste IDs can be pre-generated and assigned when POST /paste requests are made.
- Paste IDs are marked as deprecated for a period of time after paste content deletion. Paste IDs may be recycled for re-use after that period has passed.
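The ID scheme above (tail of a UUID, Base64-encoded) might be sketched as follows. The function names, the choice of 6 bytes, and the in-memory set used for collision checks are illustrative assumptions; a real system would check against PasteStore or a pre-generated ID pool, including IDs still marked as deprecated.

```python
import base64
import uuid


def new_paste_id(num_bytes: int = 6) -> str:
    """Return a short ID from the trailing bytes of a random UUID.

    The last 6 bytes of a version-4 UUID are fully random, and 6 bytes
    encode to exactly 8 URL-safe Base64 characters with no padding.
    """
    tail = uuid.uuid4().bytes[-num_bytes:]
    return base64.urlsafe_b64encode(tail).rstrip(b"=").decode("ascii")


def allocate_paste_id(in_use: set, num_bytes: int = 6, max_attempts: int = 5) -> str:
    """Generate an ID that is not in `in_use`, retrying on collision."""
    for _ in range(max_attempts):
        candidate = new_paste_id(num_bytes)
        if candidate not in in_use:
            in_use.add(candidate)
            return candidate
    raise RuntimeError("could not allocate a unique paste id")
```

Truncating the UUID shortens the URL but raises the collision probability, which is why the sketch checks each candidate against the IDs already in use instead of relying on uniqueness alone.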