Requirements
Functional Requirements:
- A user can create a paste, set its visibility (public/private/unlisted), set an expiration time, and receive a unique url to access the paste.
- Users with permissions (for private pastes) can retrieve paste content with the unique url.
Non-Functional Requirements:
- The system has 99.9% availability.
- The system has low latency for paste creation and retrieval.
- Paste content is consistent: after a paste has been edited, the updated paste content is readily available to users.
- The system is resilient to unintended usage (spam and bot usage).
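As a rough illustration of what the 99.9% availability target implies, the arithmetic below converts it into a downtime budget. The function name and the choice of periods (a 365-day year and a 30-day month) are assumptions made for this example only.

```python
def downtime_budget_minutes(availability: float, period_hours: float) -> float:
    """Return the minutes of downtime an availability target permits in a period."""
    return (1.0 - availability) * period_hours * 60.0


# Assumed periods: a 365-day year and a 30-day month.
per_year = downtime_budget_minutes(0.999, 365 * 24)
per_30_days = downtime_budget_minutes(0.999, 30 * 24)

print(f"{per_year:.1f} minutes per year")        # about 8.76 hours
print(f"{per_30_days:.1f} minutes per 30 days")
```

Under these assumptions the budget is roughly 8.76 hours per year, or about 43 minutes per 30 days, covering both planned and unplanned outages.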
API Design
POST /paste
{
content: <content>,
syntax: <syntax>,
expiry: <expiry>,
visibility: <visibility>
}
GET /paste/{paste_id}
{
content: <content>,
syntax: <syntax>,
expiry: <expiry>
}
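The POST /paste body above could be validated roughly as follows before it reaches PasteStore. This is a minimal sketch: the class name, the default values for syntax and visibility, and the error messages are hypothetical, and the size limit reuses the 1MB text limit from the Abuse Prevention notes (interpreted here as 1,000,000 bytes).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional

ALLOWED_VISIBILITY = {"public", "private", "unlisted"}
MAX_TEXT_BYTES = 1_000_000  # assumed interpretation of the 1MB text limit


@dataclass(frozen=True)
class CreatePasteRequest:
    content: str
    syntax: str = "plaintext"         # assumed default
    expiry: Optional[datetime] = None  # None means the paste never expires
    visibility: str = "private"       # assumed default


def validate(req: CreatePasteRequest, now: datetime) -> List[str]:
    """Return a list of validation errors; an empty list means the request is acceptable."""
    errors = []
    if not req.content:
        errors.append("content must not be empty")
    if len(req.content.encode("utf-8")) > MAX_TEXT_BYTES:
        errors.append("content exceeds size limit")
    if req.visibility not in ALLOWED_VISIBILITY:
        errors.append("unknown visibility")
    if req.expiry is not None and req.expiry <= now:
        errors.append("expiry must be in the future")
    return errors


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
request = CreatePasteRequest(content="print('hi')", syntax="python", visibility="public")
print(validate(request, now))
```

Passing the current time in as a parameter keeps the check deterministic and easy to test.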
High-Level Design
Components
- PasteRenderer - Responsible for rendering paste content to the user.
- PasteRenderer fetches paste content for a unique url from the PasteStore. PasteRenderer is responsible for syntax formatting/highlighting of the paste and returning a response to the API. The client can use the response from the API with minimal translation.
- PasteRenderer can be load-balanced/scaled to handle a large number of requests. Frequent GET /paste/{paste_id} requests can be cached so that transformations do not have to be recalculated on every request. The cache can use an LRU eviction policy.
- PasteStore - Responsible for creating/storing/retrieving pastes to a database.
- Pastes can have multiple storage options based on paste type.
- In-Memory Redis database for temporary pastes.
- DynamoDB for main paste use-case.
- S3 for larger paste objects and attachments.
- Content can be cached for faster retrieval with an LRU eviction policy. This cache is distinct from the PasteRenderer cache because the same paste content may be requested but undergo several different transformations.
- TTLExpiry - Responsible for deleting expired pastes.
- Background jobs that run on PasteStore data storage in order to remove paste content that has passed the expiration date.
- Background jobs can be run hourly or daily depending on needs of the system. Alternatively, background jobs can be scheduled to run ad-hoc for sensitive paste content.
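Both PasteRenderer and PasteStore mention an LRU eviction policy. The sketch below shows one way such a cache could behave, using an in-process OrderedDict. The class name and capacity are hypothetical, and a production deployment would more likely rely on the eviction policy of a shared cache than on hand-written code.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Small in-process LRU cache keyed by paste id (illustrative only)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items = OrderedDict()  # oldest entry first, most recent last

    def get(self, paste_id: str) -> Optional[str]:
        if paste_id not in self._items:
            return None
        # A hit makes this entry the most recently used.
        self._items.move_to_end(paste_id)
        return self._items[paste_id]

    def put(self, paste_id: str, content: str) -> None:
        self._items[paste_id] = content
        self._items.move_to_end(paste_id)
        if len(self._items) > self._capacity:
            # Evict the least recently used entry.
            self._items.popitem(last=False)


cache = LRUCache(capacity=2)
cache.put("a", "first")
cache.put("b", "second")
cache.get("a")           # "a" is now the most recently used entry
cache.put("c", "third")  # evicts "b"
```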
Detailed Component Design
- Abuse Prevention
- Rate Limit based on IP or user id (10 pastes/min, 100 pastes/day)
- Rate Limiting based on region to target known spam regions or bad actors
- CAPTCHA process during paste creation
- Limit on paste size (1MB for text content, 5MB for images)
- Obfuscation of personal information and data.
- Machine learning algorithms to identify suspicious behavior.
- User reputation system or account-specific paste limits to allow higher paste limits for good user behavior.
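The per-IP or per-user limits above (10 pastes/min, 100 pastes/day) could be enforced with a sliding-window log, sketched below. The class name is hypothetical, state is kept in process memory for simplicity, and the caller passes the current time so the behaviour is deterministic. A real deployment would likely keep the counters in a shared store.

```python
from collections import defaultdict, deque

# (window_seconds, max_requests) pairs taken from the limits listed above.
DEFAULT_LIMITS = ((60, 10), (86_400, 100))


class SlidingWindowLimiter:
    def __init__(self, limits=DEFAULT_LIMITS) -> None:
        self._limits = limits
        self._longest = max(window for window, _ in limits)
        self._events = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key: str, now: float) -> bool:
        """Return True and record the request if every limit still has room."""
        events = self._events[key]
        # Drop timestamps that fall outside the longest window.
        while events and events[0] <= now - self._longest:
            events.popleft()
        for window, max_requests in self._limits:
            recent = sum(1 for t in events if t > now - window)
            if recent >= max_requests:
                return False
        events.append(now)
        return True


limiter = SlidingWindowLimiter()
results = [limiter.allow("203.0.113.7", float(second)) for second in range(11)]
# The first ten requests inside one minute pass; the eleventh is rejected.
```

Rejected requests are not recorded, so a client that keeps retrying while blocked does not extend its own lockout.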
- Paste ID generation
- SHA-256 hash function to generate unique ids.
- Paste ids can be pre-generated and assigned when POST /paste requests are made.
- Expired paste ids are added to a data store to track recently expired ids. Paste ids may be recycled for re-use after enough time has passed.
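One way the id generation above could look is sketched here: hash random bytes with SHA-256, encode the digest in base62, and keep a short prefix. The 8-character length, the base62 alphabet, and the in-memory `taken` set (standing in for the store of live and recently expired ids) are assumptions made for illustration.

```python
import hashlib
import secrets
import string

ALPHABET = string.digits + string.ascii_letters  # 62 url-safe characters


def base62(n: int) -> str:
    """Encode a non-negative integer using the 62-character alphabet."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, remainder = divmod(n, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def new_paste_id(taken: set, length: int = 8) -> str:
    """Generate an id that is not in `taken`, then record it there."""
    while True:
        digest = hashlib.sha256(secrets.token_bytes(32)).digest()
        candidate = base62(int.from_bytes(digest, "big"))[:length]
        # Retry on a collision or an unusually short encoding.
        if len(candidate) == length and candidate not in taken:
            taken.add(candidate)
            return candidate


taken_ids = set()
first_id = new_paste_id(taken_ids)
second_id = new_paste_id(taken_ids)
```

Truncating the hash makes collisions possible, which is why the sketch checks the candidate against known ids before returning it. The same function could fill a pool of pre-generated ids ahead of time.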
- Paste Access
- Pastes have visibility public, private, or unlisted with the following properties:
- public pastes can be seen by all users (logged in or anonymous users)
- private pastes can be seen by the paste creator or users added to the access list
- Access lists can be role-based such as a list of users that can edit the paste and a list of users that can view the paste
- unlisted pastes are pastes created by anonymous users. By default, these are created as private but can also be created as a public paste.
- When fetching a paste from the paste store, the request goes through the user authentication service. The service can audit and log access requests for paste content, giving the creator and authorized users insight into usage and access patterns.
- Whether a user has access to a given paste can be saved/cached separately from fetching paste content.
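The visibility and access-list rules above might translate into a check like the following. This is a sketch under assumptions: the `Paste` fields and function names are hypothetical, editors are assumed to also have view access, and any paste that is not public (including unlisted ones) falls through to the owner and access-list check.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Paste:
    paste_id: str
    owner_id: Optional[str]  # None for pastes created anonymously
    visibility: str          # "public", "private", or "unlisted"
    viewers: Set[str] = field(default_factory=set)
    editors: Set[str] = field(default_factory=set)


def can_view(paste: Paste, user_id: Optional[str]) -> bool:
    """Public pastes are open to everyone; others need ownership or list membership."""
    if paste.visibility == "public":
        return True
    if user_id is None:
        return False
    return (
        user_id == paste.owner_id
        or user_id in paste.viewers
        or user_id in paste.editors  # assumption: edit access implies view access
    )


def can_edit(paste: Paste, user_id: Optional[str]) -> bool:
    """Only the owner and listed editors may modify a paste."""
    if user_id is None:
        return False
    return user_id == paste.owner_id or user_id in paste.editors


paste = Paste("abc12345", owner_id="alice", visibility="private", viewers={"bob"})
print(can_view(paste, "bob"), can_edit(paste, "bob"))
```

Because the result depends only on the paste's access metadata and the user id, it can be cached independently of the paste content.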
- TTL Expiry
- When a paste is created with an expiry, or its expiration date is modified, the paste id can be added to a scheduled list. When the background job runs, it checks the scheduled list for the given day/hour and adds those paste ids to the queue. When processing a given paste, it verifies that the expiration date has passed before deleting it.
- Deleted pastes might have their unique url deprecated for up to a year. If a user tries to access a deprecated unique url, they can be directed to a page notifying them the url has expired. If the user has access to a new paste that is created with a recycled url, they may be able to see the content.
- Asynchronous job to clean-up pastes at fixed intervals.
- Scheduling system to check for expiry based on a priority queue.
- Users are notified prior to paste expiration with an option to extend the paste.
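The priority-queue scheduling described above, including the check that a paste has really expired before it is deleted, could be sketched as follows. The class and method names are hypothetical, timestamps are plain numbers supplied by the caller, and the actual deletion from PasteStore is left out.

```python
import heapq
from typing import Dict, List, Tuple


class ExpiryScheduler:
    def __init__(self) -> None:
        self._heap: List[Tuple[float, str]] = []  # (expires_at, paste_id), earliest first
        self._expiry: Dict[str, float] = {}       # paste_id -> current expires_at

    def schedule(self, paste_id: str, expires_at: float) -> None:
        """Set or update the expiry; older heap entries for the paste become stale."""
        self._expiry[paste_id] = expires_at
        heapq.heappush(self._heap, (expires_at, paste_id))

    def pop_expired(self, now: float) -> List[str]:
        """Return paste ids whose current expiry has passed, earliest first."""
        expired = []
        while self._heap and self._heap[0][0] <= now:
            expires_at, paste_id = heapq.heappop(self._heap)
            # Skip stale entries: the expiry was changed or the paste was already handled.
            if self._expiry.get(paste_id) != expires_at:
                continue
            del self._expiry[paste_id]
            expired.append(paste_id)
        return expired


scheduler = ExpiryScheduler()
scheduler.schedule("a", 100.0)
scheduler.schedule("b", 200.0)
scheduler.schedule("a", 300.0)            # the user extended paste "a"
first_batch = scheduler.pop_expired(250.0)
second_batch = scheduler.pop_expired(300.0)
```

Leaving stale entries in the heap and filtering them when they are popped avoids searching the heap whenever a user extends a paste.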
- Performance Monitoring and Analytics
- Real-time dashboards with user access, error rates, and paste creations.
- A/B testing for new features.
- Cache Layer Optimization
- Multi-tiered caching strategy including local cache per server and a shared, distributed Redis cache.
- Pre-warm the cache based on historical usage data and predictable use-cases (for example, a paste is likely to be accessed shortly after it is created).
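The multi-tiered strategy above can be illustrated with a read-through lookup that checks a per-server cache, then a shared cache, and only then the backing store. In this sketch a plain dict stands in for the shared Redis cache, `load` stands in for a PasteStore read, and the local tier has no eviction. All names are hypothetical.

```python
from typing import Callable, Dict


class TwoTierCache:
    def __init__(self, shared: Dict[str, str], load: Callable[[str], str]) -> None:
        self._local: Dict[str, str] = {}  # per-server tier (unbounded in this sketch)
        self._shared = shared             # stand-in for the distributed cache
        self._load = load                 # stand-in for a PasteStore read

    def get(self, paste_id: str) -> str:
        if paste_id in self._local:
            return self._local[paste_id]
        value = self._shared.get(paste_id)
        if value is None:
            # Miss in both tiers: read from the store and fill the shared tier.
            value = self._load(paste_id)
            self._shared[paste_id] = value
        self._local[paste_id] = value
        return value

    def warm(self, paste_id: str, content: str) -> None:
        """Pre-warm both tiers, for example right after a paste is created."""
        self._shared[paste_id] = content
        self._local[paste_id] = content


store_reads = []


def load_from_store(paste_id: str) -> str:
    store_reads.append(paste_id)
    return "content of " + paste_id


shared_tier: Dict[str, str] = {}
server_a = TwoTierCache(shared_tier, load_from_store)
server_b = TwoTierCache(shared_tier, load_from_store)
server_a.get("p1")  # reads the store and fills both tiers
server_b.get("p1")  # served from the shared tier, no store read
```

Invalidating the local tier after a paste is edited is not handled here; it would need short expiry times or an invalidation message to meet the consistency requirement.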