Requirements


Functional Requirements:

  • A user can create a paste and receive a unique URL to access the content.
  • A user can set visibility of the paste to public, private, or unlisted.
  • Other users with access can retrieve paste content with the unique URL.
  • Users can set a paste expiry; the content is deleted once the expiration time has elapsed.



Non-Functional Requirements:

  • The system has 99.9% availability.
  • The system has low latency for paste creation and retrieval.
  • Paste content is consistent: after a paste has been edited, the updated paste content is readily available to users.
  • The system is resilient to unintended usage (spam and bot usage).



API Design


POST /paste

Request body:
{ content: <content>, syntax: <syntax>, expiry: <expiry>, visibility: <visibility> }

GET /paste/{paste_id}

Response body:
{ content: <content>, syntax: <syntax>, expiry: <expiry> }
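
As a rough sketch of how the POST /paste body might be validated before it reaches the store, the Python below checks the four fields shown above. The allowed visibility values come from the functional requirements and the 1 MB text limit comes from the abuse-prevention notes. Treating expiry as a number of seconds (or None for no expiry), defaulting syntax to "plain", and the names CreatePasteRequest and validate_create_request are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

VISIBILITIES = {"public", "private", "unlisted"}
MAX_TEXT_BYTES = 1 * 1024 * 1024  # 1 MB text limit from the abuse-prevention notes


@dataclass(frozen=True)
class CreatePasteRequest:
    content: str
    syntax: str
    expiry: Optional[int]  # assumed unit: seconds until expiry; None = no expiry
    visibility: str


def validate_create_request(body: dict) -> CreatePasteRequest:
    """Return a typed request, or raise ValueError for the first problem found."""
    content = body.get("content")
    if not isinstance(content, str) or not content:
        raise ValueError("content must be a non-empty string")
    if len(content.encode("utf-8")) > MAX_TEXT_BYTES:
        raise ValueError("content exceeds the 1 MB limit")

    visibility = body.get("visibility")
    if not isinstance(visibility, str) or visibility not in VISIBILITIES:
        raise ValueError("visibility must be public, private, or unlisted")

    expiry = body.get("expiry")
    if expiry is not None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(expiry, bool) or not isinstance(expiry, int) or expiry <= 0:
            raise ValueError("expiry must be a positive number of seconds")

    syntax = body.get("syntax", "plain")  # assumed default when omitted
    if not isinstance(syntax, str):
        raise ValueError("syntax must be a string")

    return CreatePasteRequest(content, syntax, expiry, visibility)
```

A handler could call validate_create_request on the parsed JSON body and map ValueError to a 400 response.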


High-Level Design

Components

  • PasteRenderer - Responsible for rendering paste content to the user.
    • PasteRenderer fetches the paste content for a unique URL from the PasteStore, applies syntax formatting/highlighting, and returns the result to the API. The client can use the API response with minimal translation.
    • PasteRenderer can be load-balanced and scaled to handle a large number of requests. Responses to frequent GET /paste/{paste_id} requests can be cached so that transformations are not recomputed on every request. The cache can use an LRU eviction policy.
  • PasteStore - Responsible for creating, storing, and retrieving pastes in a database.
    • Pastes can be stored in different backends depending on paste type:
      • In-memory store such as Redis/Memcached for temporary pastes.
      • PostgreSQL/DynamoDB for the main paste use case.
      • S3 for larger paste objects and attachments.
    • Content can be cached for faster retrieval with an LRU eviction policy. This cache is distinct from the PasteRenderer cache because the same paste content may be requested and then undergo several different transformations.
  • TTLExpiry - Responsible for deleting expired pastes.
    • Background jobs run against the PasteStore data storage to remove paste content whose expiration date has passed.
    • Background jobs can run hourly or daily depending on the needs of the system. Alternatively, jobs can be scheduled ad hoc for sensitive paste content.
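
The PasteRenderer caching described above could look roughly like the following. It is a minimal in-process sketch, not a production cache: RenderCache, the (paste_id, syntax) cache key, and the render callback are hypothetical names, and a real deployment would more likely put this in a shared cache so that every load-balanced renderer sees the same entries.

```python
from collections import OrderedDict
from typing import Callable, Tuple


class RenderCache:
    """Tiny LRU cache for rendered pastes, keyed by (paste_id, syntax)."""

    def __init__(self, capacity: int, render: Callable[[str, str], str]):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        # Hypothetical callback: fetches from PasteStore and applies highlighting.
        self._render = render
        self._entries: "OrderedDict[Tuple[str, str], str]" = OrderedDict()

    def get(self, paste_id: str, syntax: str) -> str:
        key = (paste_id, syntax)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        rendered = self._render(paste_id, syntax)
        self._entries[key] = rendered
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return rendered

    def invalidate(self, paste_id: str) -> None:
        """Drop every cached rendering of a paste, e.g. after an edit or deletion."""
        for key in [k for k in self._entries if k[0] == paste_id]:
            del self._entries[key]
```

Calling invalidate whenever a paste is edited is one way to keep the cache in line with the consistency requirement, at the cost of a re-render on the next read.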



Detailed Component Design

  • Abuse Prevention
    • Rate limiting by IP address or user ID (10 pastes/min, 100 pastes/day)
    • CAPTCHA challenge during paste creation
    • Limit on paste size (1 MB for text content, 5 MB for images)
    • Obfuscation of personal information and data.
  • Paste ID generation
    • Take the last x digits of a UUID and Base64-encode them. The URL-safe Base64 alphabet should be used, since standard Base64 includes "+" and "/".
    • Paste IDs can be pre-generated and assigned when POST /paste requests are made.
    • Paste IDs are marked as deprecated for a period of time after the paste content is deleted. After that period, they may be recycled for reuse.
  • Paste Access
    • Pastes have visibility public, private, or unlisted with the following properties:
      • public pastes can be seen by all users (logged-in or anonymous)
      • private pastes can be seen by the paste creator or by users added to the access list
      • unlisted pastes are pastes created by anonymous users. By default they are created as private, but they can also be created as public.
    • When fetching a paste from the PasteStore, user access is checked for private pastes.
    • Whether a user has access to a given paste can be stored and cached separately from the paste content.
  • TTL Expiry
    • When a paste is created with an expiry, or its expiration date is modified, the paste ID is added to a scheduled list. When the background job runs, it checks the scheduled list for the given day/hour and adds those paste IDs to the queue. When processing a paste, the job verifies that the expiration date has passed before deleting it.
    • Deleted pastes might have their unique URL deprecated for up to a year. A user who tries to access a deprecated URL can be directed to a page notifying them that the URL has expired. Once a URL has been recycled, a user with access to the new paste created under it may be able to see that paste's content.
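
The scheduled-list approach under TTL Expiry could be sketched as follows. This is an in-memory illustration under stated assumptions: ExpiryScheduler, the hourly buckets, and the plain dictionaries standing in for the PasteStore and the scheduled list are all hypothetical. The one-year deprecation window is the upper bound mentioned above.

```python
from collections import defaultdict
from typing import Dict, List, Set

HOUR = 3600
DEPRECATION_SECONDS = 365 * 24 * HOUR  # "up to a year", per the notes above


class ExpiryScheduler:
    def __init__(self) -> None:
        self.expires_at: Dict[str, int] = {}  # stand-in for PasteStore rows
        self.buckets: Dict[int, Set[str]] = defaultdict(set)  # hour -> paste ids
        self.deprecated_until: Dict[str, int] = {}  # ids not yet reusable

    def schedule(self, paste_id: str, expires_at: int) -> None:
        """Record or update an expiry (Unix seconds). Stale bucket entries are
        left in place and filtered out by the re-check in run()."""
        self.expires_at[paste_id] = expires_at
        self.buckets[expires_at // HOUR].add(paste_id)

    def run(self, now: int) -> List[str]:
        """Process every bucket up to the current hour; return the deleted ids."""
        deleted: List[str] = []
        current_hour = now // HOUR
        for hour in sorted(h for h in self.buckets if h <= current_hour):
            for paste_id in sorted(self.buckets.pop(hour)):
                expiry = self.expires_at.get(paste_id)
                if expiry is None:
                    continue  # already deleted via another bucket
                if expiry > now:
                    # Not due yet: either the expiry was extended (a later
                    # bucket holds it) or it falls later in this same hour.
                    if expiry // HOUR == hour:
                        self.buckets[hour].add(paste_id)
                    continue
                del self.expires_at[paste_id]
                self.deprecated_until[paste_id] = now + DEPRECATION_SECONDS
                deleted.append(paste_id)
        return deleted
```

Re-checking the stored expiry inside run is what makes it safe to leave stale entries in old buckets when a user extends a paste's lifetime.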
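
For the scheme described under paste ID generation, a minimal sketch might take the trailing bytes of a random UUID and encode them with the URL-safe Base64 alphabet so the ID can appear in a path unescaped. The value ID_BYTES = 6 (standing in for the "x" left open above, and counted in bytes rather than digits), the function names, and the in-memory sets of live and deprecated IDs are assumptions for illustration.

```python
import base64
import uuid
from typing import Set

ID_BYTES = 6  # assumed value for "x"; 6 bytes encode to 8 characters, no padding


def candidate_id() -> str:
    """Encode the last ID_BYTES bytes of a random UUID as URL-safe Base64."""
    tail = uuid.uuid4().bytes[-ID_BYTES:]
    return base64.urlsafe_b64encode(tail).decode("ascii").rstrip("=")


def new_paste_id(in_use: Set[str], deprecated: Set[str], max_attempts: int = 10) -> str:
    """Return an id that is neither live nor still in its deprecation period."""
    for _ in range(max_attempts):
        paste_id = candidate_id()
        if paste_id not in in_use and paste_id not in deprecated:
            in_use.add(paste_id)
            return paste_id
    raise RuntimeError("could not find a free paste id")
```

The same check works for pre-generated IDs: a pool can be filled with new_paste_id ahead of time and drained as POST /paste requests arrive.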