Requirements


Functional Requirements:


  • Allow users to upload and store text or code snippets.
  • Generate a unique shareable URL for each paste.
  • Enable retrieval of paste content by URL.
  • Support paste expiration based on a time-to-live (TTL).
  • Allow paste owners or the system to delete a paste before its natural expiration.



Non-Functional Requirements:


  • Scalability
  • Availability
  • Reliability
  • Fault Tolerance
  • Latency
  • Security - Authentication & Authorization


API Design

  • saveSnippet(snippet: String): {url: String, code: Number} - returns a shareable URL and status/error code
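  • getSnippet(url: String): {snippet: String, code: Number} - returns the snippet content and a status/error code
  • deleteSnippet(url: String): Number - returns a status/error code; only the authenticated owner of the snippet may delete it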


High-Level Design

  • Load Balancer (LB) to address Scalability, Availability, and Reliability NFRs by distributing load among horizontally scalable application servers.
  • Geo DNS to address the Fault Tolerance NFR by routing traffic to another region when one region fails.
  • Cache server to cache frequently accessed snippets to address the Latency NFR.
  • A relational user database to address the Authentication & Authorization NFRs by letting users authenticate and ensuring that they can update or delete only their own snippets.
  • A NoSQL snippet database to let users quickly retrieve snippets by the snippet ID extracted from the shared URL.
  • A Snippet ID database to store pre-generated GUIDs ready to be used for creating future snippets.
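To make the load balancer bullet concrete, here is a minimal round-robin sketch in Python. It assumes the balancer simply rotates through a list of application servers and skips any that have been marked unhealthy; the class and server names are hypothetical, and a real deployment would use an off-the-shelf or managed load balancer rather than hand-written code.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical round-robin balancer over a fixed list of app servers."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)   # servers currently passing health checks
        self._ring = cycle(self.servers)   # endless rotation over all servers

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        self.healthy.add(server)

    def pick(self):
        # Advance the rotation at most once per server, skipping unhealthy ones.
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy application servers")


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
lb.mark_down("app-2")
print([lb.pick() for _ in range(4)])  # ['app-1', 'app-3', 'app-1', 'app-3']
```

Adding another server to the list is what horizontal scaling amounts to at this tier, and skipping unhealthy servers keeps requests flowing when one instance fails.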



Detailed Component Design

Here are my assumptions regarding storage size estimates needed for this system:

  • Size of each snippet: 1 MB
  • Number of snippets stored per second (write rate): 100
  • Daily storage expectation: 100 * 24 * 60 * 60 = 8,640,000 snippets, or about 8.64 TB at 1 MB each
  • Monthly storage expectation: 8,640,000 * 30 = 259,200,000 snippets, or about 259.2 TB
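The estimates can be checked with a few lines of Python. This sketch assumes 24 hours per day, 30 days per month, and decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes); if 1 MB is really the maximum snippet size rather than the typical one, the results are an upper bound.

```python
# Assumptions taken from the storage estimates.
SNIPPET_SIZE_BYTES = 1_000_000      # 1 MB per snippet (decimal megabyte)
WRITES_PER_SECOND = 100             # snippets stored per second
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400
DAYS_PER_MONTH = 30

snippets_per_day = WRITES_PER_SECOND * SECONDS_PER_DAY
snippets_per_month = snippets_per_day * DAYS_PER_MONTH

bytes_per_day = snippets_per_day * SNIPPET_SIZE_BYTES
bytes_per_month = snippets_per_month * SNIPPET_SIZE_BYTES

TB = 10**12
print(f"{snippets_per_day:,} snippets/day, {bytes_per_day / TB:.2f} TB/day")
# 8,640,000 snippets/day, 8.64 TB/day
print(f"{snippets_per_month:,} snippets/month, {bytes_per_month / TB:.1f} TB/month")
# 259,200,000 snippets/month, 259.2 TB/month
```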


For generating IDs for each snippet:

  • The user must be authenticated so that a unique user ID is available.
  • We will use a GUID generator to generate snippet IDs.
  • To keep ID generation off the request path, we will periodically pre-generate a pool of ready-to-use GUIDs offline.
  • A batch job will run daily to check the number of available GUIDs and generate new ones whenever that number falls below the daily storage expectation (the number of snippets expected to be stored per day).
  • saveSnippet will claim a GUID from the snippet ID database, then create a new entry in the snippet database storing that GUID along with the snippet itself and the user ID. The claim must be atomic (the GUID is removed or marked as used in a single operation) so that two concurrent requests never receive the same GUID; if storing the snippet fails, the claimed GUID is simply discarded, since the pool is replenished daily. A URL with the GUID appended to it is then returned to the user, for example https://mypastebin.com/{snippet-id}
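A minimal in-memory sketch of this ID flow, assuming a Python set guarded by a lock stands in for the snippet ID database and a dict stands in for the snippet database (both hypothetical). It uses `uuid.uuid4` as the GUID generator and claims a GUID by removing it from the pool under the lock before writing the snippet, which is one way to guarantee that two concurrent saveSnippet calls never receive the same GUID.

```python
import threading
import uuid

BASE_URL = "https://mypastebin.com/"


class SnippetIdPool:
    """Hypothetical in-memory stand-in for the snippet ID database."""

    def __init__(self):
        self._ids = set()
        self._lock = threading.Lock()

    def available(self):
        with self._lock:
            return len(self._ids)

    def refill(self, threshold):
        # Batch job: top the pool up until it holds `threshold` unused GUIDs.
        with self._lock:
            while len(self._ids) < threshold:
                self._ids.add(str(uuid.uuid4()))

    def claim(self):
        # Remove and return one GUID while holding the lock, so two concurrent
        # callers in this process can never receive the same GUID.
        with self._lock:
            if not self._ids:
                raise RuntimeError("snippet ID pool is empty")
            return self._ids.pop()


snippet_db = {}  # hypothetical stand-in for the NoSQL snippet database


def save_snippet(pool, user_id, snippet):
    """Rough equivalent of saveSnippet: claim a GUID, store the snippet, return its URL."""
    snippet_id = pool.claim()
    snippet_db[snippet_id] = {"user_id": user_id, "snippet": snippet}
    return BASE_URL + snippet_id


pool = SnippetIdPool()
pool.refill(threshold=1000)  # a real threshold would be the daily storage expectation
url = save_snippet(pool, user_id="user-42", snippet="hello, world")
print(url)  # https://mypastebin.com/ followed by a random GUID
```

A GUID claimed by a request that then fails is simply lost, which is harmless because the next refill replaces it. In the real system the claim would have to be a single atomic operation in the snippet ID database itself, since an in-process lock does not coordinate across application servers.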


For retrieving a snippet using an ID:

  • The snippet database is a NoSQL key-value store that maps each GUID to its stored snippet.
  • The user retrieves a specific snippet by requesting its shared URL, which ends with the snippet ID, for example https://mypastebin.com/{snippet-id}
  • getSnippet first checks the cache. On a cache miss, it retrieves the snippet from the database and caches it before returning it to the user.
  • The cache can use a least recently used (LRU) eviction policy: when the cache is full, the snippet that has gone longest without being accessed is evicted, so recently requested (and therefore likely popular) snippets tend to stay cached.
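The read path can be sketched as a cache-aside lookup in front of an LRU cache. In this hypothetical Python sketch, an `OrderedDict`-based `LRUCache` stands in for the cache server and a dict stands in for the snippet database; in production, a cache server such as Redis can instead be configured to evict with an approximated LRU policy (`maxmemory-policy allkeys-lru`).

```python
from collections import OrderedDict
from urllib.parse import urlparse


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry


def snippet_id_from_url(url):
    # https://mypastebin.com/{snippet-id} -> {snippet-id}
    return urlparse(url).path.lstrip("/")


def get_snippet(url, cache, snippet_db):
    """Hypothetical getSnippet: cache first, then the database on a miss."""
    snippet_id = snippet_id_from_url(url)
    snippet = cache.get(snippet_id)
    if snippet is None:  # cache miss: read from the database, then cache it
        record = snippet_db.get(snippet_id)
        if record is None:
            return None  # unknown, expired, or deleted snippet
        snippet = record["snippet"]
        cache.put(snippet_id, snippet)
    return snippet


snippet_db = {"abc123": {"user_id": "user-42", "snippet": "hello, world"}}
cache = LRUCache(capacity=2)
print(get_snippet("https://mypastebin.com/abc123", cache, snippet_db))  # hello, world
```

The first call above misses the cache and reads the database; a second call for the same URL would be served from the cache.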


For deleting a snippet using an ID:

  • The user must be authenticated so that a unique user ID is available.
  • deleteSnippet looks up the snippet by its ID, verifies that the stored user ID matches the authenticated user, and deletes the entry. It also evicts the snippet from the cache if it is cached there, so that later reads cannot return the deleted content.
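A sketch of the ownership check and cache eviction for deletion. It assumes the authenticated user's ID is already available (for example, from the session), that the snippet database stores the owner's user ID with each snippet, and that HTTP status codes serve as deleteSnippet's numeric return value; the function and store names are hypothetical, and plain dicts stand in for the database and cache.

```python
from http import HTTPStatus
from urllib.parse import urlparse


def delete_snippet(url, user_id, snippet_db, cache):
    """Hypothetical deleteSnippet: only the snippet's owner may delete it."""
    snippet_id = urlparse(url).path.lstrip("/")
    record = snippet_db.get(snippet_id)
    if record is None:
        return HTTPStatus.NOT_FOUND       # 404: no such snippet
    if record["user_id"] != user_id:
        return HTTPStatus.FORBIDDEN       # 403: authenticated, but not the owner
    del snippet_db[snippet_id]
    cache.pop(snippet_id, None)           # evict if cached; no-op otherwise
    return HTTPStatus.NO_CONTENT          # 204: deleted


snippet_db = {"abc123": {"user_id": "user-42", "snippet": "hello, world"}}
cache = {"abc123": "hello, world"}
url = "https://mypastebin.com/abc123"
print(int(delete_snippet(url, "user-7", snippet_db, cache)))  # 403
```

Deleting from the database before evicting from the cache narrows the window in which a concurrent read could re-cache the snippet, but does not fully close it; giving cached entries a short TTL is a common extra safeguard.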