Design Pastebin - System Design

Requirements

Allow users to upload and store text or code snippets.
Generate a unique shareable URL for each paste.
URLs must be hard to guess, so each paste is identified by a randomly generated UUID.
Enable retrieval of paste content by URL.
Support expiration and TTL for pastes.
Allow paste owners or the system to delete a paste before its natural expiration.

Needs to have low latency (10 ms to 100 ms for writes and about 10 ms for reads, achieved with caching and a CDN)
Needs to be consistent (We must always be able to retrieve the exact contents the user wrote)
Needs to be scalable (Needs to handle 100,000 reads and 10,000 writes per second during peak usage)
Needs to have high availability (Needs to have an uptime of 99.9% or higher)

Assume that on average we get about 50,000 reads per second and 5,000 writes per second.
Assume the average paste size is 10 KB.
Reads: 50,000 reads per second * 10 KB = 500,000 KB (0.5 GB) per second * 60 seconds * 60 minutes * 24 hours = 43,200 GB per day * 365 days = about 15.8 million GB of read traffic per year
Writes: 5,000 writes per second * 10 KB = 50,000 KB (0.05 GB) per second * 60 seconds * 60 minutes * 24 hours = 4,320 GB per day * 365 days = about 1.58 million GB of new storage per year
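The estimate above can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and it assumes decimal units (1 GB = 1,000,000 KB), which is what the figures above imply.

```python
# Back-of-envelope traffic estimate, using decimal units (1 GB = 1,000,000 KB).
SECONDS_PER_DAY = 60 * 60 * 24
KB_PER_GB = 1_000_000


def daily_and_yearly_gb(requests_per_second: int, paste_kb: int) -> tuple:
    """Return (GB per day, GB per year) for a request rate and average paste size."""
    kb_per_day = requests_per_second * paste_kb * SECONDS_PER_DAY
    return kb_per_day / KB_PER_GB, kb_per_day * 365 / KB_PER_GB


reads = daily_and_yearly_gb(50_000, 10)
writes = daily_and_yearly_gb(5_000, 10)
print(reads)   # (43200.0, 15768000.0)
print(writes)  # (4320.0, 1576800.0)
```

The read figure is bandwidth that the cache and CDN have to serve, while the write figure is what actually accumulates in storage (before any pastes expire).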

savePaste:
1. This endpoint accepts a string and an optional TTL and returns a randomly generated UUID as the ID for this paste. The contents will be written to a write-through cache and to DynamoDB. Each user will also be limited to saving 5 pastes per minute.
2. Request: { contents: string, ttl?: integer }
3. Response: { id: string }
getPaste:
1. This endpoint takes a UUID string and attempts to retrieve the corresponding paste from the cache. If there is a cache miss, we will retrieve it from the database.
2. Request: { id: string }
3. Response: { contents: string }
deletePaste:
1. This endpoint takes a UUID string and attempts to delete the corresponding paste from the cache (if it exists) and from the database.
2. Request: { id: string }
3. Response: None
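The three endpoints can be sketched with in-memory stand-ins. Everything here is an assumption made for illustration: the class name PasteApi, the plain dictionaries that play the cache and the database, and the choice to treat the TTL as a number of seconds. It is a minimal sketch of the request and response shapes, not a definitive implementation.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Paste:
    contents: str
    expires_at: Optional[float]  # Unix timestamp, or None when no TTL was given


class PasteApi:
    """In-memory stand-in: `cache` plays the cache, `db` plays the database."""

    def __init__(self) -> None:
        self.cache: dict = {}
        self.db: dict = {}

    def save_paste(self, contents: str, ttl: Optional[int] = None) -> dict:
        paste_id = str(uuid.uuid4())  # random (version 4) UUID
        expires_at = time.time() + ttl if ttl is not None else None
        paste = Paste(contents, expires_at)
        # Write-through: the write lands in both the store and the cache
        # before the ID is returned.
        self.db[paste_id] = paste
        self.cache[paste_id] = paste
        return {"id": paste_id}

    def get_paste(self, paste_id: str) -> Optional[dict]:
        paste = self.cache.get(paste_id)
        if paste is None:  # cache miss: fall back to the database
            paste = self.db.get(paste_id)
            if paste is None:
                return None
            self.cache[paste_id] = paste
        if paste.expires_at is not None and paste.expires_at <= time.time():
            self.delete_paste(paste_id)  # lazily drop an expired paste
            return None
        return {"contents": paste.contents}

    def delete_paste(self, paste_id: str) -> None:
        self.cache.pop(paste_id, None)
        self.db.pop(paste_id, None)
```

Returning None for a missing or expired paste is a simplification; a real HTTP API would more likely answer with a 404.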

We will first have the Client connect to the CDN to retrieve static content to boost performance.
When the Client makes a request, it first reaches the rate limiter, which limits the number of pastes a client can create to 5 per minute. A client that exceeds this limit receives an error.
After rate limiting, the request reaches the load balancer, which distributes the load evenly between servers.
At the server, we reach two different services.
1. Generation Service: This service will randomly generate a UUID for the paste on creation.
2. Paste Service: This service will handle the creation of the paste into an S3 bucket, updating the paste, getting the paste, and deleting the paste.
Both of these services will first go through the Write Through Cache. The cache will store the data for recently accessed pastes to improve performance. We can prevent the cache from being a single point of failure by having multiple cache instances and replicating the data across those instances.
When we update the cache, we will also update the database. The database will store the metadata of the paste such as the creation date or TTL if the user chose to add it. Once the TTL is reached, it will trigger the Expiration Service to clean up that paste from the cache, database, and S3 bucket.
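The rate-limiting step in the flow above (5 pastes per minute) can be sketched as a sliding-window counter. The class name, the per-client keying, and the injectable clock are assumptions made for illustration; the design above only fixes the limit itself, and a production limiter would keep its counters in a shared store rather than in process memory.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `limit` accepted requests per client in any window."""

    def __init__(self, limit=5, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.events = defaultdict(deque)  # client_id -> timestamps of accepted requests

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        timestamps = self.events[client_id]
        # Drop timestamps that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False  # caller should respond with an error (e.g. HTTP 429)
        timestamps.append(now)
        return True
```

A sliding window avoids the burst that a fixed one-minute bucket allows at the bucket boundary, at the cost of storing one timestamp per accepted request.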

Generation Service
1. This service will handle the generation of the UUID when a paste is created. We can guard against collisions (two pastes with the same UUID) by checking whether the UUID already exists in the database before using it. To keep IDs non-guessable, the UUID should come from a secure random source (for example, a version 4 UUID) rather than from a hash of the userID and the timestamp, because a hash of known inputs is deterministic and can be reproduced. Furthermore, we can implement idempotency by including an idempotency key in the request, so that a retried request is recognized and does not create a second paste.
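A minimal sketch of ID generation with a collision check and an idempotency key follows. The class and method names are hypothetical, and the in-memory set and dictionary stand in for the database. Two deliberate choices should be named plainly: the sketch draws IDs from Python's uuid.uuid4() (random) instead of hashing the userID and timestamp, and it folds the existence check into a single insert-if-absent step, on the assumption that the real store offers an atomic conditional write.

```python
import uuid


class IdGenerator:
    def __init__(self):
        self.used_ids = set()         # stands in for the database's primary-key index
        self.by_idempotency_key = {}  # idempotency key -> paste ID already issued

    def _insert_if_absent(self, paste_id: str) -> bool:
        """Stand-in for an atomic conditional write (insert only if the key is new)."""
        if paste_id in self.used_ids:
            return False
        self.used_ids.add(paste_id)
        return True

    def new_paste_id(self, idempotency_key=None, max_attempts=3) -> str:
        # A retried request with the same key gets the ID it was given before.
        if idempotency_key is not None and idempotency_key in self.by_idempotency_key:
            return self.by_idempotency_key[idempotency_key]
        for _ in range(max_attempts):
            candidate = str(uuid.uuid4())
            if self._insert_if_absent(candidate):
                if idempotency_key is not None:
                    self.by_idempotency_key[idempotency_key] = candidate
                return candidate
        raise RuntimeError("could not allocate a unique paste ID")
```

Checking and inserting in one step matters because a separate "check, then write" leaves a gap in which two concurrent requests could both see the ID as free.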
Paste Service
1. This service will handle the creation and updating of the paste by storing the contents of the paste in the S3 bucket and associating the UUID stored in the database with that S3 object.
2. When the user attempts to retrieve the paste, we first fetch the metadata from the cache or database which will contain the path to the contents in the S3 bucket. We can then return those contents to the user.
3. For the cache, we will use Redis. We can use an LRU eviction policy so that recently accessed (and therefore usually popular) pastes stay in the cache. We can protect against cache stampedes by locking an item when there is a cache miss, so that only one request fetches it from the database. Once the data is stored in the cache again, the waiting requests are released and read from the cache, which avoids read spikes on the database.
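The stampede protection described above can be sketched with one lock per paste ID. This is an in-process illustration only: the class name and the load_from_db callback are hypothetical, a dictionary stands in for Redis, and a multi-server deployment would need a distributed lock instead of threading.Lock.

```python
import threading


class StampedeProtectedCache:
    def __init__(self, load_from_db):
        self.load_from_db = load_from_db  # hypothetical loader: paste_id -> value
        self.cache = {}                   # stands in for Redis
        self.key_locks = {}               # paste_id -> lock held while reloading
        self.key_locks_guard = threading.Lock()

    def _lock_for(self, paste_id):
        # Create the per-key lock exactly once, even under concurrent misses.
        with self.key_locks_guard:
            return self.key_locks.setdefault(paste_id, threading.Lock())

    def get(self, paste_id):
        value = self.cache.get(paste_id)
        if value is not None:
            return value  # cache hit: no locking needed
        with self._lock_for(paste_id):
            # Re-check: another request may have filled the cache while we waited.
            value = self.cache.get(paste_id)
            if value is None:
                value = self.load_from_db(paste_id)
                self.cache[paste_id] = value
            return value
```

With this shape, eight concurrent requests for the same missing paste result in one database read; the other seven wait on the lock and then read the cached value.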
Expiration Service
1. This service will handle deleting the contents of the paste from the cache, database, and S3 bucket when the TTL is reached. Expiry of the TTL can trigger a Lambda that performs those deletions.
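The cleanup performed on expiry can be sketched as a sweep over the metadata. The function name, the expires_at and blob_key fields, and the dictionaries standing in for the cache, the database, and the S3 bucket are all assumptions made for illustration; in the design above the work would run inside the Lambda.

```python
import time


def sweep_expired(metadata, cache, blobs, now=None):
    """Delete every paste whose expires_at has passed; return the removed IDs."""
    now = time.time() if now is None else now
    expired = [
        pid for pid, meta in metadata.items()
        if meta.get("expires_at") is not None and meta["expires_at"] <= now
    ]
    for pid in expired:
        cache.pop(pid, None)                        # 1. stop serving it from the cache
        blobs.pop(metadata[pid]["blob_key"], None)  # 2. remove the stored contents
        del metadata[pid]                           # 3. remove the metadata last
    return expired
```

Deleting the metadata last is a deliberate choice in this sketch: if a run fails partway through, the record that points at the stored contents still exists, so the next run can retry instead of leaving an orphaned object behind.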

Tech choices: DynamoDB, Redis, S3, AWS Lambda
We choose to use a CDN, favoring performance over cost and complexity, because of the number of users and requests we need to serve.
We choose a non-relational database such as DynamoDB over a relational database because DynamoDB handles high volumes of reads and writes. Furthermore, DynamoDB scales horizontally, which lets us absorb an increase in users and requests during peak usage.