Design Pastebin - System Design

System requirements

Functional:

Users should be able to paste their text for a set period
Users should be given a link to their text
The content should be accessible for a set period.

Non-Functional:

The system should be available
The system should be reliable, and no text should be lost

Capacity estimation

DB:

Assuming the Size of text on average = 50kb per paste

Assuming 100,000 pastes per day * 50 kb = 5GB per day of storage

5GB *365 = 1.825TB of storage for text files in our data base.

Traffic:

Writes:

(100,000/(24*360))=1.16 pastes per second

1.16* 50 kb, roughly 60 kb per second for writes.

Reads:

Assuming 1:1000 write to read ratio:

60 kb*1000= 60 MB

Cache:

1.825TB * 0.2= 256 gb

a 20% cache size is a pragmatic rule of thumb—it’s large enough to cover the working set of frequently accessed content (maximizing hit rates) yet small enough to keep in‑memory costs under control.

API design

We will use REST API's

POST: create_paste(text) takes the text file as a parameter and returns 200 upon success and 500 if it fails. We will limit the amount of text based on the text box in the UI so as not to surpass the threshold.
1. Success Case : {status:200, message: paste link}
2. Error Case: { status: 500, message: error message}
GET: get_paste(url): takes the generated link as input and returns the text in the response message
1. Success Case : {status:200, message: pasted text}
2. Error Case: { status: 400, message: the link is invalid or expired}
3. Error Case: { status: 500, message: server error}

Database design

Since we assume the pastes are anonymous we don't need a users table. We will have one table for pastes as such:

Paste Info Table (SQL Database)

Primary ID
paste_key -> 8 value char -> index for faster query
paste_url -> 8 value char-> index for faster query
created_at ->
expires_at -> index for querying expirated rows.

Paste Table: (Key value store such as S3)

Paste_key-> same
Paste Content

MongoDB (NoSQL document database) is preferred for a fitness tracking app because it excels at handling dynamic, structured data with frequent updates and complex query needs. (Fitness app data (workout logs, user progress, and goals) is naturally represented as documents with structured fields. MongoDB stores data in a JSON-like format (BSON), which makes it easier to work with complex data structures and perform rich queries)

Amazon S3 (Object storage service) is optimized for storing and serving static, large objects and doesn't provide the rich query capabilities or flexible update mechanisms that a fitness app requires.

(

Paste content is essentially unstructured data (text or binary blobs) that fits naturally into objects stored on S3
Pastes are static content that, once written, do not change frequently. S3 is optimized to serve static files quickly, and it integrates seamlessly with CDNs for global distribution.

)

Since this is a heavy read operation we will use single leader replication(all writes go to the leader node, read replicas (followers) can be horizontally scaled to handle heavy read loads).

To address concurrency

We will repopulate the table with paste_key and paste_url.

"repopulate" means that for each new paste entry, you generate (or calculate) the paste_key and paste_url for that new record and then add that entry to the table—rather than re-generating or rewriting the entire table with every write

This will be created using MD5 hashing. (Rather than waiting for the user to finish writing the paste and then hashing the complete text, the system precomputes a pool of MD5-generated hash pairs. This means the MD5 algorithm is applied to some predetermined input (like a sequence number, counter, or a random seed) ahead of time. These precomputed values are then stored and ready to be assigned as new pastes are created.)

We can take the first 8 values of the hash as paste_key and the second 8 as paste_url. We will not be creating the hash values upon write but pre-populating them.

In concurrent writes, each write can grab a lock for each index, ensuring its thread is safe.

To address deadlocks the locks will only lock the primary index which also serves as the paste_key and then move to paste_url so there is a lock ordering. We can also add timeouts

Lock Acquisition on Indexes:

Primary Lock First (paste_key):
Every write operation first acquires a lock on the primary index (paste_key). This ensures that each paste gets a unique primary key in a thread-safe manner, preventing conflicts on the primary identifier.
Secondary Lock (paste_url) Next:
Once the lock on the primary index is acquired, the write operation then proceeds to acquire the lock on the secondary index (paste_url).

Enforcing Lock Ordering to Prevent Deadlocks:

Deadlocks often occur when different transactions lock shared resources in different orders (e.g., Transaction A locks paste_url first, while Transaction B locks paste_key first). To avoid this, the system enforces a strict ordering: always lock the primary index (paste_key) first and only then lock the secondary index (paste_url).
By ensuring all writes follow the same lock acquisition order, the possibility of cyclic waits (where each transaction waits for the other to release a lock) is greatly reduced or eliminated.

Timeouts to Further Prevent Blocking:

In high-concurrency situations, if a transaction is unable to acquire the necessary locks (because another transaction holds them), the system can apply a timeout. This means that if a write operation cannot get the lock within a specified period, it will be aborted or retried.
The timeout mechanism prevents any single write operation from waiting indefinitely for a lock, which helps keep the system responsive.

If you choose to use just the paste_url as the key—eliminating the separate paste_key—you simplify the schema, but you may lose some benefits:

Separation of Concerns:
- Internal vs. Public Identifier:
- Having separate identifiers (paste_key internally and paste_url externally) decouples the internal data management from what users see. The internal paste_key can change behind the scenes without affecting public URLs.
- Flexibility:
- With two keys, you can change how you generate or update one without impacting the other. Using a single key ties your internal representation directly to the public value.
Concurrency and Locking:
- In the two-key design, you can enforce a lock ordering (first on paste_key, then on paste_url) during concurrent writes, which helps avoid deadlocks.
- With only one key, all operations would lock on the same field, which might increase contention and reduce concurrency because every write operation is serializing on a single lock.
Performance:
- Having separate indices for internal operations and public lookups allows you to optimize each independently. With just paste_url, you have to ensure that this one index is optimized for both fast writes and fast lookups by users.

To address expired rows we have created an index on the created_at field and can run a job every given period to detect old rows.

We will use file storage such as Amazon S3 for the paste themselves. The paste_key will serve as the key to the content of the paste. We are confident that we can horizontally scale a S3 service if needed.

To address latency between the leader node and follower nodes, we can use a quorum strategy to address the delay between followers not being updated.

A quorum strategy can help mitigate the effects of this latency on read and write operations in the following ways:

Write Operations:
- When a write is issued, the system doesn’t wait for every single node to update before confirming the write.
- Instead, the write is considered successful after receiving acknowledgments from a quorum (a majority) of nodes.
- This means you only need to wait for a subset of nodes (for example, 3 out of 5) rather than all nodes, thereby reducing the write latency impact caused by slow followers.
Read Operations:
- For reads, instead of fetching data from a single node (which might be lagging), the system queries a quorum of nodes.
- The read operation aggregates responses from enough nodes so that at least one of them has the most recent update.
- By ensuring that the read quorum overlaps with the write quorum (i.e., they share at least one node that was updated by the last write), the system provides a more consistent (fresh) view of the data, even if some nodes are delayed.
Balancing Consistency with Latency:
- The quorum strategy helps balance the trade-off between low latency and strong consistency. You accept that not all nodes are instantly updated, but by requiring a quorum on reads, the system ensures that the read result will include the latest committed change from a majority of nodes.
- This way, even if some followers are delayed, the system waits for enough nodes to respond—thus reducing the chance of returning outdated information.

Let's assume you have 5 nodes labeled 1, 2, 3, 4, and 5. A common quorum configuration is to set both the write quorum (W) and read quorum (R) to 3, ensuring that:

R+W>5(3 + 3 > 5)

This guarantees that the read and write quorums will have at least one node in common.

Example Configuration

Write Quorum:

Suppose you require that nodes 1, 2, and 3 must acknowledge a write before it’s considered successful.
That is, when writing, the system will wait for confirmation from nodes 1, 2, and 3.

Read Quorum:

For reading, you could query nodes 3, 4, and 5.
This means that when performing a read, the system gathers responses from nodes 3, 4, and 5 and uses the overlapping data (node 3 in this case) to ensure that it reflects the latest committed write.

Why This Works

The overlap between the write and read quorums (node 3, in this case) ensures that at least one of the nodes being read has the most recent write.
This helps mitigate replication latency because even if some nodes are delayed, the common node in the overlapping quorum will reflect the latest update.

Note:

if one node (for example, node 3) is slow, overloaded, or temporarily unavailable, the system might choose another set of nodes (like 2, 4, 5 )

What matters is that any read quorum you form must intersect with the write quorum such that at least one node has the most recent update

High-level design

Scalability and fault general tolerance:

1. To prevent abuse of our system we can impose a rate limiter based on ip addresses and a bucketing strategy, we can have a higher number for read requests and lower number for writes. If users exceed this amount we can display a message letting them know. The bucketing strategy will prevent distributed types of attacks

A bucketing strategy is a common method used for rate limiting that groups or "buckets" incoming requests—typically by a source identifier such as an IP address—and controls the rate at which those requests are allowed.

How It Works

Token Bucket / Leaky Bucket Concepts:

Token Bucket:
- Each client (or IP address) is assigned a "bucket" that holds a certain number of tokens. Tokens are added at a fixed rate, and each incoming request consumes one token. If there are no tokens left, the request is either delayed or rejected until more tokens are added. This method allows a controlled burst of requests and then limits the sustained rate.
Leaky Bucket:
- In this approach, requests are enqueued in a bucket (or queue) and processed at a fixed rate. If the queue is full (i.e., the bucket reaches its capacity), incoming requests get dropped or rejected.

Grouping Requests:
- The strategy groups requests based on certain criteria (for example, IP address). For each group, you define an allowable quota (bucket size) and a refill rate. If a user exceeds their quota within a given time window, further requests are limited.
Preventing Distributed Attacks:
- By applying a bucketing strategy on a per-IP or per-API-key basis, you can limit the number of requests from a single source. This helps prevent abuse or distributed denial-of-service (DDoS) attacks by ensuring that no single source can overwhelm your system.

In the Context of Your System Design

Read vs. Write Limits:
You may set a higher bucket size or refill rate for read requests, since these are more common, and a lower bucket size for write requests to prevent abuse.
Enforcement:
When a client sends a request, the system checks the corresponding bucket.
- If enough tokens exist in the bucket, the request is processed, and a token is removed.
- If the bucket is empty, the system responds with an error or a message indicating that the rate limit has been exceeded.

2. All read/write requests go through a load balancer that distributes the load based on a server with minimum amount of load. We can use a weighted approach to more effectively address the balancing. We will have multiple instances of our micro services to handle increasing load. The load balancer will distribute the load first via health checks and second by the load each server is having

3. For data base, we will use single leader replication and follower nodes will act as replicas.

4. The S3 bucket will automatically scale but we can have a backup of s3 with features like Amazon AWS back up as a service

5. To help with our read/write throughput we can implement consistent hashing to shard the database. Since our paste_key is a hashed value we can assume a somewhat even distribution amongst the shards and we won't have to worry about hot spots

6. To address expiration of old pastes we will run a job once a day to delete old records. Since expires_at is a key in our db this will be relatively easy.

7. Since the hashes are prepopulated there will be no cases of collisions during the write. To genrate hashes we can use a md5 hash since its cheaper and easier to generate.

Caching:

We can assume a 20% rule where 20% of the writes are generating most of the reads. 365gb can be allocated to cache size. If there is a cache miss the cache will be updated and will reallocate the content of the cache if there is a need. To purge the cache it makes sense to use a LRU approach. Redis can be a good choice here. That being said the number can be increased or decreased based on the access paterns and we can opt for a dynamic solution here.

Writes:

After being routed to a server via load balancer a thread can grab a lock on primary_id . To Ensure minimum dead locks we will utilize a time out strategy.
Writes will be written to the leader node, and the content of the paste will be copied to the S3 bucket. With the key as the paste_key. When both writes are succesful we can return a response to the user, with the paste url. To prevent discrepencies we can utilize a write ahead log to update our s3. The write ahead log will also help with too many loads

Reads:

After being routed to a server via load balancer we will first see the cache, we will implement a write around cache, since most of the writes are unlikely to be in the cache having a write around cache would make sense
If its a cache hit we return the response (paste url)
If its a miss we will look into follower nodes which will help with read throughput.
To minimize the latency between leader and followers we will use a qourom strategy.
If there is no content we will return a 400.

Request flows

same as high level design

Detailed component design

same as high level design

Trade offs/Tech choices

SQL Although we have many writes and this could make a nosql db better, we don't want a multi leader type database. Which can disqualify mongodb and cassandra. Although infrequent choosing multi leader databases could yield inaccurate pastes and lead costumers to not trust the product.
S3 ideal for putting in paste content, provides additional benefits such as durability and auto scaling.
Populating the database with hash values for paste_url and paste_key allows the locks to process writes faster and reduce dead lock scenarios or potential race conditions. Furthermore it will reduce risk of hash collisions but can limit the users who want a customized url.

Failure scenarios/bottlenecks

Running out of hashes for our paste url
Data base goes down
Database gets large
Too many requests

Future improvements

We can use a larger hashing function to create more rows
Utilize a write ahead log to restore
We can horizantally scale with each node serving a range of hash values, we are also running jobs to clean our db after expiration.
If they are reads we can add more servers, if they are writes we need to scale our db and add more servers