Requirements

Functional Requirements:

Allow users to upload and store text or code snippets.
Generate a unique shareable URL for each paste.
Enable retrieval of paste content by URL.
Support expiration and TTL for pastes.
Allow paste owners or the system to delete a paste before its natural expiration.

Non-Functional Requirements:

System should be globally available.
System should be highly available.
System should have very low latency. The system will be read heavy, since a paste must load almost instantly when someone opens its short URL.
System should be horizontally scalable.

We assume a medium-sized, globally available system with around 100,000 daily active users, each creating 10 new pastes per day. That is 1 million new pastes per day.

Spread over 86,400 seconds, that is about 12 writes per second, which we round to ~10 writes per second.

Now let's assume each short URL is opened around 1,000 times. At ~10 writes per second, that gives roughly 10,000 reads per second.

As we can see, our system is going to be read heavy.

Now, 1 million new pastes are created per day, so in 1 year we have about 365 million new pastes. Over 10 years we have ~3.6 billion new pastes.

We assume each paste gets a short URL key of 6 characters. Each key is a Base62-encoded string of 6 characters, which gives us 62^6, or ~56.8 billion, combinations. This is plenty for the system's 10-year lifetime.

Now, for each paste created, we have metadata:

6 bytes short url
20 bytes user ID
100 bytes storage ID
8 bytes created at timestamp
8 bytes expiry date time stamp

That gives us ~142 bytes per record, which we call ~145 bytes. Let's round each record up to 300 bytes to allow for DB metadata and overhead.

Over the period of 10 years, we shall be creating 3.6 billion such records. At 300 bytes each that is about 1.1 TB, so being very generous, we budget a total of ~2 TB for metadata including any overhead.

For the actual paste, let us assume 10 KB per paste. Over the whole lifetime of the app there will be 3.6 billion such pastes, which gives us 36 TB of storage.

We shall store the metadata in a DB and the actual pastes in object storage. For both, the estimated storage is easily manageable.
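The estimates above can be reproduced with a few lines of arithmetic. This is only a sketch using the figures assumed in the text; the variable names are made up for illustration, and 10 KB is taken as 10,000 bytes.

```python
# Back-of-envelope capacity estimate from the figures in the text.
SECONDS_PER_DAY = 24 * 60 * 60

daily_active_users = 100_000
pastes_per_user_per_day = 10
reads_per_paste = 1_000            # each short URL opened ~1,000 times
metadata_bytes_per_paste = 300     # record size rounded up for DB overhead
content_bytes_per_paste = 10_000   # 10 KB, taken as decimal kilobytes
lifetime_years = 10

pastes_per_day = daily_active_users * pastes_per_user_per_day
writes_per_second = pastes_per_day / SECONDS_PER_DAY
reads_per_second = writes_per_second * reads_per_paste
total_pastes = pastes_per_day * 365 * lifetime_years

metadata_tb = total_pastes * metadata_bytes_per_paste / 1e12
content_tb = total_pastes * content_bytes_per_paste / 1e12
key_space = 62 ** 6                # number of 6-character Base62 keys

print(f"pastes per day:    {pastes_per_day:,}")
print(f"writes per second: {writes_per_second:.1f}")
print(f"reads per second:  {reads_per_second:,.0f}")
print(f"total pastes:      {total_pastes:,}")
print(f"metadata storage:  {metadata_tb:.2f} TB")
print(f"content storage:   {content_tb:.1f} TB")
print(f"key space:         {key_space:,}")
```

With 365-day years this comes to 3.65 billion pastes, about 1.1 TB of metadata and 36.5 TB of paste content, in line with the rounded numbers used in the text.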

API Design

There will be 3 APIs

/create POST API

This API takes the text and returns a short URL. It optionally accepts an expiry time; otherwise the short URL stays available for the lifetime of the app.
The short URL is returned with status code 201 (Created).
The response body contains the short URL, created at, and expires at (only if given by the user).
This API requires an authentication header, to stop abuse.
This API is rate limited to 5 pastes per hour per user and returns status code 429 (Too Many Requests) once the limit is exceeded. Users can raise that limit through a premium subscription.
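One way to picture the 5-pastes-per-hour limit is a sliding-window log per user. The sketch below is an in-memory, single-process illustration only; the class name, the `create_status` helper and the injectable clock are hypothetical, and a real deployment would keep the counters in a shared store behind the API gateway.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allows at most `limit` requests per user in any `window_seconds` span."""

    def __init__(self, limit=5, window_seconds=3600, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock                    # injectable so tests can fake time
        self._hits = defaultdict(deque)       # user_id -> timestamps of accepted requests

    def allow(self, user_id):
        now = self.clock()
        hits = self._hits[user_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


def create_status(limiter, user_id):
    """Status code the /create API would return for this request."""
    return 201 if limiter.allow(user_id) else 429
```

A premium tier could be modelled by constructing the limiter with a larger `limit` for those users.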

/paste/url_key GET API

This shall be a GET API. The response has status code 200 (OK), since the paste itself is returned in the body rather than a redirect.
If the URL is not found or has expired, it returns a 404 status code.
The response body contains created at, expires at (if set by the user), and the actual paste.
We include Cache-Control headers so that responses can be cached at the CDN, which significantly reduces the traffic reaching the app servers.

/delete/url_key DELETE API

This shall be a DELETE API. It deletes the paste from storage and responds with status code 200 (OK); cached copies are left to expire on their own.
If the paste is not found or has already expired, it returns a 404 status code.

High-Level Design

A CDN caches the pastes, which reduces the number of requests reaching the app servers.
Load balancers distribute traffic across the server instances so that a single server does not get overwhelmed during traffic spikes.
The API gateway handles authentication and rate limiting, which prevents abuse of our system.
Now, we shall implement CQRS and separate read and write concerns. Since we know that our system will be read heavy, we keep a write service for creating and deleting pastes, and a read service whose sole job is retrieving pastes.
Because reads dominate, the read service sits in front of a Redis cache that holds roughly the top 20% of recently accessed pastes. We cache both the short URL key and the paste content, since a round trip to the server followed by a fetch from our object storage (AWS S3) would increase latency. We use a cache-aside policy, so data is loaded into Redis only on demand and does not take up space unnecessarily.
We shall store our metadata in DynamoDB. DynamoDB is highly available, and with Global Tables it replicates across regions with eventual consistency, typically within about a second, which makes our app globally available. There shall be indexes on the short URL key and on the expiration date. We shall take advantage of DynamoDB's TTL feature to delete metadata according to the expiry time given by the user.
We shall store the pastes in an AWS S3 object store. We could store the pastes in the DB itself but that would just add to the latency since pastes are quite large.
For cleanup, we could create a separate cleanup service, but a better way is to use the tools we already have. On the DB layer, we use DynamoDB's built-in cleanup by setting the TTL to the expiry time given by the user; once the TTL passes, DynamoDB deletes the metadata automatically. This deletion runs in the background and is not instantaneous, so the read service should also compare expires_at with the current time before serving a paste. On the cache layer, we set a TTL of 5 minutes so that expired records drop out on their own. If a paste expires less than 5 minutes after it was cached, the cache can briefly serve stale data, but that is an acceptable tradeoff for our system. As for the CDN, we use Cache-Control headers to set the TTL.
For deletion, if the user deletes a paste, we delete the paste from S3 and then from DynamoDB (the ordering matters so that DynamoDB never points to a non-existent object in S3). We shall NOT manually invalidate the Redis cache or the CDN cache; instead we let the TTL pass. Again, this can serve stale data for a short while, but that is an acceptable tradeoff.

Schema Design

In DynamoDB we shall use Global Tables. This gives us multi-region availability with eventual consistency (sub 1 second, acceptable for our system). The table shall have the following attributes:

id
user_id
created_at
url_key
object_reference_id (reference to S3)
expires_at (if provided by user)
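As a rough illustration, one metadata record could be modelled as below. The field types are assumptions, since the text does not fix them; the one firm constraint is that DynamoDB's TTL feature expects the TTL attribute to be a Number holding a Unix epoch time in seconds, and items without that attribute are never expired. The `PasteRecord` class and `to_item` helper are hypothetical names.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class PasteRecord:
    id: int
    user_id: str
    created_at: int                    # Unix epoch seconds (assumed representation)
    url_key: str
    object_reference_id: str           # key of the paste object in S3
    expires_at: Optional[int] = None   # Unix epoch seconds; used as the TTL attribute

    def to_item(self):
        """Plain dict for storage; expires_at is omitted when unset so the item never expires."""
        return {name: value for name, value in asdict(self).items() if value is not None}
```

Leaving `expires_at` out entirely, rather than storing a null, matches the case where the user gives no expiry and the paste lives for the lifetime of the app.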

Detailed Component Design

Write Flow:

Whenever a user creates a paste, their request is routed to our write service. Here a unique key is generated; this shall be our url_key, which is used to access the created paste.

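The paste is then stored in AWS S3 object storage using a signed URL, and in response we get a reference ID. S3 has no per-object TTL; expiry there is configured through bucket lifecycle rules, which work at day granularity. If the user has provided a time to live, i.e. an expiration time, we rely on such lifecycle rules to remove expired objects so that we do not need to clean up S3 manually. Because those rules are coarser than the user's TTL, an object may outlive its metadata for a while, which is harmless since it is no longer reachable. We then store the reference ID, url_key, created_at, user_id, incremental id, and expires_at (if provided by the user) in DynamoDB.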

If the user has provided a time to live, i.e. an expiration time, we store it as the TTL in DynamoDB for the created record. That way DynamoDB takes care of deleting the record, again eliminating the need for a manual cleanup service.

Once the whole process is done, we return the response to the user. We do not cache the new paste, i.e. we do not use a write-through cache, since that risks flooding our Redis memory with pastes that are never accessed. Since we store complete pastes in Redis, this is very important.
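The write flow above can be sketched with in-memory stand-ins. The dictionaries below merely play the roles of S3, DynamoDB and Redis, and `create_paste`, its parameters and the `pastes/` prefix are hypothetical names for illustration; the incremental id is left out for brevity.

```python
import time
import uuid

object_store = {}     # stand-in for S3: object_reference_id -> paste content
metadata_table = {}   # stand-in for DynamoDB: url_key -> metadata record
cache = {}            # stand-in for Redis; deliberately untouched on writes


def create_paste(content, user_id, url_key, ttl_seconds=None, now=None):
    """Store the paste body first, then its metadata, and return the API response body."""
    now = int(time.time()) if now is None else now

    # 1. Write the paste body to object storage and keep its reference.
    object_reference_id = f"pastes/{uuid.uuid4().hex}"
    object_store[object_reference_id] = content

    # 2. Write the metadata record; expires_at is only set when a TTL was given.
    record = {
        "url_key": url_key,
        "user_id": user_id,
        "created_at": now,
        "object_reference_id": object_reference_id,
    }
    if ttl_seconds is not None:
        record["expires_at"] = now + ttl_seconds
    metadata_table[url_key] = record

    # 3. No cache write here: the cache is filled on reads only (cache-aside).
    return {
        "short_url": f"/paste/{url_key}",
        "created_at": now,
        "expires_at": record.get("expires_at"),
    }
```

Writing the object before the metadata means a failed request can leave an orphaned object, but never a metadata record that points at nothing.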

Key Generation:

For key generation we could maintain a separate key generation service. But this introduces race conditions, since multiple instances may get the same key; moreover, a single service makes it harder to scale the system horizontally during traffic spikes. Instead we shall use the Snowflake algorithm (as used by Twitter) to generate globally unique keys.

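We shall generate a 64-bit ID, composed as ID = region_code + machine_id + sequencer + time_stamp. This gives us an ID that is unique across every instance in every region. We shall then encode this 64-bit ID as a Base62 string. Since this is encoding and not hashing, it preserves the uniqueness of the key. Note that a full 64-bit value needs up to 11 Base62 characters, so the 6-character key assumed in the capacity estimate would require the ID to fit in about 35 bits; otherwise the key is simply longer. This generated key shall be our url_key that we use in the write service.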
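A minimal sketch of this scheme follows. The bit layout (41-bit millisecond timestamp, 5-bit region, 5-bit machine, 12-bit sequence, with the timestamp in the high bits as in Twitter's Snowflake), the custom epoch and all names are assumptions made for illustration, not something the design above fixes.

```python
import threading
import time

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(n):
    """Encode a non-negative integer as a Base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, remainder = divmod(n, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


class SnowflakeGenerator:
    EPOCH_MS = 1_577_836_800_000   # 2020-01-01 UTC, an arbitrary custom epoch

    def __init__(self, region_code, machine_id, clock_ms=None):
        if not 0 <= region_code < 32 or not 0 <= machine_id < 32:
            raise ValueError("region_code and machine_id must fit in 5 bits")
        self.region_code = region_code
        self.machine_id = machine_id
        self._clock_ms = clock_ms or (lambda: int(time.time() * 1000))
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    # 4096 IDs used in this millisecond; spin until the next one.
                    while now <= self._last_ms:
                        now = self._clock_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (((now - self.EPOCH_MS) << 22) | (self.region_code << 17)
                    | (self.machine_id << 12) | self._sequence)


def new_url_key(generator):
    return base62_encode(generator.next_id())
```

Keeping the timestamp in the high bits makes IDs from one machine sort by creation time; each instance only needs its own region_code and machine_id, with no coordination at request time.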

Read flow:

Whenever the short URL for a paste is opened, our CDN is hit first, and the cached paste is returned. If there is a cache miss, the API gateway routes the request to the read service.

Since our reads are high, we have separated read and write concerns for high availability and low latency. Here we do not hit DynamoDB first; instead we hit the cache and return the complete paste. In case of a cache miss, the read service queries DynamoDB for the object_reference_id and retrieves the paste from S3.

The whole response is then cached (since we are using cache aside policy) and then returned to the user. We include Cache-control headers, that way CDN caches this paste as well, hence next time this short url is hit, CDN can directly serve the paste.

CDN caching and internal caching are the most important components of the read flow, since they absorb most of the traffic; we target a reduction in DB hits and service hits of roughly 90%.

We have put a global index on url_key, so when we eventually do need to query the DB, data retrieval is fast.
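The cache-aside read path can be sketched as follows. `TTLCache` is a tiny stand-in for Redis, plain dictionaries stand in for DynamoDB and S3, and `read_paste` with its return shape is hypothetical; the 5-minute TTL and the Cache-Control value are assumptions consistent with the text.

```python
import time

CACHE_TTL_SECONDS = 300   # 5-minute cache TTL from the design


class TTLCache:
    """Minimal in-memory stand-in for Redis with per-entry expiry."""

    def __init__(self):
        self._data = {}

    def get(self, key, now):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires = entry
        if now >= expires:
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl, now):
        self._data[key] = (value, now + ttl)


def read_paste(url_key, cache, metadata_table, object_store, now=None):
    """Return (response, source); response is None when the paste is missing or expired."""
    now = time.time() if now is None else now

    cached = cache.get(url_key, now)
    if cached is not None:
        return cached, "cache"

    # Cache miss: look up the metadata, then fetch the body from object storage.
    record = metadata_table.get(url_key)
    if record is None:
        return None, "not_found"
    expires_at = record.get("expires_at")
    if expires_at is not None and expires_at <= now:
        return None, "not_found"   # expired but not yet removed by the DB's TTL
    paste = object_store.get(record["object_reference_id"])
    if paste is None:
        return None, "not_found"

    response = {
        "created_at": record["created_at"],
        "expires_at": expires_at,
        "paste": paste,
        "cache_control": f"public, max-age={CACHE_TTL_SECONDS}",
    }
    cache.set(url_key, response, CACHE_TTL_SECONDS, now)   # populate on demand
    return response, "origin"
```

The explicit expires_at check covers pastes whose metadata still exists after expiry; a cached entry, by contrast, keeps being served until its own TTL runs out, which is the staleness tradeoff the design accepts.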

Deletion flow:

Whenever a user manually deletes a paste, the request is routed to the write service by the API gateway. The write service queries DynamoDB and gets the object_reference_id. It uses this object_reference_id to delete the object from AWS S3; once that is done, it deletes the record from DynamoDB as well and returns 200 OK to the user.

We do not invalidate any caches since we have set a TTL of 5 minutes, so the caches will eventually drop the records on their own, which avoids the operational overhead of explicit invalidation.

We have put a global index on url_key, so when we do need to query the DB, data retrieval is fast and deletions are faster as a result.
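The deletion order described above can be sketched like this, again with dictionaries standing in for DynamoDB and S3; `delete_paste` and its parameters are hypothetical names.

```python
import time


def delete_paste(url_key, metadata_table, object_store, now=None):
    """Delete the S3 object first, then the metadata; return the HTTP status code."""
    now = time.time() if now is None else now

    record = metadata_table.get(url_key)
    if record is None:
        return 404
    expires_at = record.get("expires_at")
    if expires_at is not None and expires_at <= now:
        return 404   # already expired

    # 1. Remove the paste body, so the metadata never points at a missing object
    #    for longer than this function takes to finish.
    object_store.pop(record["object_reference_id"], None)

    # 2. Remove the metadata record. Caches are left alone and expire via TTL.
    del metadata_table[url_key]
    return 200
```

If the process fails between the two steps, the metadata record is still present and the request can be retried, because removing an already-missing object is a no-op.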

We have kept all the components loosely coupled and eliminated single points of failure (SPOF), thus enabling easy horizontal scaling.

Tradeoffs, concerns, security:

We can momentarily get stale data from the caches since we do not manually invalidate them upon expiry or deletion. Some stale data may be served, but once the TTL passes the system becomes consistent again. For a system like Pastebin, this is acceptable.
We have used AWS S3 instead of storing blobs in the DB directly. One could store the pastes in the DB itself, but that just puts operational and memory overhead on the DB.
We have used DynamoDB since it provides Global Tables and automatic deletion of records after their TTL. The tradeoff is that our system is not immediately consistent globally, since creations and deletions need time to replicate across regions. But since we want the system to be highly available with low latency, we can tolerate eventual consistency.
One issue the system might face is that on a cache miss we query DynamoDB and S3, and the app itself then returns the complete paste to the user. This adds operational overhead, since it significantly increases the network bandwidth through our services. If the cache fails or there is a large number of cache misses, our read service can get overwhelmed. To counter this up to a certain extent, we could scale the read service horizontally.
Another issue is key generation. We use the Snowflake algorithm to generate unique IDs and encode them into keys. With this approach each instance needs its own unique machine_id and its own sequencer, which increments for every request within the same millisecond. Managing this adds operational overhead.
A related issue is that on cache misses many calls are made to our services and to DynamoDB, and then a big chunk of data is returned, which means high bandwidth. This is because our read service returns the whole paste itself. In the future we could instead return a short-lived signed URL from S3. This raises a security concern, since we would be directly exposing an AWS S3 signed URL, but it should be fine in most cases because the expiry time is short.
We take care of security by requiring authorisation, in the form of an API key, with each request. Using this, we implement rate limiting to prevent abuse. Moreover, on reads we do not directly expose a signed AWS S3 URL; instead the whole paste goes through our read service and is then returned to whoever opens the link. This gives us a reasonable baseline of security and abuse prevention.