Requirements
Functional Requirements:
- Allow users to upload and store text or code snippets.
- Generate a unique shareable URL for each paste.
- Enable retrieval of paste content by URL.
- Support expiration and TTL for pastes.
- Allow paste owners or the system to delete a paste before its natural expiration.
Non-Functional Requirements:
- Uploading a paste (write operation) should complete within 500 ms.
- Retrieving paste content by URL (read operation) should complete within 500 ms.
- The service should support up to 1 billion MAU. Peak QPS will be around 1500.
- The service should be available at least 99.999% of the time.
- The service could store up to 10K words per user. Assuming 8 bytes per word, that comes to 80 TB of storage across 1 billion users.
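The storage and availability figures above can be sanity-checked with a few lines of arithmetic. This is a rough sketch that uses only the numbers stated in the requirements. It assumes decimal terabytes and ignores replication and metadata overhead.

```python
# Back-of-envelope check of the storage and availability requirements.
MAU = 1_000_000_000        # 1 billion monthly active users
WORDS_PER_USER = 10_000    # storage cap per user
BYTES_PER_WORD = 8         # assumption stated in the requirement

total_bytes = MAU * WORDS_PER_USER * BYTES_PER_WORD
total_tb = total_bytes / 10**12  # decimal terabytes, before replication

AVAILABILITY = 0.99999     # "five nines"
MINUTES_PER_YEAR = 365 * 24 * 60
downtime_minutes = (1 - AVAILABILITY) * MINUTES_PER_YEAR

print(f"storage: {total_tb:.0f} TB")
print(f"allowed downtime: {downtime_minutes:.2f} min/year")
```

Five nines leaves a budget of only about five minutes of downtime per year, which is one reason to favor a datastore with no single point of failure.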
API Design
Define the APIs expected from the system. This is your chance to analyze and define the read and write paths so that you can come up with the high-level design...
There will be 4 APIs we need to support:
Upload of pasted text:
user_id is the unique identifier of a user and is passed as a path parameter.
POST v1/upload_text/{user_id}
{
post_id: UUID,
text: String,
createdAt: Timestamp,
textType: Enum (code or text),
ttl: String
}
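The upload request carries ttl as a String, but the format is not specified. As one possible convention (an assumption, not part of the original design), the server could accept an integer followed by a unit suffix such as "30m" or "7d" and convert it to seconds before writing. The helper name parse_ttl is hypothetical.

```python
import re

# Hypothetical TTL format: a positive integer followed by s, m, h, or d.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_TTL_PATTERN = re.compile(r"(\d+)([smhd])")


def parse_ttl(ttl: str) -> int:
    """Convert a TTL string such as "7d" into a number of seconds."""
    match = _TTL_PATTERN.fullmatch(ttl.strip().lower())
    if match is None:
        raise ValueError(f"unsupported ttl: {ttl!r}")
    value, unit = int(match.group(1)), match.group(2)
    if value <= 0:
        raise ValueError("ttl must be positive")
    return value * _UNIT_SECONDS[unit]
```

Rejecting malformed values at the API boundary keeps invalid TTLs out of the storage layer.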
Generation of unique shareable URLs:
POST v1/generate_url/{user_id}
{
post_id: UUID
}
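The design does not say how the URL is derived from the post_id. One option, sketched here purely as an assumption, is to base62-encode the 128-bit UUID so the URL is short and can be mapped back to the post_id without a lookup. The domain paste.example and all function names are hypothetical.

```python
import uuid

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
BASE_URL = "https://paste.example/"  # hypothetical domain


def base62_encode(n: int) -> str:
    """Encode a non-negative integer using the 62-character alphabet."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def base62_decode(s: str) -> int:
    """Inverse of base62_encode; raises ValueError on unknown characters."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n


def url_for_post(post_id: uuid.UUID) -> str:
    """Build a shareable URL from the post's UUID."""
    return BASE_URL + base62_encode(post_id.int)


def post_id_from_url(url: str) -> uuid.UUID:
    """Recover the post's UUID from a URL produced by url_for_post."""
    if not url.startswith(BASE_URL):
        raise ValueError("unexpected URL prefix")
    return uuid.UUID(int=base62_decode(url[len(BASE_URL):]))
```

A 128-bit value needs at most 22 base62 characters. With this scheme the URL is a pure function of the post_id, so anyone who knows a post_id can compute its URL. A random short code stored in pasted_text_url would avoid that at the cost of an extra lookup.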
Retrieval of text blob using the URL:
GET v1/retrieve_text/{user_id}
{
url: String
}
Deletion of a post:
DELETE v1/delete_post/{user_id}
{
post_id: UUID
}
High-Level Design
Describe the overall system architecture. Identify the main components needed to solve the problem end-to-end. Use the diagramming tool to create a block diagram.
First, we need to define the data model for storing the pasted text:
table pasted_text_table {
post_id: UUID, [partition key]
pasted_text: text,
user_id: UUID, [clustering column]
created_at: Timestamp,
ttl: string,
text_type: string (whether the stored text is a text or code snippet)
}
table pasted_text_url {
post_id: UUID, [partition key]
user_id: UUID, [clustering column]
sharable_url: String
}
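To make the two tables concrete, the rows can be mirrored as typed records in application code. This is an illustrative sketch only: the class names, the TextType enum, and the primary_key helper are assumptions, and the field names follow the schema above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from uuid import UUID, uuid4


class TextType(Enum):
    TEXT = "text"
    CODE = "code"


@dataclass(frozen=True)
class PastedText:
    """One row of pasted_text_table."""
    post_id: UUID          # partition key
    user_id: UUID          # clustering column
    pasted_text: str
    created_at: datetime
    ttl: str
    text_type: TextType

    def primary_key(self):
        # Partition key first, then clustering column.
        return (self.post_id, self.user_id)


@dataclass(frozen=True)
class PastedTextUrl:
    """One row of pasted_text_url."""
    post_id: UUID          # partition key
    user_id: UUID          # clustering column
    sharable_url: str

    def primary_key(self):
        return (self.post_id, self.user_id)
```

Because post_id is the partition key in both tables, a read or delete by post_id touches a single partition in each table.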
For data storage, we will use Cassandra, which is a good fit since we need heavy write throughput and high availability.
Every request first passes through a load balancer, which routes traffic to the servers using consistent hashing. The request then goes to an API gateway, which handles authentication and rate limiting by user_id, IP address, etc.
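As a rough illustration of the consistent hashing used for routing, the sketch below builds a hash ring with virtual nodes. The routing key (a user_id string), the server names, the choice of SHA-256, and the vnode count are all assumptions for the example, not details from the design.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each server gets several positions so load spreads more evenly.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def get_node(self, key):
        """Return the first node clockwise from the key's position."""
        if not self._ring:
            raise LookupError("ring is empty")
        # (h,) sorts before any (h, node), so this finds the first position >= h.
        idx = bisect.bisect_left(self._ring, (_hash(key),))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["server-a", "server-b", "server-c"])
print(ring.get_node("user-42"))
```

The property that matters here is that removing a server only remaps the keys that were assigned to it. Keys on the other servers keep their assignment.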
Now let me walk through the upload and delete flows:
- When a user wants to upload text, the client calls the text upload API. We create a UUID for the text, record the text type and the user_id of the user creating the snippet, and set the creation timestamp and the TTL. Cassandra natively supports expiring data based on a TTL.
- When a user wants to delete text, the client calls the text delete API. We use the post_id to delete the corresponding rows in the 2 Cassandra tables.
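The upload and delete flows can be sketched with an in-memory stand-in for the two tables. This is not a Cassandra client: in the real system the TTL would be attached to the write and Cassandra would expire the row itself, whereas here expiry is simulated on read. The class name, method names, and the injectable clock are assumptions made for the example.

```python
import time
import uuid


class InMemoryPasteStore:
    """Stand-in for pasted_text_table and pasted_text_url."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._pastes = {}  # post_id -> row (pasted_text_table)
        self._urls = {}    # post_id -> sharable_url (pasted_text_url)

    def upload(self, user_id, text, text_type, ttl_seconds):
        """Upload flow: create the UUID, stamp the row, record its expiry."""
        post_id = uuid.uuid4()
        now = self._clock()
        self._pastes[post_id] = {
            "user_id": user_id,
            "pasted_text": text,
            "text_type": text_type,
            "created_at": now,
            "expires_at": now + ttl_seconds,
        }
        return post_id

    def set_url(self, post_id, sharable_url):
        self._urls[post_id] = sharable_url

    def get(self, post_id):
        """Return the row, or None if it is missing or past its TTL."""
        row = self._pastes.get(post_id)
        if row is None:
            return None
        if self._clock() >= row["expires_at"]:
            self.delete(post_id)  # simulated expiry
            return None
        return row

    def delete(self, post_id):
        """Delete flow: remove the post's rows from both tables."""
        existed = self._pastes.pop(post_id, None) is not None
        self._urls.pop(post_id, None)
        return existed
```

Passing the clock in makes expiry easy to exercise in a test without sleeping.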
Detailed Component Design
Deep dive into 2-3 key components. Explain how they work, how they scale, discuss tradeoffs, capacity, and any relevant algorithms or data structures.