Design Pastebin - System Design

System requirements

Functional:

Store text and return a URL through which it can be accessed for a set period of time.
Support signup, login, and session management.
Content is accessible only until its expiry time; after that, requests for it are rejected.

Non-Functional:

Availability
Scalability

API design

1. Store Data

This endpoint is used to create a new paste by sending the text content and user information.

Method: POST
Path: /pastebin/store

Request Body

Content-Type: application/json

Field	Type	Description
`text`	`string`	The text content to be stored.
`userid`	`string`	The unique identifier for the user.

Example Request:

{ "text": "This is a new paste for the service.", "userid": "user-12345" }

Response

Status Code: 201 Created
Content-Type: application/json

Field	Type	Description
`url`	`string`	The unique URL for the newly created paste.

Example Response:

{ "url": "https://pastebin.example.com/abcdef123" }
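
As a rough illustration of the contract above, the following sketch validates a store request body and shapes the 201 response. The function name `handle_store`, the `make_hash` callback, the 400 error shape, and the `BASE_URL` constant are assumptions made for this example; the design itself does not specify them.

```python
import json
from typing import Callable, Tuple

BASE_URL = "https://pastebin.example.com"  # assumed; taken from the example response


def handle_store(raw_body: str, make_hash: Callable[[str], str]) -> Tuple[int, dict]:
    """Validate a POST /pastebin/store body and return (status_code, response_json)."""
    try:
        body = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400, {"error": "body must be valid JSON"}  # assumed error shape
    if not isinstance(body, dict):
        return 400, {"error": "body must be a JSON object"}
    for field in ("text", "userid"):
        value = body.get(field)
        if not isinstance(value, str) or not value:
            return 400, {"error": f"'{field}' must be a non-empty string"}
    # Hash generation is delegated to the service; here it is a hypothetical callback.
    paste_hash = make_hash(body["userid"])
    return 201, {"url": f"{BASE_URL}/{paste_hash}"}
```

Passing the hash generator in as a callback keeps the validation logic independent of how the service actually builds its hashes.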

2. Fetch Data

This endpoint is used to retrieve the text content of a paste using its unique hash.

Method: GET
Path: /{hash}

Response

Status Code: 200 OK
Content-Type: application/json

Field	Type	Description
`text`	`string`	The raw text content of the paste.

Example Response:

{ "text": "This is a new paste for the service." }

Note: If the hash is not found, the API returns 404 Not Found. If the paste exists but has expired, it returns 410 Gone.

High-level design

Pastebin Service ⚙️

This is the core service that orchestrates the entire process. Its responsibilities are:

Data Storage: It's responsible for the primary action of the service—storing text data. It writes the raw text content to AWS S3 and saves the corresponding metadata to the database.
URL Creation: It's tasked with generating the unique, shareable URL for each paste. This is done by hashing a freshly generated UUID together with the User ID, which makes each paste's URL effectively unique and hard to guess. This hash is stored as encoded_url in the database, and the final URL is constructed as domain/hash.
URL and Data Retrieval: When a user accesses a URL, the service must:
- Extract the hash from the URL.
- Use the hash to query the database and retrieve the associated metadata (including the UUID and User ID).
- Check whether the expiry time has passed; if it has, reject the request.
- Use the S3 key from the metadata to fetch the actual text data from AWS S3.
- Return the text content to the user.
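
The retrieval steps above can be sketched roughly as follows. This is a minimal illustration, assuming plain dictionaries as stand-ins for the metadata database and S3; the names `metadata_db`, `object_store`, `fetch_paste`, and the record field names are hypothetical. The 404 and 410 status codes follow the fetch logic described in the detailed component design.

```python
import time
from typing import Dict, Optional, Tuple

# In-memory stand-ins for the metadata database and S3 (assumptions for illustration).
metadata_db: Dict[str, dict] = {}
object_store: Dict[str, str] = {}


def fetch_paste(url_path: str, now: Optional[float] = None) -> Tuple[int, Optional[str]]:
    """Return (status_code, text) for a request path such as '/abcdef123'."""
    now = time.time() if now is None else now
    paste_hash = url_path.strip("/")            # extract the hash from the URL
    record = metadata_db.get(paste_hash)        # look up the metadata by hash
    if record is None:
        return 404, None
    if now >= record["expiry_timestamp"]:       # reject expired pastes
        return 410, None
    text = object_store.get(record["s3_key"])   # fetch the text via the S3 key
    if text is None:
        return 404, None                        # metadata without an object; treated as missing
    return 200, text                            # return the content
```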

How URLs are Created

The URL creation process is a key part of the service's design. The service combines two unique identifiers—a UUID (Universally Unique Identifier) and the User ID—to create a unique hash. This approach ensures that the URL is not easily guessable and is unique to a specific user and a specific "paste" session. The hash is then appended to the domain to form the complete URL, like domain/hash.
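
One possible way to build such a hash is sketched below. It assumes SHA-256 over the UUID and User ID, URL-safe Base64 encoding, and truncation to 10 characters; the function names, the `:` separator, and the encoding choice are illustrative assumptions rather than a prescribed scheme.

```python
import base64
import hashlib
import uuid


def make_paste_hash(user_id: str, length: int = 10) -> str:
    """Hash a fresh UUID together with the user ID and truncate the result."""
    if not 8 <= length <= 12:
        raise ValueError("length is expected to be between 8 and 12 characters")
    paste_uuid = uuid.uuid4()  # random UUID, so repeated calls give different hashes
    digest = hashlib.sha256(f"{paste_uuid}:{user_id}".encode("utf-8")).digest()
    # URL-safe Base64 uses only letters, digits, '-' and '_'.
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]


def build_url(domain: str, paste_hash: str) -> str:
    """Append the hash to the domain to form the shareable URL."""
    return f"https://{domain}/{paste_hash}"
```

Because the digest is truncated, two pastes can in principle receive the same hash, so a real service would likely check for an existing record and retry with a new UUID.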

Service Components

AWS S3 ☁️

AWS S3 (Simple Storage Service) is an object store used to hold the raw text data. It's chosen for its key features:

Cost-Effectiveness: It's a cheap and scalable storage solution, making it ideal for storing large volumes of data without high costs.
High Scalability: It can handle virtually unlimited data, ensuring the service can grow as needed.

Database 💾

The database's primary role is to act as a metadata store. It's crucial for the service's functionality and stores information such as:

encoded_url: The unique hash used in the URL.
S3 Key: The pointer or key that links the database record to the actual text data stored in S3.
User Details: Information about the user who created the paste.
Other Metadata: This could include things like the creation date, expiration date, or privacy settings for the paste.
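
A metadata record along these lines might look like the sketch below. The field names, the `pastes/` key prefix, and the 24-hour default lifetime are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class PasteMetadata:
    encoded_url: str      # unique hash used in the URL
    s3_key: str           # pointer to the text object in S3
    user_id: str          # user who created the paste
    created_at: datetime
    expires_at: datetime

    def is_expired(self, now: datetime) -> bool:
        return now >= self.expires_at


def new_metadata(encoded_url: str, user_id: str, now: Optional[datetime] = None,
                 ttl: timedelta = timedelta(hours=24)) -> PasteMetadata:
    """Build a record for a new paste; the S3 key layout here is hypothetical."""
    now = now or datetime.now(timezone.utc)
    return PasteMetadata(
        encoded_url=encoded_url,
        s3_key=f"pastes/{encoded_url}",
        user_id=user_id,
        created_at=now,
        expires_at=now + ttl,
    )
```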

Detailed component design

1. Client

The Client is the user-facing component, which can be a web browser, a mobile app, or a command-line utility.

Key Responsibilities:

User Interface: Provides an interface for the user to input text and a user ID.
Request Handling: Sends POST requests to the API to store new pastes and GET requests to retrieve existing ones.
Data Presentation: Displays the generated URL for a new paste and presents the retrieved text content to the user.

2. API Gateway

The API Gateway acts as the entry point for all client requests. It provides a single, unified interface for the backend services.

Key Responsibilities:

Request Routing: Directs incoming requests to the appropriate backend service. For this design, it forwards all requests to the Pastebin Service.
Endpoint Management: Exposes the public endpoints (/pastebin/store and /{hash}).
Basic Validation: Can perform initial checks on the request path to ensure it is valid before forwarding.
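
The routing and path validation described above could be sketched as follows. The hash pattern (8 to 12 URL-safe characters), the `Rejected` exception, and the returned tuple shape are assumptions for this example; a managed gateway would normally express the same rules as configuration.

```python
import re
from typing import Optional, Tuple

# Assumed hash format: 8-12 characters from a URL-safe alphabet.
HASH_PATTERN = re.compile(r"^/([A-Za-z0-9_-]{8,12})$")


class Rejected(Exception):
    """Raised when the gateway refuses a request before forwarding it."""

    def __init__(self, status: int) -> None:
        super().__init__(f"rejected with HTTP {status}")
        self.status = status


def route(method: str, path: str) -> Tuple[str, Optional[str]]:
    """Return (operation, paste_hash) for the Pastebin Service, or raise Rejected."""
    if path == "/pastebin/store":
        if method != "POST":
            raise Rejected(405)
        return "store", None
    match = HASH_PATTERN.match(path)
    if match:
        if method != "GET":
            raise Rejected(405)
        return "fetch", match.group(1)
    raise Rejected(404)  # path matches neither public endpoint
```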

3. Pastebin Service (Backend Logic)

This is the core business logic component of the system. It orchestrates the storage and retrieval of data by interacting with the data tier components.

Key Responsibilities:

Store Logic:
1. Receives the text and userid from the API Gateway.
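2. Generates a unique, collision-resistant hash based on the userid and a newly created UUID. A robust hashing algorithm (e.g., SHA-256) should be used, truncated to a URL-friendly length (e.g., 8-12 characters). Because truncation makes collisions possible, the service should retry with a new UUID if the hash already exists.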
3. Asynchronously writes the raw text data to AWS S3 under a unique s3Key generated by the service (S3 object keys are chosen by the caller, not returned by S3). This asynchronous operation is crucial for efficiency, as it prevents the service from blocking while waiting for S3.
4. Calculates an expiryTimestamp (e.g., 24 hours from creation).
5. Stores the metadata (including the hash, s3Key, userid, creation timestamp, and expiryTimestamp) in the Database. This write needs robust error handling, with a retry mechanism if the database is temporarily unavailable.
6. Constructs the full, retrievable URL and returns it to the API Gateway.
7. Error Handling: Implements a rollback mechanism. If the database write fails after a successful S3 write, the S3 object must be deleted to prevent "orphaned" data.
Fetch Logic:
1. Receives a hash from the API Gateway.
2. Implements a Caching Layer: Before querying the database, the service can check a cache (e.g., Redis) for the s3Key and expiryTimestamp, using the hash as the key. This significantly improves performance for frequently accessed pastes.
3. On a cache miss, queries the Database using the hash to retrieve the corresponding s3Key and expiryTimestamp. The database should have an index on the encoded_url column for fast lookups.
4. Error Handling: If the database query returns no result, the service returns a 404 Not Found response, preventing a request to S3.
5. Checks if the current time is past the retrieved expiryTimestamp.
  - If the paste has expired, the service returns a 410 Gone HTTP status code.
  - It can also asynchronously trigger a cleanup process to delete the expired data from both the database and S3.
6. Retrieves the text content from AWS S3 using the s3Key.
7. Returns the raw text content to the API Gateway.
8. Efficiency: The S3 retrieval is typically the most expensive step in the request. The service should reuse connections to S3 rather than opening a new one per request, and keep data transfer efficient.
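
The store rollback and the cache-aside lookup described above can be sketched together as below. This is an in-memory illustration only: dictionaries stand in for S3, the database, and Redis, and the class name, the `fail_db_writes` switch, the record field names, and the URL domain are all hypothetical. Retries and asynchronous I/O are left out to keep the sketch short.

```python
import uuid
from typing import Dict, Optional, Tuple

TTL_SECONDS = 24 * 60 * 60  # 24-hour expiry, matching the example in the store logic


class PasteService:
    """In-memory sketch; dicts stand in for S3, the database, and the cache."""

    def __init__(self) -> None:
        self.s3: Dict[str, str] = {}
        self.db: Dict[str, dict] = {}
        self.cache: Dict[str, dict] = {}
        self.fail_db_writes = False  # switch used to simulate a database outage

    def _db_insert(self, paste_hash: str, record: dict) -> None:
        if self.fail_db_writes:
            raise ConnectionError("database unavailable")
        self.db[paste_hash] = record

    def store(self, text: str, user_id: str, paste_hash: str, now: float) -> str:
        s3_key = f"pastes/{uuid.uuid4()}"      # the service chooses the object key
        self.s3[s3_key] = text                 # write the content first
        record = {"s3_key": s3_key, "user_id": user_id,
                  "created_at": now, "expiry": now + TTL_SECONDS}
        try:
            self._db_insert(paste_hash, record)
        except ConnectionError:
            self.s3.pop(s3_key, None)          # rollback: remove the orphaned object
            raise
        return f"https://pastebin.example.com/{paste_hash}"

    def fetch(self, paste_hash: str, now: float) -> Tuple[int, Optional[str]]:
        record = self.cache.get(paste_hash)    # cache-aside: try the cache first
        if record is None:
            record = self.db.get(paste_hash)
            if record is None:
                return 404, None
            self.cache[paste_hash] = record    # populate the cache on a miss
        if now >= record["expiry"]:
            self.cache.pop(paste_hash, None)   # do not keep serving expired entries
            return 410, None
        return 200, self.s3[record["s3_key"]]
```

Caching the expiry alongside the S3 key lets the service reject expired pastes without a database round trip.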

4. AWS S3 (Object Storage)

AWS S3 is the primary storage component for the raw text content.

Key Responsibilities:

Object Storage: Securely and durably stores the text data as individual, immutable objects.
Unique Keys: Each object is stored under a unique key, supplied by the Pastebin Service at write time, which serves as the pointer for retrieval.
Scalability & Durability: Handles massive volumes of data with high availability and reliability.

5. Database (Metadata Storage)

The Database is responsible for storing and providing fast lookups for the metadata associated with each paste.

Key Responsibilities:

Metadata Storage: Stores a lightweight record for each paste, containing the encoded_url (hash), the s3Key, the userid, and an expiryTimestamp.
Fast Lookups: Allows the Pastebin Service to quickly find the s3Key by querying the encoded_url.
Data Integrity: Ensures consistency between the URL hash and the S3 object key.
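
As one possible shape for this table, the sketch below uses SQLite from the Python standard library. The table name, column names, and helper functions are assumptions; the design does not name a specific database. The primary key on `encoded_url` provides the indexed lookup, and the `UNIQUE` constraint on `s3_key` keeps each hash tied to exactly one object.

```python
import sqlite3

SCHEMA = """
CREATE TABLE pastes (
    encoded_url      TEXT PRIMARY KEY,      -- hash used in the URL; indexed as the primary key
    s3_key           TEXT NOT NULL UNIQUE,  -- one S3 object per paste
    user_id          TEXT NOT NULL,
    created_at       INTEGER NOT NULL,      -- Unix seconds
    expiry_timestamp INTEGER NOT NULL       -- Unix seconds
);
CREATE INDEX idx_pastes_expiry ON pastes (expiry_timestamp);  -- helps cleanup scans
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def insert_paste(conn, encoded_url, s3_key, user_id, created_at, expiry_timestamp):
    # "with conn" commits on success and rolls back if the insert raises.
    with conn:
        conn.execute(
            "INSERT INTO pastes VALUES (?, ?, ?, ?, ?)",
            (encoded_url, s3_key, user_id, created_at, expiry_timestamp),
        )


def lookup(conn, encoded_url):
    """Return (s3_key, expiry_timestamp) for the hash, or None if it is unknown."""
    return conn.execute(
        "SELECT s3_key, expiry_timestamp FROM pastes WHERE encoded_url = ?",
        (encoded_url,),
    ).fetchone()
```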