System requirements
Functional:
- Store text and return a URL through which it can be accessed for a set period of time.
- Support user signup, login, and session management.
- Enforce expiry: the content is accessible only until its expiry time; after that, requests for it are rejected.
Non-Functional:
- Availability
- Scalability
API design
1. Store Data
This endpoint is used to create a new paste by sending the text content and user information.
- Method: POST
- Path: /pastebin/store
Request Body
- Content-Type: application/json
| Field | Type | Description |
|---|---|---|
| text | string | The text content to be stored. |
| userid | string | The unique identifier for the user. |
Example Request:
{ "text": "This is a new paste for the service.", "userid": "user-12345" }
Response
- Status Code: 201 Created
- Content-Type: application/json
| Field | Type | Description |
|---|---|---|
| url | string | The unique URL for the newly created paste. |
Example Response:
{ "url": "https://pastebin.example.com/abcdef123" }
2. Fetch Data
This endpoint is used to retrieve the text content of a paste using its unique hash.
- Method: GET
- Path: /{hash}
Response
- Status Code: 200 OK
- Content-Type: application/json
| Field | Type | Description |
|---|---|---|
| text | string | The raw text content of the paste. |
Example Response:
{ "text": "This is a new paste for the service." }
- Note: If the hash is not found, the API returns a 404 Not Found status; if the paste has expired, it returns 410 Gone.
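The request and response bodies of the two endpoints can be pinned down as typed structures. This is a minimal Python sketch. The class names (StoreRequest, StoreResponse, FetchResponse) and the helpers parse_store_request and to_json are hypothetical. The field names follow the tables above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StoreRequest:
    """Body of POST /pastebin/store (fields follow the table above)."""
    text: str
    userid: str


@dataclass(frozen=True)
class StoreResponse:
    """Body of the 201 Created response."""
    url: str


@dataclass(frozen=True)
class FetchResponse:
    """Body of the 200 OK response for GET /{hash}."""
    text: str


def parse_store_request(raw: str) -> StoreRequest:
    """Parse and type-check a JSON request body; raises ValueError if malformed."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("request body must be a JSON object")
    text, userid = data.get("text"), data.get("userid")
    if not isinstance(text, str) or not isinstance(userid, str):
        raise ValueError("'text' and 'userid' must both be strings")
    return StoreRequest(text=text, userid=userid)


def to_json(payload) -> str:
    """Serialize any of the dataclasses above to a JSON string."""
    return json.dumps(asdict(payload))
```

For example, parsing the example request yields StoreRequest(text="This is a new paste for the service.", userid="user-12345"). A malformed body fails fast at the API layer, before any storage work starts.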
High-level design
Pastebin Service ⚙️
This is the core service that orchestrates the entire process. Its responsibilities are:
- Data Storage: It handles the primary action of the service, storing text data. It writes the raw text content to AWS S3 and saves the corresponding metadata to the database.
- URL Creation: It generates the unique, shareable URL for each paste by hashing a UUID together with the User ID, which makes each paste's hash unique. This hash is stored as encoded_url in the database, and the final URL is constructed as domain/hash.
- URL and Data Retrieval: When a user accesses a URL, the service must:
  - Extract the hash from the URL.
  - Use the hash to query the database and retrieve the associated metadata (including the S3 key, User ID, and expiry time).
  - Check whether the expiry time has passed; if it has, reject the request.
  - Use the S3 key from the metadata to fetch the actual text data from AWS S3.
  - Return the text content to the user.
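The retrieval steps above can be sketched as a single function. This is a minimal Python sketch built on in-memory stand-ins:
- a dict for the database (hash to metadata);
- a dict for S3 (key to text);
- an optional dict standing in for the Redis cache described in the Fetch Logic.

The names PasteMeta, extract_hash, and fetch_paste are hypothetical. An unknown hash yields 404. An expired paste yields 410 Gone, as in the Fetch Logic.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class PasteMeta:
    s3_key: str
    user_id: str
    expiry_timestamp: float  # Unix seconds


def extract_hash(url: str) -> str:
    """Take the last path segment of domain/hash."""
    return url.rstrip("/").rsplit("/", 1)[-1]


def fetch_paste(
    url: str,
    db: Dict[str, PasteMeta],                  # stand-in for the metadata database
    s3: Dict[str, str],                        # stand-in for the S3 bucket
    cache: Optional[Dict[str, PasteMeta]] = None,  # stand-in for Redis
    now: Optional[float] = None,
) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, text or None) following the retrieval steps."""
    now = time.time() if now is None else now
    paste_hash = extract_hash(url)                      # 1. extract the hash
    meta = cache.get(paste_hash) if cache is not None else None
    if meta is None:
        meta = db.get(paste_hash)                       # 2. query the database
        if meta is None:
            return 404, None                            # unknown hash: S3 is never touched
        if cache is not None:
            cache[paste_hash] = meta                    # cache-aside fill, expiry included
    if now > meta.expiry_timestamp:
        return 410, None                                # 3. expired: reject
    # 4. fetch by S3 key; a KeyError here would mean orphaned metadata
    return 200, s3[meta.s3_key]                         # 5. return the content
```

Because the cached value carries the expiry timestamp, an entry that outlives its paste is still rejected with 410 rather than served.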
How URLs are Created
The URL creation process is a key part of the service's design. The service combines two identifiers, a randomly generated UUID (Universally Unique Identifier) and the User ID, and hashes them to create a unique hash. Because the UUID is random, the URL is not easily guessable, and including the User ID ties each hash to a specific user and a specific paste. The hash is then appended to the domain to form the complete URL, like domain/hash.
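The hashing step can be sketched as follows. This is a minimal Python sketch with these assumptions:
- It uses SHA-256, the algorithm suggested in the Store Logic.
- It hashes the User ID together with a random UUID4.
- It encodes the digest as URL-safe base64 and keeps 10 characters, within the suggested 8-12 range.

The function name make_paste_hash, the ':' separator, and the exists callback (standing in for a lookup on encoded_url) are hypothetical. Truncation makes collisions possible, so the sketch draws a new UUID whenever the candidate is already taken.

```python
import base64
import hashlib
import uuid
from typing import Callable


def make_paste_hash(
    user_id: str,
    exists: Callable[[str], bool] = lambda candidate: False,
    length: int = 10,
    max_attempts: int = 5,
) -> str:
    """Derive a short, URL-safe paste hash from the user ID and a fresh UUID.

    `exists` is a hypothetical lookup (e.g. a query on encoded_url); whenever
    the truncated hash is already taken, a new UUID is drawn and hashed.
    """
    for _ in range(max_attempts):
        material = f"{user_id}:{uuid.uuid4()}".encode("utf-8")
        digest = hashlib.sha256(material).digest()
        # URL-safe base64 uses only A-Z, a-z, 0-9, '-' and '_'
        # (the '=' padding appears only at the very end, beyond `length`).
        candidate = base64.urlsafe_b64encode(digest).decode("ascii")[:length]
        if not exists(candidate):
            return candidate
    raise RuntimeError("no unused hash found; consider increasing length")
```

Ten base64 characters carry 60 bits, so accidental collisions should be rare at typical volumes. The exists check covers the cases that do occur.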
Service Components
AWS S3 ☁️
AWS S3 (Simple Storage Service) is an object store used to hold the raw text data. It's chosen for its key features:
- Cost-Effectiveness: It's a cheap and scalable storage solution, making it ideal for storing large volumes of data without high costs.
- High Scalability: It can handle virtually unlimited data, ensuring the service can grow as needed.
Database 💾
The database's primary role is to act as a metadata store. It's crucial for the service's functionality and stores information such as:
- encoded_url: The unique hash used in the URL.
- S3 Key: The pointer or key that links the database record to the actual text data stored in S3.
- User Details: Information about the user who created the paste.
- Other Metadata: This could include things like the creation date, expiration date, or privacy settings for the paste.
Detailed component design
1. Client
The Client is the user-facing component, which can be a web browser, a mobile app, or a command-line utility.
Key Responsibilities:
- User Interface: Provides an interface for the user to input text and a user ID.
- Request Handling: Sends POST requests to the API to store new pastes and GET requests to retrieve existing ones.
- Data Presentation: Displays the generated URL for a new paste and presents the retrieved text content to the user.
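On the client side, the two requests can be built with Python's standard library. This sketch only constructs the requests; passing one to urllib.request.urlopen would send it. The base URL is the placeholder domain from the API examples, and the helper names build_store_request and build_fetch_request are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pastebin.example.com"  # placeholder domain from the API examples


def build_store_request(text: str, userid: str) -> urllib.request.Request:
    """POST /pastebin/store with a JSON body {text, userid}."""
    body = json.dumps({"text": text, "userid": userid}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/pastebin/store",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_fetch_request(paste_hash: str) -> urllib.request.Request:
    """GET /{hash} for an existing paste."""
    return urllib.request.Request(f"{BASE_URL}/{paste_hash}", method="GET")
```

A CLI client could send build_store_request("This is a new paste for the service.", "user-12345") with urlopen and print the url field of the 201 response.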
2. API Gateway
The API Gateway acts as the entry point for all client requests. It provides a single, unified interface for the backend services.
Key Responsibilities:
- Request Routing: Directs incoming requests to the appropriate backend service. For this design, it forwards all requests to the Pastebin Service.
- Endpoint Management: Exposes the public endpoints (/pastebin/store and /{hash}).
- Basic Validation: Can perform initial checks on the request path to ensure it is valid before forwarding.
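The gateway's routing and basic path validation can be sketched as a small pure function. This sketch assumes hashes are 8-12 URL-safe characters, the range suggested in the Store Logic. Rejecting wrong methods with 405 is an added assumption. The Route type and route function are hypothetical; a real gateway or reverse proxy would express the same rules in its own configuration.

```python
import re
from typing import NamedTuple, Optional

# Assumed hash format: 8-12 URL-safe characters (A-Z, a-z, 0-9, '-', '_').
HASH_PATTERN = re.compile(r"[A-Za-z0-9_-]{8,12}")


class Route(NamedTuple):
    handler: Optional[str]            # "store" or "fetch"; None if the gateway answers itself
    paste_hash: Optional[str] = None  # set for fetch requests
    status: Optional[int] = None      # HTTP status returned directly by the gateway


def route(method: str, path: str) -> Route:
    """Route a request to the Pastebin Service or reject it at the gateway."""
    if path == "/pastebin/store":
        return Route("store") if method == "POST" else Route(None, status=405)
    segment = path[1:] if path.startswith("/") else ""
    if HASH_PATTERN.fullmatch(segment):
        return Route("fetch", paste_hash=segment) if method == "GET" else Route(None, status=405)
    # Anything else is not a valid path: reject before it reaches the backend.
    return Route(None, status=404)
```

Rejecting malformed hashes at the gateway keeps obviously invalid requests away from the database entirely.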
3. Pastebin Service (Backend Logic)
This is the core business logic component of the system. It orchestrates the storage and retrieval of data by interacting with the data tier components.
Key Responsibilities:
- Store Logic:
  - Receives the text and userid from the API Gateway.
  - Generates a unique hash based on the userid and a newly created UUID. A robust hashing algorithm (e.g., SHA-256) should be used, truncated to a URL-friendly length (e.g., 8-12 characters). Truncation makes collisions possible, so the service should confirm the hash is not already in use before saving it.
  - Asynchronously writes the raw text data to AWS S3 under a unique s3Key chosen by the service (for example, derived from the hash). Doing this asynchronously keeps the service from blocking while waiting for S3.
  - Calculates an expiryTimestamp (e.g., 24 hours from creation) to include in the metadata record.
  - Stores the metadata (hash, s3Key, userid, creation timestamp, and expiryTimestamp) in the Database, with robust error handling and a retry mechanism in case the database is temporarily unavailable.
  - Constructs the full, retrievable URL and returns it to the API Gateway.
  - Error Handling: Implements a rollback mechanism. If the database write fails after a successful S3 write, the S3 object must be deleted to prevent "orphaned" data.
- Fetch Logic:
  - Receives a hash from the API Gateway.
  - Caching Layer: Before querying the database, the service can check a cache (e.g., Redis) for the s3Key using the hash as the key. This skips the database lookup for frequently accessed pastes. The cached entry should also carry the expiryTimestamp (or be given a TTL no later than it) so that expired pastes are not served from the cache.
  - Queries the Database using the hash to retrieve the corresponding s3Key and expiryTimestamp. The database should have an index on the encodedUrl column for fast lookups.
  - Error Handling: If the database query returns no result, the service returns a 404 Not Found response without making a request to S3.
  - Checks whether the current time is past the retrieved expiryTimestamp.
    - If the paste has expired, the service returns a 410 Gone HTTP status code.
    - It can also asynchronously trigger a cleanup process to delete the expired data from both the database and S3.
  - Retrieves the text content from AWS S3 using the s3Key.
  - Returns the raw text content to the API Gateway.
  - Efficiency: The S3 retrieval is the most resource-intensive step, so the service should reuse S3 client connections across requests rather than opening a new one per fetch.
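The store flow, including the expiry calculation, the database retry, and the S3 rollback, can be sketched as follows. This is a simplified, synchronous Python sketch with these assumptions:
- The S3 write is a plain call rather than an asynchronous one.
- InMemoryS3 and InMemoryDB are hypothetical stand-ins.
- The hash is a 10-character hex prefix of SHA-256 over the userid and a UUID.
- The 24-hour expiry, three retries with exponential backoff, and the pastes/{hash} key layout are illustrative choices.

```python
import hashlib
import time
import uuid
from typing import Dict

TTL_SECONDS = 24 * 60 * 60                 # assumed 24-hour expiry
DOMAIN = "https://pastebin.example.com"    # placeholder domain from the API examples


class InMemoryS3:
    """Hypothetical stand-in for an S3 bucket (key -> text)."""

    def __init__(self) -> None:
        self.objects: Dict[str, str] = {}

    def put(self, key: str, text: str) -> None:
        self.objects[key] = text

    def delete(self, key: str) -> None:
        self.objects.pop(key, None)


class InMemoryDB:
    """Hypothetical metadata store; `failures` simulates transient outages."""

    def __init__(self, failures: int = 0) -> None:
        self.rows: Dict[str, dict] = {}
        self.failures = failures

    def insert(self, row: dict) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("database temporarily unavailable")
        self.rows[row["encoded_url"]] = row


def store_paste(text: str, userid: str, s3: InMemoryS3, db: InMemoryDB,
                retries: int = 3, backoff: float = 0.0) -> str:
    """Store a paste and return its URL, rolling back S3 if the metadata write fails."""
    paste_hash = hashlib.sha256(f"{userid}:{uuid.uuid4()}".encode()).hexdigest()[:10]
    s3_key = f"pastes/{paste_hash}"        # the service chooses the S3 key
    s3.put(s3_key, text)                   # 1. write the raw text to S3

    created = time.time()
    row = {
        "encoded_url": paste_hash,
        "s3_key": s3_key,
        "userid": userid,
        "created_at": created,
        "expiry_timestamp": created + TTL_SECONDS,  # 2. expiry computed before the write
    }
    for attempt in range(retries):
        try:
            db.insert(row)                 # 3. write metadata, retrying on failure
            break
        except ConnectionError:
            if attempt == retries - 1:
                s3.delete(s3_key)          # 4. roll back to avoid an orphaned object
                raise
            time.sleep(backoff * 2 ** attempt)  # exponential backoff between attempts
    return f"{DOMAIN}/{paste_hash}"        # 5. build the shareable URL
```

If the database stays unavailable through every retry, the S3 object is deleted before the error propagates. Neither store is left with half of a paste.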
4. AWS S3 (Object Storage)
AWS S3 is the primary storage component for the raw text content.
Key Responsibilities:
- Object Storage: Securely and durably stores the text data as individual, immutable objects.
- Unique Keys: Stores each object under a unique key chosen by the Pastebin Service (S3 does not assign keys), and that key serves as the pointer for retrieval.
- Scalability & Durability: Handles massive volumes of data with high availability and reliability.
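Inside the Pastebin Service, S3 access can be wrapped in a small class around a boto3 S3 client, whose put_object, get_object, and delete_object calls take Bucket and Key arguments. The S3TextStore class and the pastes/{hash} key layout are hypothetical. The client is passed in, so it can be a real boto3.client("s3") or a test double.

```python
from typing import Any


class S3TextStore:
    """Thin wrapper over an S3 client for paste text (hypothetical helper)."""

    def __init__(self, client: Any, bucket: str) -> None:
        # `client` is expected to behave like boto3.client("s3").
        self.client = client
        self.bucket = bucket

    @staticmethod
    def key_for(paste_hash: str) -> str:
        # Assumed key layout; any unique, deterministic scheme works.
        return f"pastes/{paste_hash}"

    def put_text(self, paste_hash: str, text: str) -> str:
        key = self.key_for(paste_hash)
        self.client.put_object(
            Bucket=self.bucket,
            Key=key,
            Body=text.encode("utf-8"),
            ContentType="text/plain; charset=utf-8",
        )
        return key  # persist this as the s3Key in the metadata record

    def get_text(self, key: str) -> str:
        response = self.client.get_object(Bucket=self.bucket, Key=key)
        return response["Body"].read().decode("utf-8")

    def delete_text(self, key: str) -> None:
        # Used for the store rollback and for expired-paste cleanup.
        self.client.delete_object(Bucket=self.bucket, Key=key)
```

In production, S3TextStore(boto3.client("s3"), "my-pastebin-bucket") would be the wiring, with a placeholder bucket name and credentials configured in the environment. Creating the client once and reusing it also reuses its connection pool across requests.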
5. Database (Metadata Storage)
The Database is responsible for storing and providing fast lookups for the metadata associated with each paste.
Key Responsibilities:
- Metadata Storage: Stores a lightweight record for each paste, containing the encodedUrl (hash), the s3Key, the userId, and an expiryTimestamp.
- Fast Lookups: Allows the Pastebin Service to quickly find the s3Key by querying the encodedUrl, which should be indexed.
- Data Integrity: Ensures consistency between the URL hash and the S3 object key.
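The metadata table, and the index that makes hash lookups fast, can be sketched with SQLite from Python's standard library. The column names follow this document (encodedUrl, s3Key, userId, expiryTimestamp). The following are assumptions for illustration:
- the createdAt column, the table name, and the choice of SQLite itself;
- the extra index on expiryTimestamp, added to support the cleanup process;
- the helper functions open_db, insert_paste, lookup, and delete_expired, which are hypothetical.

Declaring encodedUrl as the PRIMARY KEY gives SQLite a unique index on it, which both rejects duplicate hashes and serves the lookup.

```python
import sqlite3
import time
from typing import Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS pastes (
    encodedUrl      TEXT PRIMARY KEY,  -- the hash; PRIMARY KEY implies a unique index
    s3Key           TEXT NOT NULL,
    userId          TEXT NOT NULL,
    createdAt       REAL NOT NULL,     -- Unix seconds
    expiryTimestamp REAL NOT NULL      -- Unix seconds
);
-- Assumed extra index to support the expired-paste cleanup job.
CREATE INDEX IF NOT EXISTS idx_pastes_expiry ON pastes (expiryTimestamp);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def insert_paste(conn: sqlite3.Connection, encoded_url: str, s3_key: str,
                 user_id: str, ttl_seconds: float = 24 * 60 * 60,
                 now: Optional[float] = None) -> None:
    now = time.time() if now is None else now
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO pastes VALUES (?, ?, ?, ?, ?)",
            (encoded_url, s3_key, user_id, now, now + ttl_seconds),
        )


def lookup(conn: sqlite3.Connection, encoded_url: str) -> Optional[Tuple[str, float]]:
    """Return (s3Key, expiryTimestamp) or None, via the primary-key index."""
    return conn.execute(
        "SELECT s3Key, expiryTimestamp FROM pastes WHERE encodedUrl = ?",
        (encoded_url,),
    ).fetchone()


def delete_expired(conn: sqlite3.Connection, now: Optional[float] = None) -> int:
    """Delete metadata rows whose expiry has passed; returns the number removed."""
    now = time.time() if now is None else now
    with conn:
        cur = conn.execute("DELETE FROM pastes WHERE expiryTimestamp < ?", (now,))
    return cur.rowcount
```

A second insert with the same encodedUrl raises sqlite3.IntegrityError. The store logic can treat that as a hash collision and retry with a new UUID.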