System requirements
Functional:
- Store large text pastes efficiently and return them intact
- Generate random, unique paste IDs efficiently and without collisions
- Rate limiting to prevent abuse
Non-Functional:
- Consistency: the service should always be consistent, so a read reflects the latest saved version of a paste
- Data persistence: stored pastes must be kept durably and never lost
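The unique ID requirement can be met in several ways. One minimal sketch, assuming base62 IDs of a hypothetical fixed length of 8 characters and a hypothetical `exists` callback that checks the datastore, draws random characters with Python's `secrets` module and retries on the rare collision:

```python
import secrets
import string

# Base62 alphabet (assumption): URL-safe, no escaping needed.
ALPHABET = string.ascii_letters + string.digits
ID_LENGTH = 8  # hypothetical; 62**8 is about 2.18e14 possible IDs


def generate_paste_id(exists, length=ID_LENGTH, max_attempts=5):
    """Return a random ID that `exists` reports as unused.

    `exists` is a hypothetical callback that queries the datastore.
    A collision is rare but possible, so we retry a few times.
    """
    for _ in range(max_attempts):
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if not exists(candidate):
            return candidate
    raise RuntimeError("no unique ID found; consider a longer ID length")


# Usage: an in-memory set stands in for the database.
used_ids = set()
new_id = generate_paste_id(lambda candidate: candidate in used_ids)
used_ids.add(new_id)
print(new_id)
```

Checking and then inserting leaves a small race window between concurrent writers; a unique index on the ID field, with a retry on a duplicate-key error, closes it.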
Capacity estimation
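- We expect about 10,000 create requests and 1,000,000 read requests per day, a 100:1 read-to-write ratio.
- At an average of 256 bytes per paste, writes add 10,000 × 256 B ≈ 2.56 MB per day (roughly 0.93 GB per year). A figure of 2.5 GB per day would only hold if the average paste were around 250 KB.
- Because reads far outnumber writes, we will add caching and run separate read and write servers that are kept in sync.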
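The estimates above can be checked with a few lines of arithmetic. The 256 KB case is included only to show what average paste size a 2.5 GB/day figure would imply; it is an illustration, not a stated requirement.

```python
# Back-of-envelope capacity numbers from the estimates above.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

writes_per_day = 10_000
reads_per_day = 1_000_000

avg_write_qps = writes_per_day / SECONDS_PER_DAY   # ≈ 0.12 writes/s
avg_read_qps = reads_per_day / SECONDS_PER_DAY     # ≈ 11.6 reads/s
read_write_ratio = reads_per_day / writes_per_day  # 100:1


def daily_storage_bytes(avg_paste_bytes):
    """Bytes of new paste content per day for a given average paste size."""
    return writes_per_day * avg_paste_bytes


print(f"writes/s: {avg_write_qps:.2f}, reads/s: {avg_read_qps:.1f}")
print(f"256 B pastes:  {daily_storage_bytes(256) / 1e6:.2f} MB/day")
print(f"256 KB pastes: {daily_storage_bytes(256 * 1024) / 1e9:.2f} GB/day")
```

Average load is modest; peak traffic will be higher than these daily averages, so the cache and read replicas are sized for bursts rather than for these means.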
API design
1. Create Paste
Endpoint: /api/v1/pastes
Method: POST
Description: Create a new paste.
Request Body:
{
  "content": "string",               // Text content of the paste
  "expiration": "integer (optional)" // Expiration time in minutes (null or 0 for no expiration)
}
Response:
- 201 Created
- Body:
{
  "id": "string",  // Unique ID of the paste
  "url": "string"  // URL to access the paste
}
- 400 Bad Request
- Invalid input data
- 500 Internal Server Error
- Error processing the request
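A minimal sketch of the Create Paste contract, assuming an in-memory dict as a stand-in for the database, a hypothetical `new_id` callable (for example an ID generator), a hypothetical 512 KB size cap, and a hypothetical `BASE_URL`; the real URL format is not specified in this design.

```python
from datetime import datetime, timedelta, timezone

MAX_CONTENT_BYTES = 512 * 1024  # hypothetical size cap
BASE_URL = "https://paste.example.com"  # hypothetical public URL prefix


def create_paste(body, store, new_id):
    """Apply the Create Paste contract; return (status_code, response_body)."""
    content = body.get("content")
    expiration = body.get("expiration")

    # 400: content must be a non-empty string within the size cap.
    if not isinstance(content, str) or not content:
        return 400, {"error": "Invalid input data"}
    if len(content.encode("utf-8")) > MAX_CONTENT_BYTES:
        return 400, {"error": "Invalid input data"}

    # 400: expiration, when given, must be a non-negative whole number of minutes.
    # bool is excluded explicitly because True/False are ints in Python.
    if expiration is not None:
        if isinstance(expiration, bool) or not isinstance(expiration, int) or expiration < 0:
            return 400, {"error": "Invalid input data"}

    now = datetime.now(timezone.utc)
    # null or 0 means the paste never expires.
    expires_at = now + timedelta(minutes=expiration) if expiration else None

    paste_id = new_id()
    store[paste_id] = {"content": content, "created_at": now, "expires_at": expires_at}
    return 201, {"id": paste_id, "url": f"{BASE_URL}/{paste_id}"}


# Usage with a dict standing in for the database.
db = {}
print(create_paste({"content": "hello", "expiration": 10}, db, lambda: "abc123"))
```

Validation runs before the ID is generated, so rejected requests never consume an ID or touch storage.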
2. Read Paste
Endpoint: /api/v1/pastes/{id}
Method: GET
Description: Retrieve a paste by its unique ID.
Response:
- 200 OK
- Body:
{
  "content": "string",     // Content of the paste
  "created_at": "string",  // Creation timestamp
  "expires_at": "string"   // Expiration timestamp (null if no expiration)
}
- 404 Not Found
- Paste not found or expired
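The 404 rule for expired pastes can be sketched as follows, again assuming a dict stand-in for the database whose records hold `content`, `created_at`, and `expires_at` as timezone-aware datetimes. The optional `now` parameter is a hypothetical hook that makes the expiry check testable.

```python
from datetime import datetime, timedelta, timezone


def is_expired(paste, now):
    """A paste with expires_at=None never expires."""
    return paste["expires_at"] is not None and paste["expires_at"] <= now


def read_paste(paste_id, store, now=None):
    """Apply the Read Paste contract; return (status_code, response_body)."""
    now = datetime.now(timezone.utc) if now is None else now
    paste = store.get(paste_id)
    # Missing and expired pastes look the same to the client: both are 404.
    if paste is None or is_expired(paste, now):
        return 404, {"error": "Paste not found or expired"}
    expires_at = paste["expires_at"]
    return 200, {
        "content": paste["content"],
        "created_at": paste["created_at"].isoformat(),
        "expires_at": expires_at.isoformat() if expires_at else None,
    }


# Usage: a paste with a 5-minute expiration, read before and after it lapses.
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
db = {"p1": {"content": "hi", "created_at": t0, "expires_at": t0 + timedelta(minutes=5)}}
print(read_paste("p1", db, now=t0 + timedelta(minutes=1)))
print(read_paste("p1", db, now=t0 + timedelta(minutes=6)))
```

Checking expiry at read time means correctness does not depend on how promptly a background job deletes expired data.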
3. Edit Paste
Endpoint: /api/v1/pastes/{id}
Method: PUT
Description: Edit an existing paste. Only allowed for the creator of the paste.
Request Headers:
- Authorization: Bearer <token> (optional, if using token-based auth)
- Cookie: <session-cookie> (optional, if using session management)
Request Body:
{
  "content": "string",               // Updated text content of the paste
  "expiration": "integer (optional)" // Updated expiration time in minutes
}
Response:
- 200 OK
- Body:
{
  "id": "string",  // Unique ID of the paste
  "url": "string"  // Updated URL to access the paste
}
- 403 Forbidden
- Not permitted to edit this paste (only the creator may edit)
- 404 Not Found
- Paste not found or expired
- 500 Internal Server Error
- Error processing the request
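A sketch of the Edit Paste checks, assuming authentication has already resolved the bearer token or session cookie to a hypothetical `caller_id`, and that each stored paste carries a hypothetical `owner_id` recorded at creation. The 400 response for invalid input mirrors Create Paste and is an assumption; the Edit contract above does not list it.

```python
from datetime import datetime, timedelta, timezone

BASE_URL = "https://paste.example.com"  # hypothetical public URL prefix


def edit_paste(paste_id, body, caller_id, store, now=None):
    """Apply the Edit Paste contract; return (status_code, response_body).

    `caller_id` is the user resolved from the token or session cookie
    (None if unauthenticated); `owner_id` is a hypothetical stored field.
    """
    now = datetime.now(timezone.utc) if now is None else now
    paste = store.get(paste_id)
    if paste is None or (paste["expires_at"] is not None and paste["expires_at"] <= now):
        return 404, {"error": "Paste not found or expired"}
    # 403: only the creator may edit.
    if caller_id is None or caller_id != paste.get("owner_id"):
        return 403, {"error": "Not permitted to edit this paste"}

    # Assumed 400 checks, mirroring Create Paste; validate before mutating.
    content = body.get("content")
    expiration = body.get("expiration")
    if not isinstance(content, str) or not content:
        return 400, {"error": "Invalid input data"}
    if expiration is not None and (
        isinstance(expiration, bool) or not isinstance(expiration, int) or expiration < 0
    ):
        return 400, {"error": "Invalid input data"}

    paste["content"] = content
    if "expiration" in body:
        # null or 0 clears the expiration; an absent key keeps the old one.
        paste["expires_at"] = now + timedelta(minutes=expiration) if expiration else None
    return 200, {"id": paste_id, "url": f"{BASE_URL}/{paste_id}"}


# Usage: only the owner's edit succeeds.
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
db = {"p1": {"content": "old", "created_at": t0, "expires_at": None, "owner_id": "alice"}}
print(edit_paste("p1", {"content": "new"}, "bob", db, now=t0))    # not the creator
print(edit_paste("p1", {"content": "new"}, "alice", db, now=t0))
```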
General Considerations:
- Security:
- Use HTTPS to ensure all data is transmitted securely.
- Implement authentication to allow users to edit their pastes. Consider OAuth or token-based authentication.
- Sanitize and validate all input data to avoid injection attacks.
- Rate Limiting:
- Implement rate limiting to prevent abuse, such as brute force attempts.
This API design provides a solid foundation for the Pastebin-like service, focusing on creating, reading, and editing pastes with appropriate security and validation measures.
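The design does not name a rate-limiting algorithm; a token bucket is one common choice. The sketch below keeps per-client buckets in process memory and is illustrative only: `capacity`, `rate`, the client key (for example an IP address), and answering rejected requests with 429 Too Many Requests are all assumptions. In the architecture described here, this state would live in the API gateway or a shared store such as Redis so that every instance sees the same counts.

```python
import time


class TokenBucketLimiter:
    """Per-client token bucket: bursts up to `capacity`, refilled at `rate` tokens/s."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock  # injectable so tests can control time
        self.buckets = {}   # client_key -> (tokens, last_refill_time)

    def allow(self, client_key):
        now = self.clock()
        tokens, last = self.buckets.get(client_key, (self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[client_key] = (tokens - 1, now)
            return True   # request may proceed
        self.buckets[client_key] = (tokens, now)
        return False      # caller would respond 429 Too Many Requests


# Usage: hypothetically allow bursts of 10 creates per client, refilling 1 every 6 s.
limiter = TokenBucketLimiter(capacity=10, rate=1 / 6)
print(limiter.allow("203.0.113.7"))
```

A token bucket tolerates short bursts (pasting several snippets quickly) while capping the sustained rate, which suits the spam and brute-force concerns above.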
Database design
We will use a non-relational database replicated across multiple regions for low-latency access; it will store each paste's unique ID and its text. A non-relational store gives us the flexibility to store arbitrary text with a simple schema.
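One possible shape for a stored paste, assuming MongoDB-style `_id` keys (the trade-offs section proposes MongoDB) and hypothetical field names that match the API responses, plus an `owner_id` to support edit authorization:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class PasteDocument:
    """One paste as stored in the document database (field names are hypothetical)."""

    paste_id: str                   # unique ID from the ID generator
    content: str                    # raw paste text
    owner_id: Optional[str] = None  # creator, used to authorize edits
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    expires_at: Optional[datetime] = None  # None means the paste never expires

    def to_document(self) -> dict:
        # MongoDB-style documents use "_id" as the primary key.
        return {
            "_id": self.paste_id,
            "content": self.content,
            "owner_id": self.owner_id,
            "created_at": self.created_at,
            "expires_at": self.expires_at,
        }


# Usage
print(PasteDocument("abc123", "hello", owner_id="alice").to_document())
```

Keying on the paste ID makes the hot read path a single primary-key lookup.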
High-level design
Components Overview:
- API Gateway:
- Acts as the entry point for all client requests.
- Manages rate limiting to prevent abuse.
- Routes requests to the appropriate backend service.
- Backend Service:
- Stateless servers that handle incoming API requests.
- Contains the Unique ID Generator for generating paste identifiers.
- Processes requests for creating, reading, and updating pastes.
- Connects to Redis for caching frequently accessed pastes.
- Redis Cache:
- Stores recently accessed pastes and their metadata to speed up read operations.
- Helps alleviate load on the backend service by reducing calls to the database.
- Worker Processes:
- Handle background tasks such as cleaning up expired pastes.
- Can be used for maintenance tasks like rebuilding cache, analytics, etc.
- Database:
- A distributed database with multiple replicas managed through Kubernetes.
- Ensures scalability and high availability.
- Uses replication for fault tolerance.
- Load Balancer (Nginx):
- Distributes traffic evenly across backend service instances.
- Ensures no single server gets overwhelmed.
- Provides failover support.
- Kubernetes:
- Manages the deployment, scaling, and operation of application containers across clusters of hosts.
- Ensures high availability and scalability of database instances.
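The expired-paste cleanup done by the worker processes could look like the sweep below, assuming dict stand-ins for the database and the Redis cache; a real worker would run it on a schedule (for example every minute, a hypothetical interval). Because reads already treat expired pastes as 404, this sweep mainly reclaims storage and evicts dead cache entries.

```python
from datetime import datetime, timedelta, timezone


def purge_expired(store, cache, now=None):
    """Delete expired pastes from the database stand-in and evict them from the cache.

    `store` and `cache` are hypothetical dict stand-ins for the database and Redis.
    Returns the list of purged IDs.
    """
    now = datetime.now(timezone.utc) if now is None else now
    expired = [
        pid for pid, paste in store.items()
        if paste["expires_at"] is not None and paste["expires_at"] <= now
    ]
    for pid in expired:
        del store[pid]
        cache.pop(pid, None)  # evict so no reader gets a stale cached copy
    return expired


# Usage
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
db = {
    "old": {"expires_at": t0 - timedelta(minutes=1)},
    "forever": {"expires_at": None},
}
cache = {"old": "cached copy"}
print(purge_expired(db, cache, now=t0))
```

If MongoDB is used, a TTL index on `expires_at` with `expireAfterSeconds` set to 0 can delete expired documents automatically; documents whose `expires_at` is null are not removed by it. The worker would then only need to handle cache eviction and other maintenance.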
Request flows
Client HTTP request -> API gateway -> load balancer (Nginx) -> backend service -> cache -> DB
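The backend -> cache -> DB hop in this flow is the cache-aside pattern. A minimal sketch, with a tiny in-memory class standing in for Redis `GET` and `SET ... EX` and a hypothetical one-hour TTL:

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for Redis GET / SET with an expiry (EX)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.data = {}  # key -> (value, expiry_time)

    def get(self, key):
        item = self.data.get(key)
        if item is None or item[1] <= self.clock():
            self.data.pop(key, None)  # drop the entry once its TTL has passed
            return None
        return item[0]

    def set(self, key, value, ttl_seconds):
        self.data[key] = (value, self.clock() + ttl_seconds)


def get_paste(paste_id, cache, db, ttl_seconds=3600):
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    cached = cache.get(paste_id)
    if cached is not None:
        return cached              # cache hit: no database call
    paste = db.get(paste_id)       # cache miss: read from the database
    if paste is not None:
        cache.set(paste_id, paste, ttl_seconds)
    return paste


# Usage: the second read is served from the cache even after the DB copy is gone.
db = {"p1": "hello"}
cache = TTLCache()
print(get_paste("p1", cache, db))
del db["p1"]
print(get_paste("p1", cache, db))
```

Since the requirements call for consistency, the Edit Paste handler should overwrite or delete the cached entry whenever it updates the database, and the cache TTL should not outlive a paste's own expiration; otherwise readers can see stale content.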
Detailed component design
Client HTTP request -> API gateway -> load balancer (Nginx) -> backend service -> rate limiter -> UID generator -> Redis cache -> DB -> replicas -> Kubernetes cluster
Trade offs/Tech choices
We will use Ruby on Rails with MongoDB to speed up development, since the expected load does not call for high concurrency or heavy scalability work.
Failure scenarios/bottlenecks
Network or server outages and database failures. A database failure can leave data inconsistent, so it needs to be handled separately. Limiting each user to a fixed number of requests, to prevent spam, fake data, and DDoS attacks, is another issue to consider.
Future improvements
- Add more application and cache servers across regions
- Allocate more resources to data durability and consistency
- Increase cache capacity