System requirements
Functional:
- Users should be able to submit text they want to store and share
- The system should handle plain text content
- Users can specify an expiration time for the paste, after which it is deleted
- Users can use the service without creating an account
- Each paste should have a unique URL or ID so it can be accessed later
Non-Functional:
- Reads should be highly available, writes should be strongly consistent
- Durable - once a paste is stored it should not be lost before its expiration
- Low latency - both creation and retrieval should be fast and responsive
- Scalability - should support millions of users and a high volume of pastes
Capacity estimation
Writes
1 million pastes per day, about 12 requests per second on average
Peak traffic is 10x the average, so about 120 per second
Reads
The system is read-heavy: each paste is likely to be read many more times than it is written
With a read:write ratio between 5:1 and 8:1, that is about 60-100 reads per second on average against 12 writes per second
In a more extreme scenario with a read:write ratio of 100:1, read traffic could reach ~100 million per day (~1,200 reads per second)
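The estimates above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch; the variable names are illustrative and the inputs are the figures stated in this section.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(requests_per_day: float) -> float:
    """Convert a daily request count into an average per-second rate."""
    return requests_per_day / SECONDS_PER_DAY


writes_per_day = 1_000_000
avg_writes = per_second(writes_per_day)        # about 11.6, rounded to 12 in the notes
peak_writes = avg_writes * 10                  # about 116, or ~120 using the rounded figure

reads_5_to_1 = avg_writes * 5                  # about 58, or ~60 using the rounded figure
reads_100_to_1_per_day = writes_per_day * 100  # 100 million reads per day
reads_100_to_1 = per_second(reads_100_to_1_per_day)  # about 1,157, roughly 1,200

print(f"avg writes/s:       {avg_writes:.1f}")
print(f"peak writes/s:      {peak_writes:.1f}")
print(f"reads/s at 5:1:     {reads_5_to_1:.1f}")
print(f"reads/s at 100:1:   {reads_100_to_1:.1f}")
```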
API design
Key entities
- paste
- user
Users should be able to submit text they want to store and share
The system should handle plain text content
Users can specify an expiration time for the paste, after which it is deleted
POST /pastes -> Paste
body: {
text,
expiry,
}
Paste { id, expiry_time, text, url, email=null, }
Users can use the service without creating an account
Each paste should have a unique URL or ID so it can be accessed later
GET /pastes/{id} -> Paste
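As a rough illustration of the two endpoints, here is a minimal in-memory sketch. Several details are assumptions not fixed by the notes above: `expiry` is treated as a lifetime in seconds, IDs are random URL-safe tokens, `BASE_URL` is a placeholder host, and a plain dictionary stands in for the real storage layer.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Dict, Optional

BASE_URL = "https://paste.example"  # hypothetical host, for illustration only


@dataclass(frozen=True)
class Paste:
    id: str
    expiry_time: float  # Unix timestamp after which the paste is gone
    text: str
    url: str
    email: Optional[str] = None  # stays None for anonymous use


_store: Dict[str, Paste] = {}  # stand-in for the database


def create_paste(text: str, expiry: float, now: Optional[float] = None) -> Paste:
    """POST /pastes. `expiry` is a lifetime in seconds (assumption)."""
    now = time.time() if now is None else now
    paste_id = secrets.token_urlsafe(6)  # 6 random bytes -> 8 URL-safe characters
    while paste_id in _store:  # retry on the (unlikely) collision
        paste_id = secrets.token_urlsafe(6)
    paste = Paste(
        id=paste_id,
        expiry_time=now + expiry,
        text=text,
        url=f"{BASE_URL}/pastes/{paste_id}",
    )
    _store[paste_id] = paste
    return paste


def get_paste(paste_id: str, now: Optional[float] = None) -> Optional[Paste]:
    """GET /pastes/{id}. Returns None (a 404 in HTTP terms) if missing or expired."""
    now = time.time() if now is None else now
    paste = _store.get(paste_id)
    if paste is None or paste.expiry_time <= now:
        return None
    return paste
```

Passing `now` explicitly is only there to make the expiry behaviour easy to exercise in tests; a real handler would use the server clock.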
Database design
Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...
High-level design
You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design. If you are unfamiliar with the tool, you can simply describe your design to the chat bot and ask it to generate a starter diagram for you to modify...
Request flows
POST /pastes -> Paste
body: {
text,
expiry,
}
Paste { id, expiry_time, text, url, email=null, }
The request is routed to our backend API server through an API gateway, which provides some security for us. Every POST creates a new paste, so the API server inserts a new row into the database and returns the resulting Paste.
GET /pastes/{id} -> Paste
The request is routed through the gateway to the same API server, which retrieves the paste from the database and returns it to the client.
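The database side of these two flows might look roughly like the sketch below. It uses Python's built-in `sqlite3` purely as a stand-in for PostgreSQL so that it runs anywhere; the table layout, the function names, and the `purge_expired` cleanup job are assumptions for illustration.

```python
import sqlite3
import uuid

# In-memory SQLite database as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE pastes (
        id          TEXT PRIMARY KEY,
        text        TEXT NOT NULL,
        expiry_time REAL NOT NULL  -- Unix timestamp
    )
    """
)


def insert_paste(conn: sqlite3.Connection, text: str, expiry_time: float) -> str:
    """Write path: insert the paste inside a transaction and return its id."""
    paste_id = uuid.uuid4().hex
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO pastes (id, text, expiry_time) VALUES (?, ?, ?)",
            (paste_id, text, expiry_time),
        )
    return paste_id


def fetch_paste(conn: sqlite3.Connection, paste_id: str, now: float):
    """Read path: return (id, text, expiry_time), or None if missing or expired."""
    return conn.execute(
        "SELECT id, text, expiry_time FROM pastes WHERE id = ? AND expiry_time > ?",
        (paste_id, now),
    ).fetchone()


def purge_expired(conn: sqlite3.Connection, now: float) -> int:
    """Hypothetical cleanup job: delete expired rows and return how many were removed."""
    with conn:
        cursor = conn.execute("DELETE FROM pastes WHERE expiry_time <= ?", (now,))
    return cursor.rowcount
```

Filtering on `expiry_time` in the read query means an expired paste is never served, even if the cleanup job has not yet deleted it.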
Detailed component design
Now we will adapt the design to meet the non-functional requirements
- Reads should be highly available, writes should be strongly consistent
  - We could decompose our services further, but let's leave that for now
  - To ensure HA, we will horizontally scale the API service and put a load balancer in front of it. We could run the service in Kubernetes and use a pod autoscaler to add replicas as request volume grows
  - For consistency, we will perform writes inside PostgreSQL transactions
- Durable - once a paste is stored it should not be lost before its expiration
  - We store each paste in PostgreSQL together with its expiry date, so a paste committed to the database is persisted until it expires
- Low latency - both creation and retrieval should be fast and responsive
  - Use a cache layer to speed up reads and send writes to the database. Because pastes are immutable, cached entries do not go stale before they expire
- Scalability - should support millions of users and high volume pastes
  - For our database choice, PostgreSQL can comfortably handle the estimated read/write load, even the ~1,200 reads per second of the extreme scenario, but to ease the burden and improve performance we can add a caching layer between the API server and the database
  - Caching could use an LRU (least recently used) eviction policy
  - We could use a write-through and read-through cache such as Redis, so that new pastes are cached as they are written and cache misses are filled from the database
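To make the caching idea concrete, here is a small sketch of an LRU cache combined with read-through and write-through behaviour. It is an in-process toy, not Redis: `LRUCache`, `PasteStore`, and the dictionary used as the database are all hypothetical names chosen for illustration.

```python
from collections import OrderedDict
from typing import Dict, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key: str) -> Optional[str]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: str) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


class PasteStore:
    """Read-through / write-through wrapper around a database stand-in."""

    def __init__(self, cache: LRUCache, db: Dict[str, str]) -> None:
        self.cache = cache
        self.db = db  # a dict stands in for PostgreSQL here

    def write(self, paste_id: str, text: str) -> None:
        self.db[paste_id] = text        # the database remains the source of truth
        self.cache.put(paste_id, text)  # write-through: cache the new paste immediately

    def read(self, paste_id: str) -> Optional[str]:
        text = self.cache.get(paste_id)
        if text is None:                # cache miss: fall through to the database
            text = self.db.get(paste_id)
            if text is not None:
                self.cache.put(paste_id, text)
        return text
```

With a capacity of 2, writing `a` and `b`, reading `a`, then writing `c` evicts `b`; a later read of `b` misses the cache, is served from the database, and is cached again.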
Trade offs/Tech choices
Explain any trade offs you have made and why you made certain tech choices...
Failure scenarios/bottlenecks
- The API service is a single point of failure. If we split it into separate read and write services, we could scale each independently, and if the write service goes down we could continue serving reads
Future improvements
Add a CDN layer in front of the read path to further improve performance and reduce load on the backend.
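One way a CDN could be told how long to keep a paste is through the `Cache-Control` response header. The helper below is a hypothetical sketch: the function name, the one-day cap, and the choice of `public` caching are assumptions, and it relies on pastes being immutable so that a cached copy stays valid until expiry.

```python
def cache_control_for(expiry_time: float, now: float, max_ttl: int = 86_400) -> str:
    """Build a Cache-Control value from a paste's remaining lifetime in seconds.

    `max_ttl` caps how long an edge cache may hold a copy (assumed: one day).
    """
    remaining = int(expiry_time - now)
    if remaining <= 0:
        return "no-store"  # already expired: nothing should be cached
    return f"public, max-age={min(remaining, max_ttl)}"


print(cache_control_for(expiry_time=1_000.0, now=400.0))    # public, max-age=600
print(cache_control_for(expiry_time=500_000.0, now=0.0))    # public, max-age=86400
```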