System requirements
Functional:
List functional requirements for the system (Ask interviewer if stuck)...
- Users can store text indefinitely or set an expiration time
- A unique URL is generated to share with others
- Text can be either private or public
- If text is private, users can access it with a private key
- Users can delete content at any time
Non-Functional:
List non-functional requirements for the system...
- Text should be encrypted
- Users should be authenticated before being granted authorisation
- Fault tolerant
- Highly available
- Each paste should have a timestamp recording when it was pasted
Capacity estimation
Estimate the scale of the system you are going to design...
- 2000 links generated per day ~= 730,000 links per year, rounded up to roughly 1 million links generated per year
  - 840 public links with 1-day expiry
  - 360 public links with 7-day expiry
  - 800 private links
- Assume a 10:1 read-to-write ratio, so roughly 10 million links clicked per year
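The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers listed above; the variable names are illustrative, and the "live public links" figure is an extra derived quantity that assumes the daily rates stay constant.

```python
# Back-of-the-envelope capacity estimate from the figures listed above.
LINKS_PER_DAY = 2000
DAYS_PER_YEAR = 365
READ_WRITE_RATIO = 10  # 10 reads for every write

# Daily breakdown of the 2000 links.
daily_mix = {
    "public_1_day_expiry": 840,
    "public_7_day_expiry": 360,
    "private": 800,
}

writes_per_year = LINKS_PER_DAY * DAYS_PER_YEAR      # 730_000
reads_per_year = writes_per_year * READ_WRITE_RATIO  # 7_300_000

# Steady-state number of unexpired public links (assumes constant daily
# rates): each link stays live for its expiry window.
live_public_links = 840 * 1 + 360 * 7                # 3_360

print(writes_per_year, reads_per_year, live_public_links)
```

The exact products are 730,000 writes and 7.3 million reads per year, which the estimate rounds up to about 1 million and 10 million to leave headroom.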
API design
Define what APIs are expected from the system...
- POST /api/v1/createpastebin
  - returns:
    - pastelink URL
    - key (optional)
- DELETE /api/v1/deletepastebin
- GET /api/v1/pastebin
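As a rough illustration of the create endpoint's contract, the sketch below models the request and response as plain data classes. The field names, the `BASE_URL` host, and the `handle_create` function are hypothetical, chosen only to mirror the "pastelink URL" and optional "key" listed above; no web framework is assumed.

```python
from dataclasses import dataclass
from typing import Optional
import uuid

BASE_URL = "https://paste.example"  # hypothetical host, for illustration only


@dataclass
class CreatePasteRequest:
    text: str
    private: bool = False
    expire_seconds: Optional[int] = None  # None means keep indefinitely


@dataclass
class CreatePasteResponse:
    paste_url: str
    key: Optional[str] = None  # only populated for private pastes


def handle_create(req: CreatePasteRequest) -> CreatePasteResponse:
    """Sketch of POST /api/v1/createpastebin: build the shareable URL and,
    for private pastes, an access key. Persistence is omitted here."""
    paste_id = uuid.uuid4().hex
    key = uuid.uuid4().hex if req.private else None
    return CreatePasteResponse(paste_url=f"{BASE_URL}/{paste_id}", key=key)
```

A public request returns only the URL, while a private request also returns a key that the client must present on later reads.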
Database design
Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...
PasteLink schema:
- id: id
- pasteUrl: string
- expireTime: date
- private: bool
- key: string (optional)
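One possible in-code representation of this schema is sketched below. Treating `expire_time` as optional (so that `None` means "never expires") is an assumption drawn from the requirement that text can be stored indefinitely, and the `is_expired` helper is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class PasteLink:
    id: str
    paste_url: str
    expire_time: Optional[datetime]  # assumption: None means no expiry
    private: bool
    key: Optional[str] = None        # only set when private is True

    def is_expired(self, now: datetime) -> bool:
        """A paste is expired once `now` reaches its expire_time."""
        return self.expire_time is not None and now >= self.expire_time


# Usage: a public paste that expires one day after creation.
created = datetime(2024, 1, 1, tzinfo=timezone.utc)
link = PasteLink(
    id="abc123",
    paste_url="https://paste.example/abc123",  # hypothetical URL
    expire_time=created + timedelta(days=1),
    private=False,
)
```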
High-level design
You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design...
flowchart TD
B[Client] --> L[Load Balancer]
L --> C[Web Servers]
C --> D[NoSQL Database]
C --> Cache[Redis Cache]
Request flows
Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...
When creating a pastebin, we first check whether it is private. If it is, we call our token generator, which returns a UUID. Generating a duplicate token is extremely unlikely, so we do not check for collisions. We encrypt the pastebin text using symmetric encryption with a key that is stored on our web servers.
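The decision to skip collision checks can be sanity-checked with the birthday-bound approximation. The sketch below assumes the token generator produces version-4 UUIDs (122 random bits), which the design does not state explicitly; the function names are illustrative.

```python
import uuid

RANDOM_BITS = 122  # a version-4 UUID carries 122 random bits


def new_token() -> str:
    """Generate a random token (assumption: UUID version 4)."""
    return str(uuid.uuid4())


def collision_probability(n_tokens: int) -> float:
    """Birthday-bound approximation: p ~= n^2 / (2 * 2^bits).
    Valid when the result is much smaller than 1."""
    return n_tokens ** 2 / (2 * 2 ** RANDOM_BITS)


# Chance of any collision among one million tokens (about a year of links).
p_one_year = collision_probability(1_000_000)
print(f"{p_one_year:.2e}")
```

Even at a million tokens the estimated probability is on the order of 1e-25, which supports not checking for duplicates.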
To retrieve a pastebin, the web server first checks whether it exists in the cache. If it does not, we look it up in the database. If the pastebin is private, we check whether the token passed in the request matches the one stored in the database.
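The read path can be sketched as follows, with plain dictionaries standing in for Redis and the NoSQL database. Writing the record back to the cache on a miss, and returning `None` for both "not found" and "wrong token", are simplifying assumptions rather than part of the design above.

```python
import hmac
from typing import Optional

cache: dict = {}     # stand-in for the Redis cache
database: dict = {}  # stand-in for the NoSQL database


def get_paste(paste_id: str, token: Optional[str] = None) -> Optional[dict]:
    """Check the cache first, fall back to the database, then enforce
    the token check for private pastes."""
    record = cache.get(paste_id)
    if record is None:
        record = database.get(paste_id)
        if record is None:
            return None             # paste does not exist
        cache[paste_id] = record    # assumption: populate cache on a miss
    if record["private"]:
        # Constant-time comparison avoids leaking how many characters matched.
        if token is None or not hmac.compare_digest(
            token.encode("utf-8"), record["key"].encode("utf-8")
        ):
            return None             # missing or wrong token
    return record


# Usage with two hypothetical records.
database["pub1"] = {"text": "hello", "private": False, "key": None}
database["priv1"] = {"text": "secret", "private": True, "key": "tok-123"}
```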
Detailed component design
Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...
Since our system is read heavy, we focus on making reads efficient. To start, our Redis cache serves pastebins faster than database lookups. To improve read speeds further, we can index the database for faster lookups. We can also partition our data by pasteUrl.
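One way to partition by paste URL is simple hash-modulo placement, sketched below. The partition count and the function name are hypothetical, and hash-modulo is only one possible scheme.

```python
import hashlib

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(paste_url: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a paste URL to a partition index using a stable hash.
    SHA-256 is used instead of Python's built-in hash(), which is
    randomised per process and so would not be stable across servers."""
    digest = hashlib.sha256(paste_url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Usage: the same URL always lands on the same partition.
shard = partition_for("https://paste.example/abc123")  # hypothetical URL
```

With modulo placement, changing the number of partitions remaps most keys; consistent hashing is a common alternative when partitions are added or removed often.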
Trade offs/Tech choices
Explain any trade offs you have made and why you made certain tech choices...
Failure scenarios/bottlenecks
Try to discuss as many failure scenarios/bottlenecks as possible.
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?