System requirements
Functional:
- Users can upload files.
- Users can download files.
- Users can share files.
- Users can access their files from multiple devices.
- Users can create nested folders and upload files there.
Non-Functional:
- CAP: prefer availability over consistency
- Scalable
- Low latency
- File size limit: 50 MB
Capacity estimation
500 M daily active users (DAU)
Read-heavy: users read uploaded files more often than they upload new ones.
Read:write ratio is 4:1
100 M users upload 2 files every day, on average.
Average file size is 1 MB, so each of these users uploads 2 MB per day.
2 MB * 100 M = 200 TB of data uploaded per day
200,000,000,000,000 bytes / 100,000 seconds (roughly one day) = 2,000,000,000 bytes/second = 2 GB/second
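The arithmetic above can be checked with a short script. This is a rough sketch: the variable names are illustrative, and applying the 4:1 read:write ratio to bytes (rather than to request counts) is an assumption, not something the estimate states.

```python
# Back-of-the-envelope capacity check using the figures from the estimate.
MB = 10**6
GB = 10**9
TB = 10**12

uploading_users = 100 * 10**6   # 100 M users upload each day
files_per_user = 2              # 2 files per user per day
avg_file_size = 1 * MB          # 1 MB average file size
read_write_ratio = 4            # reads : writes = 4 : 1

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 exactly
ROUNDED_SECONDS_PER_DAY = 100_000  # the rounding used in the estimate

daily_upload_bytes = uploading_users * files_per_user * avg_file_size
write_rate_rounded = daily_upload_bytes / ROUNDED_SECONDS_PER_DAY
write_rate_exact = daily_upload_bytes / SECONDS_PER_DAY

# Assumption: read traffic in bytes is 4x write traffic in bytes.
read_rate_rounded = write_rate_rounded * read_write_ratio

print(f"Daily upload volume: {daily_upload_bytes / TB:.0f} TB")
print(f"Write throughput (rounded day): {write_rate_rounded / GB:.2f} GB/s")
print(f"Write throughput (86,400 s day): {write_rate_exact / GB:.2f} GB/s")
print(f"Read throughput (rounded day): {read_rate_rounded / GB:.2f} GB/s")
```

Dividing by the exact 86,400 seconds gives a slightly higher rate than the rounded 100,000, but the order of magnitude is the same: a few GB per second of uploads.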
API design
POST:
upload_file(user_token, file_path, file_data, file_name, file_metadata)
returns: file ID
GET:
download_file(user_token, file_path)
returns: file_data in bytes
POST:
create_folder(user_token, folder_path)
returns: 200 or error
POST:
share_file(user_token, users, file_path)
returns: 200 or error
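To make the four calls concrete, here is a minimal in-memory sketch. Everything beyond the function names and parameters is an assumption made for illustration: the `FileService` class, treating `user_token` as the user's identity, treating `file_path` in `upload_file` as the target folder, signalling upload and download errors with exceptions, and the specific status codes. `download_file` takes only the token and the path, since the file data is what it returns.

```python
import uuid

MAX_FILE_SIZE = 50 * 10**6  # 50 MB size limit from the requirements


class FileService:
    """Hypothetical in-memory stand-in for the four API calls."""

    def __init__(self):
        self.folders = {"/"}  # known folder paths
        self.files = {}       # full path -> {"id", "owner", "data", "metadata"}
        self.shares = {}      # file_id -> set of users allowed to read

    def create_folder(self, user_token, folder_path):
        path = folder_path.rstrip("/")
        if not path.startswith("/"):
            return 400  # only absolute, non-root paths are accepted
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.folders:
            return 404  # parent folder must be created first
        self.folders.add(path)
        return 200

    def upload_file(self, user_token, file_path, file_data, file_name,
                    file_metadata=None):
        if len(file_data) > MAX_FILE_SIZE:
            raise ValueError("file exceeds the 50 MB limit")
        folder = file_path.rstrip("/") or "/"
        if folder not in self.folders:
            raise FileNotFoundError(folder)
        file_id = str(uuid.uuid4())
        full_path = f"{folder.rstrip('/')}/{file_name}"
        self.files[full_path] = {"id": file_id, "owner": user_token,
                                 "data": file_data, "metadata": file_metadata}
        self.shares[file_id] = {user_token}  # the owner can always read
        return file_id

    def download_file(self, user_token, file_path):
        record = self.files.get(file_path)
        if record is None:
            raise FileNotFoundError(file_path)
        if user_token not in self.shares[record["id"]]:
            raise PermissionError(file_path)
        return record["data"]

    def share_file(self, user_token, users, file_path):
        record = self.files.get(file_path)
        if record is None:
            return 404
        if record["owner"] != user_token:
            return 403  # only the owner may share in this sketch
        self.shares[record["id"]].update(users)
        return 200


# Usage: create a folder, upload into it, share, then download as the recipient.
service = FileService()
status = service.create_folder("alice", "/docs")
file_id = service.upload_file("alice", "/docs", b"hello", "note.txt", {"type": "text"})
share_status = service.share_file("alice", ["bob"], "/docs/note.txt")
downloaded = service.download_file("bob", "/docs/note.txt")
```

A real service would validate the token, stream large bodies instead of holding them in memory, and return structured errors; the sketch only shows how the inputs and outputs of the four calls relate.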
Database design
Core Entities:
User
File
Workspace
User:
user_id: PK
username
first_name
last_name
Workspace:
workspace_id: PK
created_by: FK, user_id
path
timestamp
File:
file_id
workspace_id
uploaded_by
upload_date
total_size
File_block:
file_block_id
file_id
upload_path
timestamp
File_share:
file_share_id
file_id
shared_with: FK - user_ids
status: (uploaded, cancelled)
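The entities above could be expressed as a relational schema roughly as follows. This sketch uses SQLite through Python's standard `sqlite3` module purely for illustration. The column types, the `NOT NULL` and `UNIQUE` constraints, the foreign keys on `File` and `File_block`, and the choice of one `file_share` row per recipient are all assumptions; the design only names the fields. The workspace key is spelled `workspace_id` in every table.

```python
import sqlite3

SCHEMA = """
CREATE TABLE user (
    user_id    INTEGER PRIMARY KEY,
    username   TEXT NOT NULL UNIQUE,
    first_name TEXT,
    last_name  TEXT
);
CREATE TABLE workspace (
    workspace_id INTEGER PRIMARY KEY,
    created_by   INTEGER NOT NULL REFERENCES user(user_id),
    path         TEXT NOT NULL,
    timestamp    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE file (
    file_id      INTEGER PRIMARY KEY,
    workspace_id INTEGER NOT NULL REFERENCES workspace(workspace_id),
    uploaded_by  INTEGER NOT NULL REFERENCES user(user_id),
    upload_date  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    total_size   INTEGER NOT NULL
);
CREATE TABLE file_block (
    file_block_id INTEGER PRIMARY KEY,
    file_id       INTEGER NOT NULL REFERENCES file(file_id),
    upload_path   TEXT NOT NULL,
    timestamp     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE file_share (
    file_share_id INTEGER PRIMARY KEY,
    file_id       INTEGER NOT NULL REFERENCES file(file_id),
    shared_with   INTEGER NOT NULL REFERENCES user(user_id),
    status        TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(SCHEMA)

# Sample rows: alice uploads one file into /docs and shares it with bob.
conn.execute("INSERT INTO user (user_id, username) VALUES (1, 'alice'), (2, 'bob')")
conn.execute("INSERT INTO workspace (workspace_id, created_by, path) VALUES (1, 1, '/docs')")
conn.execute(
    "INSERT INTO file (file_id, workspace_id, uploaded_by, total_size) "
    "VALUES (1, 1, 1, 1000000)"
)
conn.execute("INSERT INTO file_share (file_id, shared_with, status) VALUES (1, 2, 'uploaded')")

# Files shared with user 2, together with the workspace path they live in.
shared = conn.execute(
    "SELECT f.file_id, w.path FROM file_share s "
    "JOIN file f ON f.file_id = s.file_id "
    "JOIN workspace w ON w.workspace_id = f.workspace_id "
    "WHERE s.shared_with = ?",
    (2,),
).fetchall()
print(shared)
```

At the scale estimated earlier a single SQLite file would not be the real store; the point is only to show how the keys connect the five tables.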
High-level design
- File Upload Server
- Notification
- Object Storage
- Block Server
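The design does not spell out what the Block Server does, so the following is only one plausible reading, consistent with the File_block entity: the Block Server splits an upload into fixed-size blocks and writes each block to Object Storage, recording one `upload_path` per block. The 4 MiB block size, the key layout, the SHA-256 digest in the key, and the `ObjectStorage` class are all hypothetical.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # assumed 4 MiB; the design does not give a block size


class ObjectStorage:
    """Hypothetical in-memory stand-in for the object store."""

    def __init__(self):
        self._objects = {}

    def put(self, key, blob):
        self._objects[key] = blob

    def get(self, key):
        return self._objects[key]


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut the payload into consecutive blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def upload_blocks(storage, file_id, data, block_size=BLOCK_SIZE):
    """Store each block and return the upload paths in order."""
    upload_paths = []
    for index, block in enumerate(split_into_blocks(data, block_size)):
        digest = hashlib.sha256(block).hexdigest()
        upload_path = f"blocks/{file_id}/{index:06d}-{digest}"
        storage.put(upload_path, block)
        upload_paths.append(upload_path)
    return upload_paths


def download_blocks(storage, upload_paths):
    """Reassemble a file by concatenating its blocks in order."""
    return b"".join(storage.get(path) for path in upload_paths)


# Usage with a small block size so the example splits into several blocks.
storage = ObjectStorage()
payload = bytes(range(256)) * 10  # 2,560 bytes
paths = upload_blocks(storage, file_id=1, data=payload, block_size=1024)
restored = download_blocks(storage, paths)
print(len(paths), restored == payload)
```

Keeping the block index in the key preserves ordering, and the digest makes it possible to verify a block after download; whether blocks should also be deduplicated across files is a separate decision the design leaves open.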
Request flows
Explain how the request flows from end to end in your high-level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...
Detailed component design
Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...
Trade-offs/Tech choices
Explain any trade-offs you have made and why you made certain tech choices...
Failure scenarios/bottlenecks
Try to discuss as many failure scenarios/bottlenecks as possible.
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?