System requirements
Functional:
Upload files (text files, photos, videos)
Share files (each file has its own shareable link) - only consider whether the user can view the file or not.
Collaborative editing
Multi-versioning
Multi-level access control (view, comment, edit)
Non-Functional:
High availability
Low latency
Data storage backup - no data loss.
Capacity estimation
Assume 100,000 daily active users (DAU), each uploading one file every minute
100,000 / 60 ~= 1,667 TPS ~= 2K TPS
QPS = 5 * TPS ~= 8,300 QPS ~= 10K QPS
File size: 100KB max
1,667 * 60 * 60 * 24 ~= 144M files / day ~= 14.4 TB/day ~= 5.3 PB/year at the 100KB maximum - huge storage - BLOB storage for static files.
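The arithmetic can be re-derived from the stated assumptions (100,000 users, one upload per user per minute, reads at 5x writes, 100KB per file). The snippet below is only a back-of-the-envelope sketch: the constant names are illustrative, and it assumes decimal units (1 KB = 1,000 bytes) and that every file hits the maximum size.

```python
# Back-of-the-envelope capacity estimate. All inputs are the assumptions
# stated in this section; names are illustrative only.
DAU = 100_000                  # daily active users
UPLOADS_PER_USER_PER_MIN = 1   # each user uploads one file per minute
READ_WRITE_RATIO = 5           # QPS = 5 * TPS
MAX_FILE_BYTES = 100 * 1000    # 100KB max, decimal units (assumption)
SECONDS_PER_DAY = 60 * 60 * 24


def estimate() -> dict:
    """Return write TPS, read QPS, and worst-case storage growth."""
    tps = DAU * UPLOADS_PER_USER_PER_MIN / 60
    qps = READ_WRITE_RATIO * tps
    files_per_day = tps * SECONDS_PER_DAY
    bytes_per_day = files_per_day * MAX_FILE_BYTES
    return {
        "tps": tps,
        "qps": qps,
        "files_per_day": files_per_day,
        "tb_per_day": bytes_per_day / 1e12,
        "pb_per_year": bytes_per_day * 365 / 1e15,
    }


if __name__ == "__main__":
    for key, value in estimate().items():
        print(f"{key}: {value:,.1f}")
```

Because the storage figure assumes every upload is 100KB, it is an upper bound; a smaller average file size scales the result down linearly.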
API design
GET openFile {file_link, user_id}:
return the file if the user has access.
return 403 Forbidden if the user doesn't have access to this file.
return 500 if server is down, or any other internal error.
POST uploadFile {file(BLOB), user_id}:
return the file_link if the file is uploaded successfully.
return 413 if the file is larger than the limit.
return 500 if server is down, the file failed to upload, or any other internal error.
PUT updateAccess {file_link, access(Object)}:
return 200 if the access is updated successfully.
return 500 if server is down, or any other internal error.
access: {
type: ACCOUNT, TEAM, ALL
type_id: id
}
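As a rough illustration of the request shapes above, the sketch below parses the access object and applies the upload size check. The names `AccessType`, `Access`, `parse_access`, and `upload_status` are hypothetical, as are the assumptions that type_id may be omitted when type is ALL, that a successful upload answers with 200, and that the limit is the 100KB from the capacity estimate.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

MAX_FILE_BYTES = 100 * 1000  # assumed limit: 100KB from the capacity estimate


class AccessType(Enum):
    ACCOUNT = "ACCOUNT"
    TEAM = "TEAM"
    ALL = "ALL"


@dataclass(frozen=True)
class Access:
    type: AccessType
    type_id: Optional[int] = None  # assumed to be unused when type is ALL


def parse_access(payload: dict) -> Access:
    """Build an Access from a request body; raises ValueError on bad input."""
    # Enum lookup raises ValueError for a missing or unknown type.
    access_type = AccessType(payload.get("type"))
    type_id = payload.get("type_id")
    if access_type is not AccessType.ALL and type_id is None:
        raise ValueError("type_id is required for ACCOUNT and TEAM")
    return Access(access_type, type_id)


def upload_status(file_bytes: bytes) -> int:
    """Return the status uploadFile would answer for a payload of this size."""
    return 413 if len(file_bytes) > MAX_FILE_BYTES else 200
```

A handler for updateAccess could map a ValueError from `parse_access` to a 4xx response of its choosing; the API list above does not specify one, so that mapping is left open here.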
Database design
BLOB storage: {
url: String
file: BLOB
}
SQL:
file_metadata: {
file_url: String (PK)
owner_id: Long (FK to user.user_id)
created_at: timestamp
updated_at: timestamp
name: String (Search Index)
}
user: {
user_id: Long(PK)
name: String
team_id: Long(FK)
}
team: {
team_id: Long(PK)
team_name: String
}
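One way to make the relational part concrete is to write it out as DDL. The sketch below uses SQLite through Python's standard `sqlite3` module purely for illustration; the choice of engine, the column constraints, the index name, and typing owner_id as an integer to match user.user_id are all assumptions, not decisions made by this design.

```python
import sqlite3

# Illustrative DDL for the three SQL tables. Constraints and the index name
# are assumptions. "user" is a valid table name in SQLite but is reserved in
# some other engines.
SCHEMA = """
CREATE TABLE team (
    team_id   INTEGER PRIMARY KEY,
    team_name TEXT NOT NULL
);
CREATE TABLE user (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    team_id INTEGER REFERENCES team(team_id)
);
CREATE TABLE file_metadata (
    file_url   TEXT PRIMARY KEY,
    owner_id   INTEGER NOT NULL REFERENCES user(user_id),
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    name       TEXT NOT NULL
);
-- Supports lookups by file name (the "Search Index" on name).
CREATE INDEX idx_file_metadata_name ON file_metadata(name);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the tables on the given connection and enforce foreign keys."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
```

For example, `create_schema(sqlite3.connect(":memory:"))` gives a throwaway database for trying out queries against this layout.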
NoSQL:
access_graph: {
file_url: String (Search Index)
user_id: Long (Search Index)
}
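Indexing access_graph on both fields means it can answer two questions cheaply: who can open a given file, and which files a given user can open. The class below is a minimal in-memory stand-in that shows that two-way lookup; `AccessGraph` and its method names are hypothetical, and a real NoSQL store would provide the indexes itself.

```python
from collections import defaultdict


class AccessGraph:
    """In-memory stand-in for access_graph, indexed by file_url and user_id."""

    def __init__(self) -> None:
        self._users_by_file = defaultdict(set)  # file_url -> {user_id}
        self._files_by_user = defaultdict(set)  # user_id -> {file_url}

    def grant(self, file_url: str, user_id: int) -> None:
        self._users_by_file[file_url].add(user_id)
        self._files_by_user[user_id].add(file_url)

    def revoke(self, file_url: str, user_id: int) -> None:
        # discard() is a no-op if the grant does not exist.
        self._users_by_file[file_url].discard(user_id)
        self._files_by_user[user_id].discard(file_url)

    def has_access(self, file_url: str, user_id: int) -> bool:
        # .get() avoids creating empty entries for unknown files.
        return user_id in self._users_by_file.get(file_url, set())

    def files_for(self, user_id: int) -> set:
        # Return a copy so callers cannot mutate the index.
        return set(self._files_by_user.get(user_id, set()))
```

Keeping both maps costs a second write per grant, in exchange for avoiding a full scan on either lookup.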
High-level design
Please see High-Level Diagram
Request flows
openFile:
After the request arrives at the load balancer, it is routed to the access service according to the load-balancing algorithm. The access service checks whether the current user_id has access to this file_url. If so, it asks the file service to download the file. If not, it returns 403 Forbidden.
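The openFile flow can be sketched as a small function that sits behind the load balancer. `AccessService`, `FileService`, and `open_file` are hypothetical names, the in-memory sets and dicts stand in for the real stores, and treating a missing blob as a 500 is an assumption made here because the API lists only 403 and 500 as error responses.

```python
from typing import Dict, Set, Tuple


class AccessService:
    """Answers whether a user may view a file (stand-in for the real store)."""

    def __init__(self, grants: Set[Tuple[str, int]]) -> None:
        self._grants = grants  # (file_url, user_id) pairs

    def can_view(self, file_url: str, user_id: int) -> bool:
        return (file_url, user_id) in self._grants


class FileService:
    """Fetches file bytes from BLOB storage (stand-in: a dict)."""

    def __init__(self, blobs: Dict[str, bytes]) -> None:
        self._blobs = blobs

    def download(self, file_url: str) -> bytes:
        return self._blobs[file_url]  # raises KeyError if the blob is missing


def open_file(access: AccessService, files: FileService,
              file_link: str, user_id: int) -> Tuple[int, bytes]:
    """Return (status, body): 200 with the file, 403 if denied, 500 on errors."""
    try:
        if not access.can_view(file_link, user_id):
            return 403, b""
        return 200, files.download(file_link)
    except Exception:
        # Any internal failure, including a missing blob, maps to 500 here.
        return 500, b""
```

Checking access before touching the file service keeps unauthorized requests from ever reaching BLOB storage.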
uploadFile:
Detailed component design
Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...
Trade offs/Tech choices
Explain any trade offs you have made and why you made certain tech choices...
Failure scenarios/bottlenecks
Try to discuss as many failure scenarios/bottlenecks as possible.
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?