System requirements
Functional:
Upload files of various sizes and types to the user's cloud drive.
Control access permissions on stored data.
Retrieve uploaded files within a reasonable time from any platform.
Non-Functional:
Availability
Durability (Fault-tolerance)
Consistency
Capacity estimation
200 M users, 50% daily active. Read : write = 10 : 1 (per user per day: 10 read requests, 1 write request). Average file size 10 MB. CPU processing per request 0.1 ms.
DAU: 200 M * 50% = 100 M = 10^8
R RPS: 10^8 * 10 (r/u) / 10^5 (seconds in a day) = 10^4 read rps = 10 K read rps
W RPS: 10^8 * 1 (w/u) / 10^5 (seconds in a day) = 10^3 write rps = 1 K write rps
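number of instances:
1 core handles 1 request per 0.1 ms = 1000 ms / 0.1 ms = 10^4 rps = 10 K rps
at a 60% utilization target: 10 K rps * 60% = 6 K rps per core
total load: R rps + W rps = 10 K + 1 K = 11 K rps
11 K / 6 K * 1.5 (safety factor) = 2.75, so about 3 cores
This is a CPU-only lower bound; redundancy and network throughput will push the real instance count higher.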
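The request-rate and core-count arithmetic can be rechecked with a short script. This is a minimal sketch built only from the stated inputs; reading 60% as a per-core utilization target and 1.5 as a safety factor is an assumption, and all variable names are illustrative.

```python
import math

SECONDS_PER_DAY = 10**5          # rounded up from 86,400, as in the estimate
TOTAL_USERS = 200_000_000
DAU = TOTAL_USERS // 2           # 50% daily active
READS_PER_USER_PER_DAY = 10
WRITES_PER_USER_PER_DAY = 1
CPU_US_PER_REQUEST = 100         # 0.1 ms expressed in microseconds
UTILIZATION_PCT = 60             # assumption: target utilization per core
HEADROOM = 1.5                   # assumption: safety factor

read_rps = DAU * READS_PER_USER_PER_DAY // SECONDS_PER_DAY
write_rps = DAU * WRITES_PER_USER_PER_DAY // SECONDS_PER_DAY
total_rps = read_rps + write_rps

# One core can process 1 s / 0.1 ms requests per second at 100% utilization.
max_rps_per_core = 1_000_000 // CPU_US_PER_REQUEST
usable_rps_per_core = max_rps_per_core * UTILIZATION_PCT // 100

cores = math.ceil(total_rps * HEADROOM / usable_rps_per_core)

print(f"read rps:  {read_rps}")
print(f"write rps: {write_rps}")
print(f"usable rps per core: {usable_rps_per_core}")
print(f"cores needed (CPU only): {cores}")
```

The result only bounds CPU time per request. It says nothing about the bandwidth needed to move 10 MB payloads, which would have to be sized separately.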
storage:
10^3 write rps * 10^5 (seconds in a day) * 10^7 B (10 MB per file) = 10^15 B/day = 1 PB per day
1 year = about 400 PB per year
5 years = about 2 EB per 5 years
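The storage figure is easy to get wrong by a few orders of magnitude, so here is a small check. It is a sketch that assumes decimal units (10 MB = 10^7 bytes) and ignores replication, deduplication, and deletions; the variable names are illustrative.

```python
SECONDS_PER_DAY = 10**5            # rounded, as in the estimate
WRITE_RPS = 1_000
AVG_FILE_BYTES = 10 * 10**6        # 10 MB in decimal units = 10^7 bytes
PB = 10**15

bytes_per_day = WRITE_RPS * SECONDS_PER_DAY * AVG_FILE_BYTES
pb_per_day = bytes_per_day / PB
pb_per_year = pb_per_day * 365
pb_per_5_years = pb_per_year * 5

print(f"per day:     {pb_per_day:.0f} PB")
print(f"per year:    {pb_per_year:.0f} PB")
print(f"per 5 years: {pb_per_5_years / 1000:.2f} EB")
```

With these inputs the daily volume is 10^15 bytes, that is 1 PB. Using 365 days gives 365 PB per year and roughly 1.8 EB over five years, which rounds to about 400 PB and 2 EB.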
API design
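POST /files/{user_id}/{directory_path} (upload a file into a directory)
GET /files/{user_id}/{file_path} (get a file's metadata, including its blob URL)
GET /files/{user_id} (list the user's files)
GET {blob_url} (download the file content through the CDN)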
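To make the path shapes concrete, the three /files endpoints can be expressed as a small route table. This is only a sketch: the handler names (upload_file, get_file_metadata, list_files) are hypothetical, and the rule that a user id contains no slash is an assumption. The blob URL is fetched directly from the CDN, so it is not routed here.

```python
import re

# (method, compiled path pattern, hypothetical handler name)
ROUTES = [
    ("POST", re.compile(r"^/files/(?P<user_id>[^/]+)/(?P<directory_path>.+)$"), "upload_file"),
    ("GET", re.compile(r"^/files/(?P<user_id>[^/]+)/(?P<file_path>.+)$"), "get_file_metadata"),
    ("GET", re.compile(r"^/files/(?P<user_id>[^/]+)/?$"), "list_files"),
]


def match_route(method, path):
    """Return (handler_name, path_params) for the first matching route, else None."""
    for route_method, pattern, handler in ROUTES:
        if route_method != method:
            continue
        found = pattern.match(path)
        if found:
            return handler, found.groupdict()
    return None


print(match_route("POST", "/files/u1/docs"))
print(match_route("GET", "/files/u1/docs/a.txt"))
print(match_route("GET", "/files/u1"))
```

Because the directory and file paths may themselves contain slashes, they are captured as the remainder of the URL after the user id.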
Database design
Users: name, id, age, file_ids...
File_Metadata: file_id, filename, size, type, path, blob_url, etc.
BlobStorage: data
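One possible relational layout for the Users and File_Metadata entities, sketched with the standard-library sqlite3 module. The owner_id foreign key is an assumption that stands in for the file_ids list on Users; the column types, the index, and the sample row are also illustrative. Blob data itself is left out because it lives in object storage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id   TEXT PRIMARY KEY,
        name TEXT NOT NULL,
        age  INTEGER
    );
    CREATE TABLE file_metadata (
        file_id  TEXT PRIMARY KEY,
        owner_id TEXT NOT NULL REFERENCES users(id),  -- assumption: replaces Users.file_ids
        filename TEXT NOT NULL,
        size     INTEGER NOT NULL,                    -- bytes
        type     TEXT,
        path     TEXT NOT NULL,
        blob_url TEXT NOT NULL
    );
    CREATE INDEX idx_file_owner_path ON file_metadata (owner_id, path);
    """
)

conn.execute("INSERT INTO users VALUES (?, ?, ?)", ("u1", "Alice", 30))
conn.execute(
    "INSERT INTO file_metadata VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("f1", "u1", "a.txt", 10_000_000, "text/plain", "/docs/a.txt",
     "https://blobs.example/f1"),
)

# List one user's files, as GET /files/user_id would need to.
rows = conn.execute(
    "SELECT filename, blob_url FROM file_metadata WHERE owner_id = ?", ("u1",)
).fetchall()
print(rows)
```

Keeping ownership on the file row avoids an unbounded list on the user record, and the (owner_id, path) index supports both listing a user's files and looking up one file by path.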
High-level design
C: client -> CDN
CDN -> GW: API-GW
GW (route the request, check authn/authz)
GW -> LB
LB -> AS: Application Service
AS -> US: User Service
US -> UDB: Users DB
US -> AS
AS -> FS: File Service
FS -> FDB: metadata DB
FS -> AS
AS -> GW
GW -> C
C -> CDN (retrieve the file from object storage using its blob URL)
CDN -> Object storage
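The hops above can be walked through with an in-memory sketch of a read request. Everything here is hypothetical: the token table, the sample data, the status codes, and the function names only stand in for the real services. The load balancer and CDN caching are omitted.

```python
from dataclasses import dataclass


@dataclass
class FileMetadata:
    file_id: str
    path: str
    blob_url: str


# Stand-ins for the Users DB, metadata DB, and object storage.
USERS_DB = {"u1": {"name": "Alice"}}
METADATA_DB = {
    ("u1", "docs/a.txt"): FileMetadata("f1", "docs/a.txt", "https://cdn.example/blobs/f1"),
}
OBJECT_STORAGE = {"https://cdn.example/blobs/f1": b"hello"}
VALID_TOKENS = {"token-u1": "u1"}  # token -> user id


def user_service(user_id):
    return USERS_DB.get(user_id)


def file_service(user_id, file_path):
    return METADATA_DB.get((user_id, file_path))


def application_service(user_id, file_path):
    """AS -> US, then AS -> FS; returns (status, blob_url)."""
    if user_service(user_id) is None:
        return 404, None
    metadata = file_service(user_id, file_path)
    if metadata is None:
        return 404, None
    return 200, metadata.blob_url


def api_gateway(token, user_id, file_path):
    """GW: authenticate the token, authorize the caller, then route to AS."""
    caller = VALID_TOKENS.get(token)
    if caller is None:
        return 401, None
    if caller != user_id:
        return 403, None
    return application_service(user_id, file_path)


def cdn_fetch(blob_url):
    """C -> CDN -> object storage (no caching in this sketch)."""
    return OBJECT_STORAGE.get(blob_url)


status, blob_url = api_gateway("token-u1", "u1", "docs/a.txt")
content = cdn_fetch(blob_url) if status == 200 else None
print(status, blob_url, content)
```

The metadata path and the content path stay separate: the application services only return a blob URL, and the bytes are served through the CDN from object storage.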