System requirements


Functional:

Upload files of various sizes and types to my cloud drive.

Control data permissions.

Retrieve uploaded files within a reasonable time from any platform.


Non-Functional:

Availability

Durability (Fault-tolerance)

Consistency



Capacity estimation

200 M users, 50% of them active daily on average. Read : write = 10 : 1. Average file size 10 MB. Read requests per user per day = 10, write requests per user per day = 1. CPU processing per request 0.1 ms.

DAU: 100 M

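R RPS: 10^8 (DAU) * 10 (r/u per day) / 10^5 (seconds in a day, rounded up from 86,400) = 10^4 read rps = 10 K read rps

W RPS: 10^8 (DAU) * 1 (w/u per day) / 10^5 = 10^3 write rps = 1 K write rps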

number of instances:

1 core handles 1000 ms / 0.1 ms = 10^4 rps = 10 K rps

10 K rps * 60% = 6 K rps usable per core

R rps + W rps = 10 K + 1 K = 11 K rps

11 K / 6 K * 1.5 = 2.75, so about 3 single-core instances (CPU time only)
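The request-rate and CPU arithmetic above can be re-run as a short script, which makes it easy to change an input and see the effect. This is a rough sketch, not a sizing tool: the variable names are illustrative, and reading the 60% figure as a per-core utilization target and the 1.5 figure as a safety margin is an assumption.

```python
import math

# Inputs taken from the estimate above; names are illustrative.
TOTAL_USERS = 200_000_000
DAU_RATIO = 0.5
READS_PER_USER_PER_DAY = 10
WRITES_PER_USER_PER_DAY = 1
SECONDS_PER_DAY = 100_000      # rounded up from 86,400 for easy arithmetic
CPU_MS_PER_REQUEST = 0.1
USABLE_CORE_SHARE = 0.6        # assumption: 60% is a utilization target
HEADROOM = 1.5                 # assumption: 1.5 is a safety margin

dau = TOTAL_USERS * DAU_RATIO
read_rps = dau * READS_PER_USER_PER_DAY / SECONDS_PER_DAY
write_rps = dau * WRITES_PER_USER_PER_DAY / SECONDS_PER_DAY
total_rps = read_rps + write_rps

# One core can serve 1000 ms / 0.1 ms requests per second at 100% load.
rps_per_core = 1000 / CPU_MS_PER_REQUEST * USABLE_CORE_SHARE
cores = math.ceil(total_rps / rps_per_core * HEADROOM)

print(f"DAU: {dau:,.0f}")
print(f"read rps: {read_rps:,.0f}, write rps: {write_rps:,.0f}")
print(f"usable rps per core: {rps_per_core:,.0f}")
print(f"cores needed (CPU time only): {cores}")
```

The result covers CPU time per request only; it says nothing about the network or disk throughput needed to move 10 MB files.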


storage:

10^3 write rps * 10^5 (seconds in day) * 10^7 B (size 10 MB) = 10^15 B/day = 1 PB per day

1 year = 365 PB, roughly 400 PB per year

5 years = roughly 2 EB per 5 years
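As a cross-check on the storage figures, here is the same multiplication in code, taking 10 MB as 10^7 bytes (decimal units). It is only a sketch: the constant names are illustrative, and it counts raw uploaded bytes with no replication, compression, or deduplication, none of which the estimate above specifies.

```python
WRITE_RPS = 1_000
SECONDS_PER_DAY = 100_000          # rounded up from 86,400
AVG_FILE_BYTES = 10 * 10**6        # 10 MB in decimal units

PB = 10**15
EB = 10**18

bytes_per_day = WRITE_RPS * SECONDS_PER_DAY * AVG_FILE_BYTES
bytes_per_year = bytes_per_day * 365
bytes_per_5_years = bytes_per_year * 5

print(f"per day:     {bytes_per_day / PB:.2f} PB")
print(f"per year:    {bytes_per_year / PB:.0f} PB")
print(f"per 5 years: {bytes_per_5_years / EB:.3f} EB")
```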





API design

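POST /files/user_id/directory_path (upload a file into a directory)

GET /files/user_id/file_path (get a file's metadata, including its blob URL)

GET /files/user_id (list the user's files)

GET blob_url (download the file content from object storage)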
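One possible reading of these four endpoints, written as plain Python functions over in-memory dictionaries. Everything here is hypothetical: the function names, the `FileRecord` fields, and the `blob://` URL scheme are placeholders for illustration, and a real service would sit behind HTTP with authentication.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    path: str
    size: int
    blob_url: str


# In-memory stand-ins for the metadata DB and object storage (hypothetical).
METADATA: dict[str, dict[str, FileRecord]] = {}
BLOBS: dict[str, bytes] = {}


def post_file(user_id: str, directory_path: str, filename: str, data: bytes) -> FileRecord:
    """POST /files/user_id/directory_path: store the bytes and record metadata."""
    path = f"{directory_path.rstrip('/')}/{filename}"
    blob_url = f"blob://{user_id}{path}"  # made-up URL scheme
    BLOBS[blob_url] = data
    record = FileRecord(path=path, size=len(data), blob_url=blob_url)
    METADATA.setdefault(user_id, {})[path] = record
    return record


def get_file(user_id: str, file_path: str) -> FileRecord:
    """GET /files/user_id/file_path: return metadata, including the blob URL."""
    return METADATA[user_id][file_path]


def list_files(user_id: str) -> list[str]:
    """GET /files/user_id: list the paths of the user's files."""
    return sorted(METADATA.get(user_id, {}))


def get_blob(blob_url: str) -> bytes:
    """GET blob_url: return the stored file content."""
    return BLOBS[blob_url]


record = post_file("u1", "/docs", "a.txt", b"hello")
print(list_files("u1"))
print(get_blob(get_file("u1", "/docs/a.txt").blob_url))
```

Splitting metadata lookup from content download keeps large transfers off the application servers, since the client fetches the bytes by URL in a separate step.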




Database design

Users: name, id, age, file_ids...

File_Metadata: file_id, filename, size, type, path, blobURL, etc.

BlobStorage: data
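The field lists above could map onto relational tables roughly as follows, using SQLite in memory purely for illustration. The column types are guesses, and replacing the file_ids list on Users with an owner_id column on File_Metadata is an assumption of this sketch, not something the design states. File content itself stays in blob storage and is only referenced by blob_url.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        age  INTEGER
    );
    CREATE TABLE file_metadata (
        file_id  INTEGER PRIMARY KEY,
        owner_id INTEGER NOT NULL REFERENCES users(id),  -- assumption: replaces Users.file_ids
        filename TEXT NOT NULL,
        size     INTEGER NOT NULL,
        type     TEXT,
        path     TEXT NOT NULL,
        blob_url TEXT NOT NULL
    );
    """
)

conn.execute("INSERT INTO users (id, name, age) VALUES (?, ?, ?)", (1, "alice", 30))
conn.execute(
    "INSERT INTO file_metadata (owner_id, filename, size, type, path, blob_url) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    (1, "a.txt", 5, "text/plain", "/docs/a.txt", "blob://example/a.txt"),
)

# List one user's files, the query behind GET /files/user_id.
rows = conn.execute(
    "SELECT filename, blob_url FROM file_metadata WHERE owner_id = ?", (1,)
).fetchall()
print(rows)
```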




High-level design

C: client -> CDN

CDN -> GW: API-GW

GW (route the request, check authn/authz)

GW -> LB

LB -> AS: Application Service

AS -> US: User Service

US -> UDB: Users DB

US -> AS

AS -> FS: File Service

FS -> FDB: metadata DB

FS -> AS

AS -> GW

GW -> C

C -> CDN (retrieve object storage file using url)

CDN -> Object storage
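The last two hops (client to CDN, CDN to object storage) might behave like the sketch below, where the CDN goes to object storage only on a cache miss. The class and attribute names are hypothetical, and cache-on-miss behavior is an assumption about how the CDN is configured; eviction, TTLs, and permission checks are left out.

```python
class ObjectStorage:
    """Stand-in for the blob store, keyed by URL (hypothetical)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, url: str, data: bytes) -> None:
        self._blobs[url] = data

    def get(self, url: str) -> bytes:
        return self._blobs[url]


class CDN:
    """Serves from its cache and falls back to the origin on a miss."""

    def __init__(self, origin: ObjectStorage) -> None:
        self.origin = origin
        self.cache: dict[str, bytes] = {}
        self.origin_fetches = 0

    def get(self, url: str) -> bytes:
        if url not in self.cache:
            # Cache miss: CDN -> Object storage
            self.cache[url] = self.origin.get(url)
            self.origin_fetches += 1
        return self.cache[url]


storage = ObjectStorage()
storage.put("blob://example/a.txt", b"hello")
cdn = CDN(storage)

first = cdn.get("blob://example/a.txt")   # miss, fetched from object storage
second = cdn.get("blob://example/a.txt")  # hit, served from the CDN cache
print(first == second, cdn.origin_fetches)
```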






Request flows

Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...






Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...






Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...






Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.






Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?