System requirements


Functional:

  1. Users can upload files.
  2. Users can download files.
  3. Users can share files.
  4. Users can access their files from multiple devices.
  5. Users can create nested folders and upload files there.



Non-Functional:

  1. CAP - Availability over consistency
  2. Scalable
  3. Low latency
  4. Size limit, 50 MB



Capacity estimation

500 M DAU


Read-heavy: users read uploaded files more often than they upload new ones.

Read:write ratio is 4:1


100 M users upload 2 files every day, on average.


Average file size is 1 MB, so each uploading user uploads 2 MB per day.


2 MB * 100 M = 200 TB of data being uploaded per day


200,000,000,000,000 bytes / 100,000 seconds (about 1 day) = 2,000,000,000 bytes / second = 2 GB / second
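The estimate above can be re-run as a short script. This is a rough sketch: it uses decimal units (1 MB = 1,000,000 bytes) and the exact 86,400 seconds per day instead of the rounded 100,000. Applying the 4:1 read:write ratio directly to bandwidth is an assumption made here for illustration.

```python
# Back-of-the-envelope capacity estimate using the figures from this section.
SECONDS_PER_DAY = 86_400

uploading_users = 100_000_000       # 100 M users upload each day
files_per_user = 2                  # 2 files per user per day
avg_file_size_bytes = 1_000_000     # 1 MB average file size (decimal units)
read_write_ratio = 4                # reads : writes = 4 : 1

daily_upload_bytes = uploading_users * files_per_user * avg_file_size_bytes
write_bytes_per_second = daily_upload_bytes / SECONDS_PER_DAY
# Assumption: read bandwidth scales with the read:write ratio.
read_bytes_per_second = write_bytes_per_second * read_write_ratio

print(f"Daily upload volume: {daily_upload_bytes / 1e12:.0f} TB")
print(f"Write throughput:    {write_bytes_per_second / 1e9:.2f} GB/s")
print(f"Read throughput:     {read_bytes_per_second / 1e9:.2f} GB/s")
```

With the exact number of seconds per day the write throughput comes out slightly above the rounded figure, at roughly 2.3 GB per second.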




API design

POST:

upload_file(user_token, file_path, file_data, file_name, file_metadata)

returns: file ID


GET:

download_file(user_token, file_path)

returns: file_data in bytes


POST:

create_folder(user_token, folder_path)

returns: 200 or error


POST:

share_file(user_token, users, file_path)

returns: 200 or error
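The four calls above can be sketched as a small in-memory service to make the expected behaviour concrete. This is only an illustration under several assumptions: the class name InMemoryFileService is hypothetical, user_token is treated directly as the user's identity (no real authentication), all users share one path namespace, errors are modelled as 403/404 codes or Python exceptions, and the 50 MB limit from the non-functional requirements is read as 50,000,000 bytes.

```python
import uuid

MAX_FILE_SIZE = 50 * 1_000_000  # 50 MB limit from the non-functional requirements


class InMemoryFileService:
    """Hypothetical in-memory stand-in for the four API calls."""

    def __init__(self):
        self.folders = {"/"}   # known folder paths; the root always exists
        self.files = {}        # full file path -> record dict

    def create_folder(self, user_token, folder_path):
        folder_path = folder_path.rstrip("/")
        parent = folder_path.rsplit("/", 1)[0] or "/"
        if parent not in self.folders:
            return 404         # nested folders require an existing parent
        self.folders.add(folder_path)
        return 200

    def upload_file(self, user_token, file_path, file_data, file_name,
                    file_metadata=None):
        if len(file_data) > MAX_FILE_SIZE:
            raise ValueError("file exceeds the 50 MB limit")
        folder = file_path.rstrip("/") or "/"
        if folder not in self.folders:
            raise FileNotFoundError(folder)
        full_path = folder.rstrip("/") + "/" + file_name
        file_id = str(uuid.uuid4())
        self.files[full_path] = {
            "id": file_id,
            "owner": user_token,
            "data": file_data,
            "metadata": file_metadata or {},
            "shared_with": set(),
        }
        return file_id

    def download_file(self, user_token, file_path):
        record = self.files.get(file_path)
        if record is None:
            raise FileNotFoundError(file_path)
        # Only the owner or a user the file was shared with may read it.
        if user_token != record["owner"] and user_token not in record["shared_with"]:
            raise PermissionError(file_path)
        return record["data"]

    def share_file(self, user_token, users, file_path):
        record = self.files.get(file_path)
        if record is None:
            return 404
        if record["owner"] != user_token:
            return 403         # only the owner may share
        record["shared_with"].update(users)
        return 200


service = InMemoryFileService()
service.create_folder("alice", "/docs")
file_id = service.upload_file("alice", "/docs", b"hello", "note.txt", {})
service.share_file("alice", ["bob"], "/docs/note.txt")
print(service.download_file("bob", "/docs/note.txt"))
```

A real service would resolve user_token to a user through an authentication layer and would stream file_data rather than hold it in memory.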


Database design

Core Entities:

User

File

Workspace


User:

user_id - PK

username

first_name

last_name

email


Workspace:

workspace_id: PK

created_by: FK, user_id

path

timestamp


File:

  • file_id
  • workspace_id
  • uploaded_by
  • upload_date
  • total_size



File_block:

  • file_block_id
  • file_id
  • upload_path
  • timestamp


File_share:

file_share_id

file_id

shared_with: FK - user_ids


status: (uploaded, cancelled)
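The entities above can be written down as typed records to make the keys and relationships explicit. This is a sketch only: the field names follow the lists above, while the Python types (integer IDs, datetime timestamps, string paths) are assumptions, since the original does not specify column types.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class User:
    user_id: int            # PK
    username: str
    first_name: str
    last_name: str
    email: str


@dataclass
class Workspace:
    workspace_id: int       # PK
    created_by: int         # FK -> User.user_id
    path: str
    timestamp: datetime


@dataclass
class File:
    file_id: int            # PK (assumed)
    workspace_id: int       # FK -> Workspace.workspace_id
    uploaded_by: int        # FK -> User.user_id
    upload_date: datetime
    total_size: int         # bytes (assumed unit)


@dataclass
class FileBlock:
    file_block_id: int      # PK (assumed)
    file_id: int            # FK -> File.file_id
    upload_path: str        # location of the block in object storage
    timestamp: datetime


@dataclass
class FileShare:
    file_share_id: int      # PK (assumed)
    file_id: int            # FK -> File.file_id
    shared_with: int        # FK -> User.user_id, one row per recipient (assumed)
    status: str             # "uploaded" or "cancelled", as listed above


now = datetime(2024, 1, 1)
owner = User(1, "alice", "Alice", "Example", "alice@example.com")
workspace = Workspace(10, owner.user_id, "/docs", now)
stored = File(100, workspace.workspace_id, owner.user_id, now, 1_000_000)
print(stored)
```

Storing one FileShare row per recipient, rather than a list of user IDs in a single row, is a design choice made here to keep the foreign key simple.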




High-level design


  1. File Upload Server
  2. Notification
  3. Object Storage
  4. Block Server
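The Block Server and the File_block table suggest that files are split into blocks before they reach object storage. A minimal sketch of that step follows, with assumptions stated up front: the function name split_into_blocks, the 4 MB block size, and the SHA-256 digest per block (useful for integrity checks or deduplication) are illustrative choices, not part of the original design.

```python
import hashlib

BLOCK_SIZE = 4 * 1_000_000  # hypothetical 4 MB block size


def split_into_blocks(file_data: bytes, block_size: int = BLOCK_SIZE):
    """Split file bytes into fixed-size blocks, each paired with its SHA-256 digest."""
    blocks = []
    for offset in range(0, len(file_data), block_size):
        chunk = file_data[offset:offset + block_size]
        blocks.append((hashlib.sha256(chunk).hexdigest(), chunk))
    return blocks


def reassemble(blocks):
    """Concatenate blocks in order to rebuild the original file bytes."""
    return b"".join(chunk for _, chunk in blocks)


data = b"x" * 10_000_000            # a 10 MB example file
blocks = split_into_blocks(data)
print(len(blocks), [len(chunk) for _, chunk in blocks])
```

Each block could then map to one File_block row, with upload_path pointing at the block's location in object storage.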




Request flows

Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...






Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...






Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...






Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.






Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?