System requirements


Functional:

  • Web interface
  • Users have accounts
  • Can create folders
  • Upload any files into folders
  • Download file
  • Delete files / folders
  • Only the owning user can view and download the file
  • Handle parallel uploads with conflicting file names -- just rename (e.g. add a suffix)
  • Descoped
    • Permission for sharing
    • Versioning of files (upload same name file, keep older versions)



Non-Functional:

  • Durability -- uploaded files should not be lost
  • Security -- ensure files stay private and only owner can access
  • Not latency sensitive
  • Low cost to store and upload/download large amounts of files



Capacity estimation

Each user uses 50GB

10,000,000 users

500,000,000 GB total storage --> 500 Petabytes


10% of users active per day, reading and uploading

1,000,000 uploads per day --> ~41.7k per hour --> ~694 per minute --> ~12 per second

10,000,000 reads per day
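
A quick sanity check of the estimates above, as a minimal Python sketch. It assumes decimal units (1 PB = 1,000,000 GB) and traffic spread evenly over the day; real traffic will peak, so these are averages rather than provisioning targets.

```python
# Back-of-envelope capacity check using the figures stated above.
GB_PER_USER = 50
USERS = 10_000_000
UPLOADS_PER_DAY = 1_000_000
READS_PER_DAY = 10_000_000

SECONDS_PER_DAY = 24 * 60 * 60

total_gb = GB_PER_USER * USERS
total_pb = total_gb / 1_000_000  # assumption: decimal units, 1 PB = 1,000,000 GB

uploads_per_hour = UPLOADS_PER_DAY / 24
uploads_per_minute = uploads_per_hour / 60
uploads_per_second = UPLOADS_PER_DAY / SECONDS_PER_DAY
reads_per_second = READS_PER_DAY / SECONDS_PER_DAY

print(f"storage: {total_gb:,} GB = {total_pb:,.0f} PB")
print(f"uploads: {uploads_per_hour:,.0f}/hour, "
      f"{uploads_per_minute:,.0f}/minute, {uploads_per_second:.1f}/second")
print(f"reads:   {reads_per_second:.1f}/second")
```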



API design


CreateFolder

  • parent_folder_id

ListFolder

UpdateFolder

DeleteFolder


/* or one at a time, because the limiting factor is the upload itself. The client can limit how many parallel uploads it runs. */

RequestUpload

  • Returns signed S3 URLs for a multipart upload of the file (assuming we don't need to pre-process the upload on our server)

CreateFile

  • parent_folder_id
  • uploaded_s3_key

GetFile

  • Returned object includes a signed S3 URL that can be used to access the actual file

UpdateFile

DeleteFile
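
To illustrate what RequestUpload and GetFile hand back, here is a minimal sketch of the idea behind a signed URL: the server signs the method, object key, and expiry, and the storage side verifies that signature without calling back to our API. This is a simplified stand-in built on HMAC, not S3's actual signing scheme; in practice the cloud SDK would generate the URL. `SECRET`, `storage.example.com`, `sign_url`, and `verify_url` are hypothetical names.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-signing-secret"  # hypothetical; S3 signs with AWS credentials instead


def _signature(method: str, key: str, expires_at: int) -> str:
    # Bind the signature to the HTTP method, the object key, and the expiry.
    message = f"{method}\n{key}\n{expires_at}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(method: str, key: str, ttl_seconds: int, now: Optional[int] = None) -> str:
    """Return a URL that grants `method` on `key` until it expires."""
    issued_at = int(time.time()) if now is None else now
    expires_at = issued_at + ttl_seconds
    query = urlencode({"expires": expires_at, "sig": _signature(method, key, expires_at)})
    return f"https://storage.example.com/{key}?{query}"


def verify_url(method: str, url: str, now: Optional[int] = None) -> bool:
    """The check the storage side would run before serving the request."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = int(time.time()) if now is None else now
    if current > expires_at:
        return False  # link has expired
    key = parsed.path.lstrip("/")
    return hmac.compare_digest(sig, _signature(method, key, expires_at))


upload_url = sign_url("PUT", "abc123", ttl_seconds=900)
print(verify_url("PUT", upload_url))  # valid for uploads
print(verify_url("GET", upload_url))  # not valid for downloads
```

Because the method is part of the signature, an upload URL cannot be reused to download, and the short expiry limits the damage if a URL leaks.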


Database design


folders table

_id

parent_folder_id

user_id

name

metadata

primary key _id


items table

_id

parent_folder_id

user_id

name

other metadata

unique index on (parent_folder_id, name)

primary key _id


users table

_id

auth0_user_id

...



The database stores object metadata only. The actual files are stored in a cloud object store (something like S3).


Common queries

  • List folders and files in a folder
  • Get a file by id


High-level design


Use an authN/authZ service (like auth0). On successful client token creation, create a record in our own users table if one doesn't exist.

  • User clicks login in frontend
  • Redirect to auth0 for authN
  • User goes through auth flow and gets back some authorization token that can be used to authorize API requests
  • Our API endpoints accept the authorization token, validate it, and can load user information from DB or auth0
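
The "create a record if one doesn't exist" step can be sketched as an idempotent get-or-create, shown here with SQLite for runnability. Token validation itself is left out; `auth0_user_id` is assumed to come from an already-validated token (for auth0, typically the `sub` claim). The UNIQUE constraint and the `get_or_create_user` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE users (
    _id           INTEGER PRIMARY KEY,
    auth0_user_id TEXT NOT NULL UNIQUE   -- assumption: uniqueness enforced by the DB
)
""")


def get_or_create_user(conn, auth0_user_id):
    """Return our internal user id, creating the row on first login."""
    with conn:  # commit on success, roll back on error
        # INSERT OR IGNORE is a no-op when the row already exists, so two
        # concurrent first requests cannot create duplicate users.
        conn.execute(
            "INSERT OR IGNORE INTO users (auth0_user_id) VALUES (?)",
            (auth0_user_id,),
        )
        row = conn.execute(
            "SELECT _id FROM users WHERE auth0_user_id = ?",
            (auth0_user_id,),
        ).fetchone()
    return row[0]


first = get_or_create_user(conn, "auth0|abc")
again = get_or_create_user(conn, "auth0|abc")
other = get_or_create_user(conn, "auth0|xyz")
print(first == again, first != other)
```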


Load balancer for horizontal scaling and zero downtime deploys

  • In multiple AZs
  • Single region


(potentially separate service) API gateway to validate authorization token.

  • Separate if using services under the hood (e.g. Stripe has different services per API endpoint)
  • Also useful for multi-region routing


Application server

  • business logic
  • Single region, but in multiple AZ


Database

  • Sharded (shard by parent_folder_id; could also use user_id, but that carries more risk of a hot shard)
  • Each shard has its own replica set for availability
    • One primary, 2 or more secondaries
    • Secondaries can also offload read load -- if we are ok with eventually consistent reads
    • Writes go to the primary (for consistency, especially under a network partition, can require acknowledgment from a majority -- sacrifices latency)
    • Reads can be served from a single secondary (if ok with eventual consistency) or from a majority to be strongly consistent
    • Probably ok to be eventually consistent -- may have slight UI quirks if user refreshes page immediately after creating/deleting a file and before replication to secondary occurs
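
Routing by parent_folder_id can be sketched as a stable hash of the shard key. This is a minimal sketch assuming simple hash-modulo placement; `NUM_SHARDS` and `shard_for` are hypothetical, and a real deployment might instead use range-based or consistent-hash placement so that adding shards moves less data.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count


def shard_for(parent_folder_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a folder id to a shard index using a stable hash.

    Python's built-in hash() is randomized per process for strings, so a
    cryptographic digest is used to get the same answer on every app server.
    """
    digest = hashlib.sha256(parent_folder_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Every item in a folder shares the shard key, so listing the folder and
# enforcing the unique (parent_folder_id, name) index both touch one shard.
print(shard_for("folder-1"), shard_for("folder-1"), shard_for("folder-2"))
```

One consequence of this key choice: a single user's folders are spread across shards, which avoids a hot shard for a heavy user, but any per-user query (such as total storage used) has to fan out to all shards.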


Cloud object store (e.g. S3) for cheap storage

  • If our servers don't need to process the files (maybe virus scanning?), the browser can upload directly to S3, avoiding network and storage costs on our servers.


No CDN for file uploads

But can use CDN for webapp resources


Request flows


CreateFile

  • Webapp makes api request to requestUpload
  • Load balancer sends it to available application server
  • Authorize API request with auth token
  • Application server will
    • generate a unique key for S3 -- can store all files in a flat bucket
    • create signed S3 url that frontend can upload to
  • Webapp starts the multipart upload to S3 with the signed url and manages upload status / retries
  • Once upload completes, make api request to createFile
  • App server will
    • create new record in DB
    • if the file name clashes with an existing one, rename by appending a suffix and retry
    • Return new file metadata
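
The rename-and-retry step in createFile can be sketched as follows, letting the unique index on (parent_folder_id, name) decide who wins when two uploads race. SQLite is used only to make the example runnable. The suffix format "name (1).ext", the `s3_key` column, and the helper names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE items (
    _id              INTEGER PRIMARY KEY,
    parent_folder_id INTEGER NOT NULL,
    user_id          INTEGER NOT NULL,
    name             TEXT NOT NULL,
    s3_key           TEXT NOT NULL,      -- assumption: column holding uploaded_s3_key
    UNIQUE (parent_folder_id, name)
)
""")


def with_suffix(name: str, n: int) -> str:
    """'report.pdf', 1 -> 'report (1).pdf'; names without an extension get a plain suffix."""
    stem, dot, ext = name.rpartition(".")
    if not dot or not stem:
        return f"{name} ({n})"
    return f"{stem} ({n}){dot}{ext}"


def create_file(conn, parent_folder_id, user_id, name, s3_key, max_attempts=100):
    """Insert the file record, renaming on a name clash. Returns (id, final_name)."""
    for attempt in range(max_attempts):
        candidate = name if attempt == 0 else with_suffix(name, attempt)
        try:
            with conn:  # commit on success, roll back on error
                cur = conn.execute(
                    "INSERT INTO items (parent_folder_id, user_id, name, s3_key) "
                    "VALUES (?, ?, ?, ?)",
                    (parent_folder_id, user_id, candidate, s3_key),
                )
            return cur.lastrowid, candidate
        except sqlite3.IntegrityError:
            # Sketch only: treats any constraint failure as a name clash.
            continue
    raise RuntimeError("could not find a free file name")


print(create_file(conn, 1, 7, "a.txt", "key-1")[1])
print(create_file(conn, 1, 7, "a.txt", "key-2")[1])
print(create_file(conn, 1, 7, "a.txt", "key-3")[1])
```

Relying on the insert failing, rather than checking for the name first, avoids a check-then-insert race between parallel uploads.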


What happens if createFile fails?

  • Orphaned S3 item.
  • Create a background job that occasionally scans through the S3 bucket and deletes orphaned items (not in DB)
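
The core of that cleanup job can be sketched as a set difference between bucket keys and keys referenced in the DB. The grace period is an assumption added here: without it, the job could delete an object whose upload has finished but whose createFile call has not yet landed. `find_orphans` and `GRACE_PERIOD` are hypothetical names; listing the bucket and loading the DB keys are left out.

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(hours=24)  # assumption: time allowed between upload and createFile


def find_orphans(stored_objects, referenced_keys, now):
    """Return keys that are safe to delete.

    stored_objects:  iterable of (key, last_modified) pairs from a bucket listing
    referenced_keys: set of S3 keys that have a record in the DB
    """
    cutoff = now - GRACE_PERIOD
    return [
        key
        for key, last_modified in stored_objects
        if key not in referenced_keys and last_modified < cutoff
    ]


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
objects = [
    ("a", now - timedelta(days=3)),   # referenced in DB: keep
    ("b", now - timedelta(days=3)),   # old and unreferenced: orphan
    ("c", now - timedelta(hours=1)),  # unreferenced but recent: may still be in flight
]
print(find_orphans(objects, {"a"}, now))
```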


Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...






Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...






Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.






Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?