System requirements


Functional:

  • users can upload files
  • users can sync files from the cloud to their local device
  • users can download files



Non-Functional:

  • highly available
  • consistent across the application (every client sees the same file state)
  • minimal latency
  • high throughput




Capacity estimation


  • daily active users (DAU): 1M
  • reads: 80%
  • writes: 20%
  • average file size: 1GB

Calculations:


  • total uploads: 20% of 1M = 200k uploads/day * 1 GB = 200,000 GB = 200 TB/day
  • average API hits per user: 3 => 3M API calls/day => 3M / 86,400 s ≈ 35 RPS on average
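
The arithmetic above can be checked with a short script. This is a rough sketch that assumes decimal units (1 TB = 1000 GB), one upload per writing user per day, and traffic spread evenly over the day; peak load would be higher than the average shown here. The ingest bandwidth line is an extra figure derived from the same inputs.

```python
# Back-of-the-envelope capacity check.
# Assumptions: decimal units, evenly spread traffic, one upload per writing user.
DAU = 1_000_000
WRITE_PERCENT = 20
AVG_FILE_GB = 1
API_HITS_PER_USER = 3
SECONDS_PER_DAY = 24 * 60 * 60

uploads_per_day = DAU * WRITE_PERCENT // 100               # 200,000 uploads
storage_tb_per_day = uploads_per_day * AVG_FILE_GB / 1000  # GB -> TB
api_calls_per_day = DAU * API_HITS_PER_USER
average_rps = api_calls_per_day / SECONDS_PER_DAY
# Average ingest bandwidth in gigabits per second (8 bits per byte).
ingest_gbps = uploads_per_day * AVG_FILE_GB * 8 / SECONDS_PER_DAY

print(f"uploads/day:   {uploads_per_day}")
print(f"storage/day:   {storage_tb_per_day} TB")
print(f"API calls/day: {api_calls_per_day}")
print(f"average RPS:   {average_rps:.1f}")
print(f"ingest:        {ingest_gbps:.1f} Gbit/s")
```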





API design


POST /uploads


Headers:

Content-Type: multipart/form-data

Authorization: Bearer <JWT>


Body: the file's binary content, sent as a multipart part


GET /uploads/:id/status

Authorization: Bearer <JWT>


DELETE /uploads/:id

Authorization: Bearer <JWT>
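
As an illustration of the POST /uploads call, the sketch below builds the multipart request with Python's standard library without sending it. The host name, the form field name "file", and the helper name build_upload_request are assumptions made for this example and are not part of the design above.

```python
import uuid
import urllib.request

API_BASE = "https://api.example.com"  # hypothetical host


def build_upload_request(filename: str, content: bytes, jwt: str) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST to /uploads."""
    boundary = uuid.uuid4().hex
    body = b"\r\n".join([
        f"--{boundary}".encode(),
        # "file" is an assumed form field name.
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        content,
        f"--{boundary}--".encode(),
        b"",
    ])
    return urllib.request.Request(
        f"{API_BASE}/uploads",
        data=body,
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Authorization": f"Bearer {jwt}",
        },
    )


req = build_upload_request("notes.txt", b"hello", "example-token")
print(req.get_method(), req.full_url)
```

Sending the request would be a call to urllib.request.urlopen(req); it is left out here because the host is a placeholder.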






Database design


users:

  • id (PK), indexed
  • first_name
  • last_name
  • email
  • userid
  • password (hashed)


uploads:

  • id (PK)
  • userId (FK to users.id)
  • URL
  • type
  • metadata
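
The two tables can be sketched as DDL. The example below uses an in-memory SQLite database as a stand-in for Postgres so that it runs anywhere. The column types, the UNIQUE constraint on email, and the index on uploads.userId are assumptions, not decisions stated in this design.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id         INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name  TEXT,
    email      TEXT UNIQUE,      -- uniqueness is an assumption
    userid     TEXT,
    password   TEXT NOT NULL     -- store a hash, never the plain password
);
CREATE TABLE uploads (
    id       INTEGER PRIMARY KEY,
    userId   INTEGER NOT NULL REFERENCES users(id),
    URL      TEXT NOT NULL,      -- location of the blob in object storage
    type     TEXT,
    metadata TEXT                -- JSON string in this sketch
);
-- Assumed index to make "list a user's uploads" cheap.
CREATE INDEX idx_uploads_user ON uploads(userId);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)

conn.execute("INSERT INTO users (id, email, password) VALUES (1, 'a@example.com', 'hash')")
conn.execute("INSERT INTO uploads (userId, URL, type) VALUES (1, 's3://bucket/key', 'image/png')")
print(conn.execute("SELECT id, URL FROM uploads WHERE userId = 1").fetchall())
```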





High-level design






Request flows


user uploads a file => the file is split into parts => the parts are uploaded one by one
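
A minimal sketch of the part-by-part flow, under stated assumptions: the 8 MiB part size, the per-part SHA-256 checksum, and the InMemoryUpload class (a stand-in for the server or object store) are all hypothetical and only illustrate the idea.

```python
import hashlib
from typing import Dict, Iterator, List, Tuple

PART_SIZE = 8 * 1024 * 1024  # hypothetical part size (8 MiB)


def split_into_parts(data: bytes, part_size: int = PART_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (part_number, chunk) pairs, numbering parts from 1."""
    for offset in range(0, len(data), part_size):
        yield offset // part_size + 1, data[offset:offset + part_size]


class InMemoryUpload:
    """Stand-in for the receiving side: collects parts, then assembles them."""

    def __init__(self) -> None:
        self.parts: Dict[int, bytes] = {}

    def put_part(self, number: int, chunk: bytes) -> str:
        # Re-sending a part overwrites the earlier copy, so retries are safe.
        self.parts[number] = chunk
        return hashlib.sha256(chunk).hexdigest()

    def complete(self) -> bytes:
        # Assemble the parts in part-number order.
        return b"".join(self.parts[n] for n in sorted(self.parts))


def upload(data: bytes, store: InMemoryUpload, part_size: int = PART_SIZE) -> List[str]:
    """Upload every part in order and return the per-part checksums."""
    return [store.put_part(number, chunk) for number, chunk in split_into_parts(data, part_size)]


store = InMemoryUpload()
checksums = upload(b"x" * 2500, store, part_size=1000)
print(len(checksums), len(store.complete()))
```

Because each part is addressed by its number, a failed part can be retried on its own without restarting the whole file.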





Detailed component design


the API services run as Docker containers and are scaled horizontally with Kubernetes
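
If the scaling is done with the Kubernetes Horizontal Pod Autoscaler (an assumption; the design does not name the mechanism), the replica count follows the documented rule desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric). The sketch below applies that rule with hypothetical min and max bounds; the real controller also applies a tolerance and stabilization windows that are omitted here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Core HPA scaling rule, clamped to the configured replica bounds."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# 4 pods at 90% average CPU with a 60% target.
print(desired_replicas(4, 90, 60))
```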




Trade-offs/Tech choices


for databases:


S3: blob storage (versioning enabled)

Primary DB: Postgres (primary-replica replication; writes go to the primary to keep data consistent)

Metadata: Cassandra, which also stores the file version





Failure scenarios/bottlenecks







Future improvements
