System requirements


Functional:

[file]

  • upload files
  • download files
  • view file list
  • move file
  • edit file name
  • delete files

[folder]

  • create folder
  • move folder
  • edit folder name
  • delete folder

[advanced function]

  • sync files
  • share file link
  • manage file version



Non-Functional:

Scalability

Availability

Security

  • unauthorized users cannot view, edit, or delete files through a shared link

Capability

  • the system can process large files

Reliability

Fault Tolerance



Capacity estimation

  • users: 1M
  • data per user: 2 GB/month ≈ 67 MB/day (assuming a 30-day month)
  • total data per day: 1M users × 67 MB ≈ 67 TB/day
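
The arithmetic above can be checked with a short script. This is a minimal sketch that assumes decimal units (1 TB = 10^12 bytes) and a 30-day month; the average-throughput figure is an extra derived number, not part of the original estimate.

```python
# Inputs taken from the estimate above.
USERS = 1_000_000
BYTES_PER_USER_PER_MONTH = 2 * 10**9   # 2 GB per user per month
DAYS_PER_MONTH = 30                    # assumption: 30-day month
SECONDS_PER_DAY = 86_400


def daily_ingest_bytes(users, bytes_per_user_per_month, days_per_month=DAYS_PER_MONTH):
    """Total bytes uploaded per day across all users."""
    return users * bytes_per_user_per_month / days_per_month


per_user_day_mb = BYTES_PER_USER_PER_MONTH / DAYS_PER_MONTH / 10**6
total_day_tb = daily_ingest_bytes(USERS, BYTES_PER_USER_PER_MONTH) / 10**12
avg_throughput_mb_s = (
    daily_ingest_bytes(USERS, BYTES_PER_USER_PER_MONTH) / SECONDS_PER_DAY / 10**6
)

print(f"{per_user_day_mb:.1f} MB per user per day")    # 66.7 MB per user per day
print(f"{total_day_tb:.1f} TB per day in total")       # 66.7 TB per day in total
print(f"{avg_throughput_mb_s:.0f} MB/s average write") # 772 MB/s average write
```

Peak traffic is usually well above the average, so the last number is a lower bound for sizing the upload path.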





API design

POST /api/v1/file/upload?




Database design

[file_metadata]

id, integer, primary key

version, integer

name, string

size, double

createdAt, timestamp

createdBy, string

updatedAt, timestamp
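
As an illustration, the [file_metadata] record can be mirrored in application code. This is a minimal sketch, not a definitive model: the class name `FileMetadata` and the helper `rename_file` are hypothetical, and the field types simply follow the column list above.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class FileMetadata:
    """In-memory mirror of one file_metadata row (hypothetical class name)."""
    id: int
    version: int
    name: str
    size: float            # listed as double in the schema above
    created_at: datetime
    created_by: str
    updated_at: datetime


def rename_file(meta, new_name, now=None):
    """Return a copy with a new name and a refreshed updated_at.

    Hypothetical helper for the "edit file name" requirement; whether a
    rename should also bump the version is left open here.
    """
    now = now or datetime.now(timezone.utc)
    return replace(meta, name=new_name, updated_at=now)
```

The record is frozen so that every change produces a new value, which keeps the previous state available for version management.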



High-level design

sync service

client app

  • detect file changes
  • upload files in chunks
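
The two client responsibilities above can be sketched as follows. This is a rough illustration under stated assumptions: change detection is done by comparing a whole-file SHA-256 fingerprint, the 4 MiB chunk size is an arbitrary choice, and `send_chunk` is a hypothetical callback standing in for the real upload call.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # assumption: 4 MiB chunks


def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest of the whole file content."""
    return hashlib.sha256(data).hexdigest()


def has_changed(previous_fingerprint: str, data: bytes) -> bool:
    """Detect a change by comparing against the last known fingerprint."""
    return fingerprint(data) != previous_fingerprint


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield (index, chunk_bytes, chunk_sha256_hex) for each fixed-size chunk."""
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        yield offset // chunk_size, chunk, hashlib.sha256(chunk).hexdigest()


def upload_in_chunks(data: bytes, send_chunk, chunk_size: int = CHUNK_SIZE) -> int:
    """Send every chunk through the hypothetical send_chunk callback.

    Returns the number of chunks sent.
    """
    count = 0
    for index, chunk, digest in split_into_chunks(data, chunk_size):
        send_chunk(index, chunk, digest)
        count += 1
    return count
```

A real client would stream from disk instead of holding the file in memory, and would retry individual chunks on failure.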

file storage

  • use a distributed object store such as S3, which stores and replicates data reliably






Request flows







Detailed component design

  • conflict resolution
    • automatic resolution: two users edit different lines
    • manual resolution: two users edit the same line
  • sync file
    • delta sync
    • only sync chunks whose hash has changed
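
Both ideas in this list can be sketched in a few lines. The code below is an illustrative sketch, not a definitive implementation: `changed_chunk_indexes` compares fixed-size chunk hashes position by position, and `merge_lines` is a simplified three-way merge that only handles in-place line edits (all three versions must have the same number of lines). All function names are hypothetical.

```python
import hashlib


def chunk_hashes(data: bytes, chunk_size: int) -> list:
    """SHA-256 hex digest of each fixed-size chunk, in order."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def changed_chunk_indexes(old_hashes: list, new_hashes: list) -> list:
    """Indexes of chunks in the new version that differ from the old one."""
    return [
        i for i, digest in enumerate(new_hashes)
        if i >= len(old_hashes) or old_hashes[i] != digest
    ]


def merge_lines(base: list, mine: list, theirs: list):
    """Three-way merge of line lists; returns (merged, conflict_indexes).

    Lines changed by only one side are merged automatically. Lines changed
    differently by both sides keep the base text and are reported for
    manual resolution.
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for n, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t or t == b:
            merged.append(m)      # same edit on both sides, or only mine changed
        elif m == b:
            merged.append(t)      # only theirs changed
        else:
            merged.append(b)      # both changed differently: manual resolution
            conflicts.append(n)
    return merged, conflicts
```

Fixed-size chunking has a known limitation: inserting bytes near the start of a file shifts every later chunk, so all of them look changed. Content-defined chunking is a common way around this.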


Trade offs/Tech choices

  • MongoDB
    • good: scalability
    • bad: eventual consistency (when reads are served from secondaries)
  • Active-Passive multi-AZ
    • good: strong consistency
    • good: simple failover
    • bad: the standby instance sits idle during normal operation





Failure scenarios/bottlenecks

MongoDB failure

  • multi-AZ database deployment
  • circuit breaker
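
A circuit breaker stops calling a failing dependency for a while so that requests fail fast instead of piling up. The sketch below is one minimal way to do this, assuming a simple consecutive-failure threshold and a fixed reset timeout; the class and parameter names are hypothetical, and a production system would more likely use an existing library.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; skipping call")
            # Timeout elapsed: half-open state, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Database calls would be wrapped as `breaker.call(query_fn, ...)`, with `CircuitOpenError` mapped to a fast error response or a cached fallback.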

S3 failure

  • multi-region replication

API server bottleneck

  • multiple load balancers
  • rate limiter configured at nginx
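
To make the rate-limiting idea concrete, here is a minimal token-bucket sketch. It is an application-level illustration only, not nginx configuration (nginx applies its own leaky-bucket limiting at the proxy layer); the class name and the rate and capacity values are assumptions.

```python
import time


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate` per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.clock = clock            # injectable for testing
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self, cost=1.0):
        """Return True and consume tokens if the request may proceed."""
        now = self.clock()
        # Refill according to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

In practice one bucket would be kept per client key (for example per user or per IP address), and rejected requests would receive an HTTP 429 response.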



Future improvements
