System requirements
Functional:
[file]
- upload files
- download files
- view file list
- move file
- edit file name
- delete files
[folder]
- create folder
- move folder
- edit folder name
- delete folder
[advanced function]
- sync files
- share file link
- manage file versions
Non-Functional:
Scalability
Availability
Security
- unauthorized users cannot view, edit, or delete files through a shared link
Capability
- system can process large files
Reliability
Fault Tolerance
Capacity estimation
- users : 1M
- data per user : 2 GB/month = ~67 MB/day
- total data in a day = 1M users * 67 MB = ~67 TB/day
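The arithmetic above can be checked with a short script. This is only a sketch: it assumes a 30-day month and decimal units (1 GB = 1000 MB), and the average ingest rate on the last line is derived from the same figures rather than stated in the notes.

```python
# Assumptions: 30-day month, decimal units (1 GB = 1000 MB, 1 TB = 1,000,000 MB).
USERS = 1_000_000
GB_PER_USER_PER_MONTH = 2
DAYS_PER_MONTH = 30
SECONDS_PER_DAY = 86_400

# Per-user daily volume in MB.
mb_per_user_per_day = GB_PER_USER_PER_MONTH * 1000 / DAYS_PER_MONTH

# Total daily volume across all users, converted from MB to TB.
tb_per_day = USERS * mb_per_user_per_day / 1_000_000

# Average ingest rate if uploads were spread evenly over the day.
avg_mb_per_second = USERS * mb_per_user_per_day / SECONDS_PER_DAY

print(f"{mb_per_user_per_day:.0f} MB per user per day")
print(f"{tb_per_day:.0f} TB per day in total")
print(f"{avg_mb_per_second:.0f} MB/s average ingest")
```

Real traffic is bursty, so peak throughput would need headroom above the average rate.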
API design
POST /api/v1/file/upload?
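As a rough illustration of how a client might call this endpoint, the sketch below builds (but does not send) one upload request per chunk. The notes do not specify the host, query parameters, or headers, so `BASE_URL`, the `name` and `chunk` parameters, and the bearer token are all hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://files.example.com"  # hypothetical host


def build_upload_request(name: str, chunk_index: int, chunk: bytes, token: str) -> Request:
    """Build a POST request that uploads one chunk of a file.

    The query parameters and headers are assumptions made for illustration.
    """
    query = urlencode({"name": name, "chunk": chunk_index})
    return Request(
        f"{BASE_URL}/api/v1/file/upload?{query}",
        data=chunk,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )
```

Sending the request would be a call such as `urllib.request.urlopen(req)`. Keeping the builder separate makes it easy to test without a network.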
Database design
[file_metadata]
id, Integer, primary key
version, Integer
name, String
size, double
createdAt, timestamp
createdBy, String
updatedAt, timestamp
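The table above could be prototyped as follows. This is a minimal sketch that uses an in-memory SQLite database as a stand-in for the real store. The NOT NULL constraints, storing timestamps as ISO 8601 text, and bumping `version` on rename are assumptions, not part of the notes.

```python
import sqlite3

# Column names follow the file_metadata listing; constraints are assumed.
SCHEMA = """
CREATE TABLE file_metadata (
    id        INTEGER PRIMARY KEY,
    version   INTEGER NOT NULL,
    name      TEXT    NOT NULL,
    size      REAL    NOT NULL,
    createdAt TEXT    NOT NULL,
    createdBy TEXT    NOT NULL,
    updatedAt TEXT    NOT NULL
)
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the file_metadata table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def insert_file(conn, name, size, created_by, now):
    """Insert a new file at version 1 and return its id."""
    cur = conn.execute(
        "INSERT INTO file_metadata "
        "(version, name, size, createdAt, createdBy, updatedAt) "
        "VALUES (1, ?, ?, ?, ?, ?)",
        (name, size, now, created_by, now),
    )
    return cur.lastrowid


def rename_file(conn, file_id, new_name, now):
    """Rename a file; bumping the version here is an assumed policy."""
    conn.execute(
        "UPDATE file_metadata "
        "SET name = ?, version = version + 1, updatedAt = ? WHERE id = ?",
        (new_name, now, file_id),
    )
```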
High-level design
sync service
client app
- detect file changes
- upload file by chunk
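One way the client could combine the two duties above is to split each file into fixed-size chunks, hash every chunk, and compare the hashes with those from the last sync. The sketch below assumes a 4 MiB chunk size and SHA-256; neither is specified in the notes, and the function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # assumed chunk size (4 MiB)


def split_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split file contents into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Return the SHA-256 hex digest of every chunk, in order."""
    return [hashlib.sha256(c).hexdigest() for c in split_chunks(data, chunk_size)]


def changed_chunks(old_hashes: list[str], new_hashes: list[str]) -> list[int]:
    """Return indices of chunks that are new or whose hash differs."""
    return [
        i for i, h in enumerate(new_hashes)
        if i >= len(old_hashes) or old_hashes[i] != h
    ]
```

Only the chunks at the returned indices need to be uploaded. With fixed-size chunks, inserting bytes near the start of a file shifts every later chunk, so all of them would appear changed; content-defined chunking is a common way to reduce that effect.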
file storage
- use a distributed object store such as S3, which stores and replicates data reliably
Request flows
Detailed component design
- conflict resolution
- automatic resolution : two users edit different lines
- manual resolution : two users edit the same line
- sync file
- delta sync
- only sync chunks whose hash has changed
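The automatic versus manual rule for conflict resolution listed above can be sketched as a line-level three-way merge against the last common version. This is a simplified illustration that assumes both users only edit lines in place (no insertions or deletions); `merge_lines` is a hypothetical helper, not an existing API.

```python
def merge_lines(base: list[str], mine: list[str], theirs: list[str]):
    """Merge two edited copies of a file against their common base.

    Returns (merged_lines, conflict_indices). Lines changed by only one
    side are merged automatically; lines changed differently by both
    sides are reported for manual resolution.
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")

    merged, conflicts = [], []
    for i, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:        # both sides agree (or neither changed the line)
            merged.append(m)
        elif m == b:      # only the other user changed this line
            merged.append(t)
        elif t == b:      # only this user changed this line
            merged.append(m)
        else:             # both changed the same line differently
            merged.append(b)  # keep the base line until a user decides
            conflicts.append(i)
    return merged, conflicts
```

An empty conflict list means the merge can be applied automatically; otherwise the listed lines would be shown to the user to resolve.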
Trade offs/Tech choices
- MongoDB
- good : scalability
- bad : eventual consistency when reading from secondaries
- Active-Passive multi AZ
- good : Strong consistency
- good : simple fail-over
- bad : the standby instance sits idle during normal operation
Failure scenarios/bottlenecks
MongoDB failure
- multi-AZ DB
- circuit breaker
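A circuit breaker in front of the database calls might look like the sketch below. It assumes a simple policy: open after a number of consecutive failures, fail fast while open, and allow one trial call after a timeout. The class name, thresholds, and injectable clock are all hypothetical choices for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # Re-open on a failed trial call, or open once the threshold is hit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping each database call as `breaker.call(query_fn, ...)` lets the API servers return an error quickly during an outage instead of piling up blocked requests.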
S3 failure
- multi-region replication
API server bottleneck
- multiple load balancers
- rate limiting configured at nginx
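To illustrate what a rate limiter does, here is a minimal token bucket sketch. Note that this is a swapped-in technique for illustration: nginx's own request limiting (the `limit_req` module) uses a leaky bucket and is set up in nginx configuration, not in application code. The class name and parameters below are hypothetical.

```python
import time


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate` per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.clock = clock            # injectable for testing
        self.tokens = float(capacity) # start with a full bucket
        self.updated = clock()

    def allow(self) -> bool:
        """Return True and consume a token if the request may proceed."""
        now = self.clock()
        # Refill based on elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In practice one bucket would be kept per client key (for example per user or per IP address), and rejected requests would receive an error response such as HTTP 429.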
Future improvements