System requirements
Functional:
- File Upload: Users should be able to upload files to the system, which includes handling different file types and sizes.
- File Organization: Users need the ability to organize files into directories or folders to manage their content effectively.
- File Sharing: Users should have options to share files or directories with other users, specifying permissions for access (e.g., read-only or read-write).
- Version Control: The system should support versioning of files, allowing users to revert to previous versions if needed, and to see edit history.
Non-Functional:
- We need the contents of files and directories to be durable; we cannot lose data.
- Files should be secure; people should not be able to access files without the right permissions.
- System should be highly available for reads and writes. Eventual consistency is fine; we expect little concurrent access and no real-time concurrent modification of files.
- We should have a per-user target of read and write throughput. Assume reads of 100Mbps, and writes of 10Mbps.
Capacity estimation
Assume 100M DAU.
Each user reads 2 files and writes 1 file every day. Each file is 1MB.
Storage added is 100M x 1MB = 100TB per day.
About 36.5PB per year.
Avg read bandwidth: 200TB / 100,000s (roughly one day) = 2GB/s = 16Gbps.
Avg write bandwidth: 100TB / 100,000s = 1GB/s = 8Gbps.
For provisioning, assume an average of 5,000 users transferring at the same time.
Each one gets 100Mbps download and 10Mbps upload.
That is 500Gbps of download traffic and 50Gbps of upload traffic.
Average network bandwidth is roughly 550Gbps, dominated by downloads.
Assuming peak is 10x avg, we should provision roughly 5,500Gbps.
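The arithmetic can be checked with a short script. This is only a sketch of the estimate under the assumptions stated in this section (100M DAU, 2 reads and 1 write of 1MB per user per day, 5,000 concurrent users at 100Mbps down and 10Mbps up, 10x peak). The constant and function names are illustrative, and a day is rounded to 100,000 seconds to keep the numbers simple.

```python
# Assumptions taken from the capacity estimation text; names are illustrative.
DAU = 100_000_000
FILE_SIZE_BYTES = 1_000_000            # 1MB, decimal units
READS_PER_USER_PER_DAY = 2
WRITES_PER_USER_PER_DAY = 1
SECONDS_PER_DAY = 100_000              # rounded up from 86,400
CONCURRENT_USERS = 5_000
PER_USER_DOWNLOAD_BPS = 100_000_000    # 100Mbps
PER_USER_UPLOAD_BPS = 10_000_000       # 10Mbps
PEAK_FACTOR = 10


def estimate():
    """Return storage and bandwidth estimates derived from the constants above."""
    write_bytes_per_day = DAU * WRITES_PER_USER_PER_DAY * FILE_SIZE_BYTES
    read_bytes_per_day = DAU * READS_PER_USER_PER_DAY * FILE_SIZE_BYTES
    download_bps = CONCURRENT_USERS * PER_USER_DOWNLOAD_BPS
    upload_bps = CONCURRENT_USERS * PER_USER_UPLOAD_BPS
    return {
        "storage_tb_per_day": write_bytes_per_day / 1e12,
        "storage_pb_per_year": write_bytes_per_day * 365 / 1e15,
        # Bytes per day -> bits per second -> gigabits per second.
        "avg_read_gbps": read_bytes_per_day * 8 / SECONDS_PER_DAY / 1e9,
        "avg_write_gbps": write_bytes_per_day * 8 / SECONDS_PER_DAY / 1e9,
        "download_gbps": download_bps / 1e9,
        "upload_gbps": upload_bps / 1e9,
        "peak_gbps": PEAK_FACTOR * (download_bps + upload_bps) / 1e9,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:g}")
```

Under these assumptions the script gives 100TB of new storage per day, 36.5PB per year, 16Gbps average read, 8Gbps average write, and a 5,500Gbps peak provisioning target.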
API design
POST /create-folder
{
path
}
DELETE /content/:path
GET /content/:path
POST /start-upload -> version-number
{
base-version # null, if it's a new file
name
path
total-number-of-chunks
}
POST /chunk-upload -> pass/fail
{
name
path
version-number
base64-encoded-content
chunk-number # which of the chunks is in this request
}
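A client could build the two upload requests roughly as follows. This is a sketch, not a fixed protocol: the 4MB chunk size, the helper names, zero-based chunk numbering, and the assumption that the content field carries base64 text (keyed here as base64-encoded-content) are all illustrative choices that the design above does not pin down.

```python
import base64

CHUNK_SIZE_BYTES = 4 * 1024 * 1024  # assumed; the design does not fix a chunk size


def split_into_chunks(data, chunk_size=CHUNK_SIZE_BYTES):
    """Split file content into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def build_start_upload(name, path, data, base_version=None,
                       chunk_size=CHUNK_SIZE_BYTES):
    """Body for POST /start-upload. base_version is None for a new file."""
    return {
        "base-version": base_version,
        "name": name,
        "path": path,
        "total-number-of-chunks": len(split_into_chunks(data, chunk_size)),
    }


def build_chunk_uploads(name, path, version_number, data,
                        chunk_size=CHUNK_SIZE_BYTES):
    """One POST /chunk-upload body per chunk, numbered from 0 (an assumption)."""
    return [
        {
            "name": name,
            "path": path,
            "version-number": version_number,
            "base64-encoded-content": base64.b64encode(chunk).decode("ascii"),
            "chunk-number": chunk_number,
        }
        for chunk_number, chunk in enumerate(split_into_chunks(data, chunk_size))
    ]
```

Because every chunk request carries its own chunk-number and the version-number returned by /start-upload, a failed chunk can be retried on its own without restarting the whole upload.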
DELETE /content/:path/:name
GET /content/:path/:name?version=vvv&page=xxx
GET /metadata/:path/:name -> {
version-history[] # pairs of version-number and timestamp
users-with-read-permission
users-with-write-permission
}
POST /share/:path/:name {
reads: user-email[]
writes: user-email[]
}
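The share request maps onto the permission lists kept per file. A minimal sketch of how the server side might apply a /share body and check access is shown below. The function names are hypothetical, the file record is modelled as a plain dict, and treating write permission as implying read permission is an assumption rather than something the design states.

```python
READ_KEY = "user-emails-read-permission"
WRITE_KEY = "user-emails-write-permission"


def apply_share(file_record, reads, writes):
    """Merge the 'reads' and 'writes' lists of a POST /share body into a file record."""
    file_record[READ_KEY] = sorted(set(file_record.get(READ_KEY, [])) | set(reads))
    file_record[WRITE_KEY] = sorted(set(file_record.get(WRITE_KEY, [])) | set(writes))
    return file_record


def can_write(file_record, user_email):
    """True if the user appears in the write-permission list."""
    return user_email in file_record.get(WRITE_KEY, [])


def can_read(file_record, user_email):
    """True if the user can read; writers are assumed to be readers as well."""
    return (user_email in file_record.get(READ_KEY, [])
            or can_write(file_record, user_email))
```

Every GET, DELETE, and upload request would run a check like this before touching any chunk, which is what the security requirement asks for.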
Database design
Path:
- path-id (primary key)
- parent-path-id
- fully-qualified-path
File:
- file-id (primary key)
- name
- path-id
- fully-qualified-path
- user-emails-read-permission[]
- user-emails-write-permission[]
- file-version-id[]
File-Version:
- file-version-id (primary key)
- file-id
- name
- path-id
- chunk-id[]
Chunk:
- chunk-id (primary key)
- Chunk-Server-Leader-Name
- Chunk-Server-Follower-Name
Chunk-Server:
- Chunk-Server-Name (primary key)
- chunk-id[]
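To show how the File and File-Version records support version history and revert, here is a small in-memory sketch. The dataclasses mirror only the fields needed for the illustration. The sequential id allocation, and the choice to implement revert as a new version that reuses the old version's chunk ids (so history is never rewritten), are assumptions and not part of the schema above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FileVersion:
    file_version_id: int
    file_id: int
    chunk_ids: List[str]


@dataclass
class File:
    file_id: int
    name: str
    file_version_ids: List[int] = field(default_factory=list)


def add_version(file: File, versions: Dict[int, FileVersion],
                chunk_ids: List[str]) -> int:
    """Record a new version of the file and return its id."""
    new_id = len(versions) + 1  # hypothetical id allocation
    versions[new_id] = FileVersion(new_id, file.file_id, list(chunk_ids))
    file.file_version_ids.append(new_id)
    return new_id


def revert(file: File, versions: Dict[int, FileVersion], target_id: int) -> int:
    """Revert by appending a new version that points at the target's chunks."""
    if target_id not in file.file_version_ids:
        raise ValueError(f"version {target_id} does not belong to file {file.file_id}")
    return add_version(file, versions, versions[target_id].chunk_ids)
```

Since a version is just a list of chunk ids, unchanged chunks can be shared between versions and a revert writes no file data.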
High-level design
The front end receives requests from clients to read and write files. To create a file, it sends a request to the chunk allocation service.
This service finds chunk servers on which the chunks for the file can be created and sends the corresponding creation requests to the chunk server managers.
Each chunk server manager manages multiple chunk servers. It sends requests to create and delete local files to these chunk servers. For each chunk, it allocates two chunk instances on different chunk servers: one master (the leader in the Chunk table) and one replica (the follower). It also receives heartbeats from these chunk servers. If a chunk server is unhealthy, its chunk instances must be reallocated on a different chunk server.
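The heartbeat tracking and two-copy placement described above could look roughly like this. It is a sketch under several assumptions: a 30-second heartbeat timeout, a least-loaded placement policy, and a class and method names that are purely illustrative. The clock is injectable so the behaviour can be exercised without waiting.

```python
import time

HEARTBEAT_TIMEOUT_S = 30.0  # assumed; the design does not specify a timeout


class ChunkServerManager:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last_heartbeat = {}   # chunk server name -> time of last heartbeat
        self._chunk_count = {}      # chunk server name -> chunk instances placed
        self.placements = {}        # chunk id -> (master, replica)

    def heartbeat(self, server):
        """Record that a chunk server reported in."""
        self._last_heartbeat[server] = self._clock()
        self._chunk_count.setdefault(server, 0)

    def healthy_servers(self):
        """Servers whose last heartbeat is within the timeout."""
        now = self._clock()
        return [s for s, t in self._last_heartbeat.items()
                if now - t <= HEARTBEAT_TIMEOUT_S]

    def allocate(self, chunk_id):
        """Place a master and a replica on the two least-loaded healthy servers."""
        candidates = sorted(self.healthy_servers(),
                            key=lambda s: (self._chunk_count[s], s))
        if len(candidates) < 2:
            raise RuntimeError("need two healthy chunk servers for master and replica")
        master, replica = candidates[0], candidates[1]
        self._chunk_count[master] += 1
        self._chunk_count[replica] += 1
        self.placements[chunk_id] = (master, replica)
        return master, replica
```

Re-replicating the chunks of a failed server would reuse the same healthy-server selection, copying from whichever instance of each chunk is still reachable.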
The front end, the chunk allocation service, the chunk server manager and the chunk server all communicate via Kafka.
The chunk allocation service performs stateful allocation across chunk server managers and manages creation of the directory structure. It must checkpoint the state of these allocations back to Kafka so that it can recover if the chunk allocation service fails. It needs read/write access to the cache, which records where the chunks of each file live.
The chunk server manager is partitioned according to chunk server addresses, so that one chunk server is only served by one chunk server manager at a time. It must move chunks to new chunk servers when a chunk server fails. It needs read/write access to the cache.
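One simple way to get the "one chunk server, one manager" property is to hash the chunk server address to a manager index, in the same spirit as keyed partitioning of a Kafka topic. The sketch below assumes a fixed number of managers and uses SHA-256 only as a stable hash. The function names are hypothetical and this is one possible scheme, not necessarily the one intended here.

```python
import hashlib
from collections import defaultdict


def manager_for(chunk_server_address: str, num_managers: int) -> int:
    """Map a chunk server address to exactly one manager index, deterministically."""
    if num_managers <= 0:
        raise ValueError("num_managers must be positive")
    digest = hashlib.sha256(chunk_server_address.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_managers


def assign(addresses, num_managers):
    """Group chunk server addresses by the manager that owns them."""
    assignment = defaultdict(list)
    for address in addresses:
        assignment[manager_for(address, num_managers)].append(address)
    return dict(assignment)
```

Plain modulo hashing reshuffles most assignments when the number of managers changes. If managers are added or removed often, consistent hashing would be a reasonable alternative.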
Since the front end performs reads and writes directly against the chunk servers, it must consult the cache to figure out where the chunks of a file live.
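The read path on the front end can be sketched as a cache lookup followed by a fetch from the leader, falling back to the follower. Everything here is illustrative: the cache is modelled as a dict from chunk id to a (leader, follower) pair, and fetch is a caller-supplied function standing in for the network call to a chunk server. Reading from the follower is acceptable only because the requirements allow eventual consistency.

```python
class ChunkUnavailable(Exception):
    """Raised when neither copy of a chunk can be read."""


def read_file(chunk_ids, location_cache, fetch):
    """Assemble a file from its chunks.

    chunk_ids: ordered chunk ids of the requested file version.
    location_cache: dict mapping chunk id -> (leader_name, follower_name).
    fetch: callable (server_name, chunk_id) -> bytes; raises ConnectionError
           when the server cannot be reached.
    """
    parts = []
    for chunk_id in chunk_ids:
        leader, follower = location_cache[chunk_id]
        for server in (leader, follower):
            try:
                parts.append(fetch(server, chunk_id))
                break
            except ConnectionError:
                continue  # try the other copy
        else:
            raise ChunkUnavailable(chunk_id)
    return b"".join(parts)
```

A write would follow the same lookup but go to the leader only, with the leader responsible for forwarding the data to the follower.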