System requirements
Functional:
- upload/download files from multiple devices
- file changes synced among all the devices
- file versioning
Non-Functional:
- high availability
- high reliability, file lose isn't toleranted
- robost
Capacity estimation
Suppose we have 100M users/10M DAU. Each of them create 10 files a day. 10MB in average and it could be 1KB - 100GB
API design
GetFile(file_name)
CreateFile(file_name, content)
Database design
User(user_id, email....)
File(file_id,file_name, file_version, user_id)
Chunk(chunk_id, file_id, )
Devices(device_id, user_id, )
High-level design
We basically split into 3 big modules:
Metadata Server:
- CRUD files metadata
- Fanout any file updates to notification server
Chunk Server:
- deduplicate file chunks
- upload chunks to cloud storage
- update per-file chunk information to metadata server
notification server
- poll the file updates from the metadata server and send notification to the clients through long polling
Request flows
Uploading path:
- The clients calls the metadata server first to creat a file and marked it as `Uploading`
- The clients calls the chunk server to send data
- The chunk server forward data to cloud storage
- when fowarding is down, the chunk server call metadata server to update the chunk information and mark it as `Ready`. Also, it responds client with OK
Notification path:
- The notification server gets called and it notifies the client with the updates
Downloading path:
- the clients call the metadata server to get the metadata informtion
- download the chunks from the chunk server
- the chunk server gets the chunk information from metadata server
- the client downloads all the chunk from the chunk server and compose the file in local
Detailed component design
Chunk servers.
- parallel uploading: first of all we can split a big files into small chunks, let's say 10MB each. It can be done on the client side so that the client can fully utilize the network bandwidth. Same for the chunk server to the cloud storage hop
- data deduplication: chunk servers can calculate the hash value of each chunks and do a dedup by looking up the metadata server. This should be pretty common in file versioning where there could be very small edition on one files.
- data integrity: we can ask the client to calculate the hash value, sent to the metatadata server and the chunk server to calculate that again to ensure the data isn't corrupted. If yes, we have to ask clients to resend corrupted chunks.
Polling/Long-polling/Websockets
- The notification between the notification server and the client can be supported by multiple ways. Here polling may be not ideal as it is waste of resource. As for long-polling and websockets, it depends. Websockets is more performant but relatively complicated protocol than long-polling.
How to save storage cost?
- data deduplicate: mentioned above
- set version limit: we can set a limit for versions. Beyond certain limit, we don't support them.
- use cold storage: this is a feature of Cloud Storage, we can let Cloud Storage to transfer the data that hasn't been accessed for 2 weeks to cold storage.
Data replication/durability
Normally, we can relies on the Cloud Storage's data replication for our durability. Usually they already did that by replication and we don't need to do that again.
Failure scenarios/bottlenecks
It is possible transimission failed between client and chunk server side or the chunk server to cloud storage side. As mentioned above, we use the database status tracking if the uploading is sucess or not and use checksum to decide if the data is corrupted or not. Otherwise we have to ask clients to retry with exponential back-off
Sharding: we need to shard our metadata DB in the future. Apparently we can shard by file_id. The another option is user_id. The advantage of using file_id is to prevent hotspot issue though sharding by user_id is better during fanout.
Future improvements
We can add monitoring to monitor our service SLA. It should be very reliable.