Design Dropbox - System Design

System requirements

Functional:

upload/download files from multiple devices
file changes synced among all the devices
file versioning

Non-Functional:

Data durability. Once file is uploaded, it should never get lost.
Availability
Scalability. Many users.
Strong Consistency. All clients should see the same files.
Security. Files can be accessed by the owner and users with explicit sharing permissions. By default, its't not shared by anybody other than the owner.
Response time: reasonable. Would like to show a home directory quickly (e.g. sub second), but if upload & download takes 1 - 10s depending on file size, that would be acceptable.

Capacity estimation

Suppose we have 100M users/10M DAU.

Each of them create 10 files a day. 10MB in average and it could be 1KB - 100GB
100 reads a day

So around 1.5kwrite QPS and 15k read QPS.

API design

Metadata Server

upload(user_ID, file_content, file_name, parent_directory_ID) -> returns file_ID in JSON along with some metadata. This initiates an upload process. It returns the URL to which update_chunk() should send chunks to.
update(user_ID, file_ID, file_content) -> replaces the whole file
mkdir(user_ID, parent_directory_ID, directory_name)
download(user_ID, file_ID)

Chunk Server

upload_chunk(user_ID, file_ID, chunk_ID, chunk_content) -> uploads a specific chunk of a file
update_chunk(user_ID, file_ID, chunk_ID, chunk_content) -> replaces a specific chunk of a file
download_chunk(user_ID, file_ID, chunk_ID, chunk_content) -> uploads a specific chunk of a file

Database design

User(user_id, email....)

File(file_id,file_name, file_version, user_id)

Chunk(chunk_id, file_id, )

Devices(device_id, user_id, )

High-level design

We basically split into 3 big modules:

Metadata Server:

CRUD files metadata
Fanout any file updates to notification server

Chunk Server:

deduplicate file chunks
upload chunks to cloud storage
update per-file chunk information to metadata server

notification server

poll the file updates from the metadata server and send notification to the clients through long polling

Request flows

Uploading path:

The clients calls the metadata server first to creat a file and marked it as `Uploading`
The clients calls the chunk server to send data
The chunk server forward data to cloud storage
when fowarding is down, the chunk server call metadata server to update the chunk information and mark it as `Ready`. Also, it responds client with OK

Notification path:

The notification server gets called and it notifies the client with the updates

Downloading path:

the clients call the metadata server to get the metadata informtion
download the chunks from the chunk server
the chunk server gets the chunk information from metadata server
the client downloads all the chunk from the chunk server and compose the file in local

Detailed component design

Chunk servers.

parallel uploading: first of all we can split a big files into small chunks, let's say 10MB each. It can be done on the client side so that the client can fully utilize the network bandwidth. Same for the chunk server to the cloud storage hop
data deduplication: chunk servers can calculate the hash value of each chunks and do a dedup by looking up the metadata server. This should be pretty common in file versioning where there could be very small edition on one files.
data compression: it can greatly save the storage size by compressing the file blocks
data integrity: we can ask the client to calculate the hash value, sent to the metatadata server and the chunk server to calculate that again to ensure the data isn't corrupted. If yes, we have to ask clients to resend corrupted chunks.

Polling/Long-polling/Websockets

The notification between the notification server and the client can be supported by multiple ways. Here polling may be not ideal as it is waste of resource. As for long-polling and websockets, it depends. Websockets is more performant but relatively complicated protocol than long-polling.

How to save storage cost?

data deduplicate: mentioned above
set version limit: we can set a limit for versions. Beyond certain limit, we don't support them.
use cold storage: this is a feature of Cloud Storage, we can let Cloud Storage to transfer the data that hasn't been accessed for 2 weeks to cold storage.

Data replication/durability

Normally, we can relies on the Cloud Storage's data replication for our durability. Usually they already did that by replication and we don't need to do that again.

Since metadata server does lots of read, we can have a in-memory cache to help with that

The database choice on metadata DB: we can choose relational DB as the write QPS ins't high and we need strong consistency

Failure scenarios/bottlenecks

It is possible transimission failed between client and chunk server side or the chunk server to cloud storage side. As mentioned above, we use the database status tracking if the uploading is sucess or not and use checksum to decide if the data is corrupted or not. Otherwise we have to ask clients to retry with exponential back-off

Sharding: we need to shard our metadata DB in the future. Apparently we can shard by file_id. The another option is user_id. The advantage of using file_id is to prevent hotspot issue though sharding by user_id is better during fanout.

Scalab

Future improvements

We can add monitoring to monitor our service SLA. It should be very reliable.