Design Dropbox - System Design

Requirements

Functional Requirements:

Allow users to upload files to the system.
Enable users to download uploaded files.
Ensure synchronization of files between local and server storage.

Non-Functional Requirements:

strong consistency - users mustnt see different versions of the same document by accessing it from different devices, everything must by consistent and synced and we cant afford data loss and missing documents
durability - data should be persisted for longer periods of time eg at least 10 years stored and data should not be lost
low latency - users will be annoyed if loading documents takes too much time, the responses should be as quick as possible, achieving this by using sharding and data replication accross multiple cdns and regions
scalability - the system should be able to scale out by increasing the number of servers in case of high demand
no single point of failure - the system should have multiple fallbacks in case of isolated failures
retryable/resumable uploads - in case an upload fails or the user stops it, the system should be able to pick it up and retry/resume it
security - all the stored data should be encrypted
availability - system should be available 99.99% (up time) during high spikes too
offline edits should be possible and should handle conflicts gracefully, eg allowing the first update and announcing the latter one that there's a conflict that must be resolved

API Design

POST /files/upload?uploadType=simple - used for simple uploads when the file size is small

uploadType=resumable - used for large resumable files

body: {file:File, fileDetails:JSON, workspaceId:string}

POST /files/resume/uploadId/{uploadId} - used for resuming an upload

POST /download/file/{fileId} - used for downloading a file content

GET /files/{fileId} - used for fetching the data of a file including its upload status

High-Level Design

upload flow:

client sends request which reaches api gateway
client is checked to be authenticated and authorized and within the rate limits
block servers break the files in chunks, dropbox uses 4mb block size. each block gets a hash associated and is then encrypted and stored. when editing a file, we only check hash values of the blocks and update only the edited blocks, not the entire file
cloud storage contains the file split in multiple encrypted blocks
cold storage keeps the least used files or file versions in order not to aglomerate the cloud storage and to make the processing faster
request reaches load balancer which evenly distributes the work across servers.
api servers handle all the logic except for the file content, they handle user management, file metadata, details, etc
metadata cache stores the metadata for faster retrieval
metadata db stores the files details, file versions, metadata, everything except for the file content. here we will store file versions, when a file reaches too many versions, we will have to remove the old ones and add constrictions on versioning
notification service notifies the other users that a file has been uploaded
offline backup queue is used to store updates if a user is offline until it comes back online and then commits the changes

to achieve consistency, data from master db and slave replicas must be kept in sync, also each db write will invalidate the cache. a cdn will be used as cloud storage and it will store the file contents, metadata db wont have the contents, only details and metadata and file versions.

flow for resuming a download/upload - we make a call to fetch the url from server side and then we check the file status, if it is still uploading or downloading in progress, then we can resume it by checking the missing blocks and continuing to upload/ download the missing ones.

on system failures:

if a load balancer fails, another available one will take its place;

block server failure - other server will pick it up

cloud storage: s3 failure, a new s3 instance will pick it up, either from same region, or if a region is down then from a near region;

api server - they are stateless so a new api server instance can pick it up

metadata cache failure - metadata cache is replicated multiple times so a replica can pick it up

metadata db failure - if master is down then a slave will be prompoted, if a slave is down then a new instance will take its place

notification service failure - a new instance will start picking up from the ongoing requests

offline backup queue failure - if one queue fails then subscribers might need to resubscribe to a new one

Detailed Component Design

files are split into blocks of 4mb on upload and then each block gets a hash value, is then encrypted and stored in cloud storage. on editing a file, we compute the hash values and check against the stored ones, if we find different values then we update only those blocks and write in cloud the updates. cold storage will store the least used files and file versions so it wont occupy too much memory in cloud storage (which needs to be as fast as possible for retrieval/writes)

on system failures:

if a load balancer fails, another available one will take its place;

block server failure - other server will pick it up

cloud storage: s3 failure, a new s3 instance will pick it up, either from same region, or if a region is down then from a near region;

api server - they are stateless so a new api server instance can pick it up

metadata cache failure - metadata cache is replicated multiple times so a replica can pick it up

metadata db failure - if master is down then a slave will be prompoted, if a slave is down then a new instance will take its place

notification service failure - a new instance will start picking up from the ongoing requests

offline backup queue failure - if one queue fails then subscribers might need to resubscribe to a new one

conflicts handling: if 2 users edit the same file, then the first one that saves the update will work normally and will be let through but the second one will be anounced that there s a conflict that they must fix

A background job runs every hour to clean up orphaned data:

Stale upload sessions:

Sessions older than 24 hours with status "in_progress" are marked expired
Associated chunks in temporary storage are deleted
Upload record moved to "failed" state

Unattached chunks:

Chunks not linked to any file record after 48 hours are deleted
Identified by scanning temporary storage and cross-referencing with metadata DB

Soft delete cleanup:

Files in trash older than 30 days are permanently deleted
Blocks with zero references (no file versions point to them) are purged weekly

Background integrity checker runs continuously at low priority:

Detection:

Scans storage blocks on a rolling basis (full scan every 30 days)
Recomputes hash for each block, compares against stored hash
Corrupted blocks flagged for repair

Repair:

Corrupted blocks restored from replica in different AZ
If all replicas corrupted, restore from backup
Alert triggered if repair fails

Prevention:

All blocks stored with 3x replication across availability zones
S3 automatically verifies checksums on read/write
Annual full integrity audit with report