Requirements
Functional Requirements:
- Allow users to upload files to the system.
- Enable users to download uploaded files.
- Ensure synchronization of files between local and server storage.
Non-Functional Requirements:
- strong consistency - users must not see different versions of the same document when accessing it from different devices; everything must be consistent and synced, and we can't afford data loss or missing documents
- durability - data should be persisted for long periods of time (e.g. at least 10 years) and must not be lost
- low latency - users will be annoyed if loading documents takes too long, so responses should be as quick as possible; we achieve this with sharding and data replication across multiple CDNs and regions
- scalability - the system should be able to scale out by increasing the number of servers in case of high demand
- no single point of failure - the system should have multiple fallbacks in case of isolated failures
- retryable/resumable uploads - in case an upload fails or the user stops it, the system should be able to pick it up and retry/resume it
- security - all the stored data should be encrypted
- availability - the system should be available 99.99% of the time (uptime), including during traffic spikes
API Design
POST /files/upload?uploadType=simple - used for simple uploads when the file size is small
POST /files/upload?uploadType=resumable - used for large files that need resumable uploads
body: {file:File, fileDetails:JSON, workspaceId:string}
POST /files/resume/uploadId/{uploadId} - used for resuming an upload
POST /download/file/{fileId} - used for downloading a file's content
GET /files/{fileId} - used for fetching a file's metadata, including its upload status
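A minimal sketch of how a client might pick between the two upload types listed above. The notes do not say where "small" ends, so the 5 MB cutoff, the base URL and the function name are all assumptions made for illustration.

```python
from urllib.parse import urlencode

# Assumption: the notes give no cutoff between "small" and "large" files.
SIMPLE_UPLOAD_LIMIT_BYTES = 5 * 1024 * 1024

def upload_url(file_size: int, base: str = "https://api.example.com") -> str:
    """Build the upload URL, choosing uploadType from the file size."""
    upload_type = "simple" if file_size <= SIMPLE_UPLOAD_LIMIT_BYTES else "resumable"
    return f"{base}/files/upload?{urlencode({'uploadType': upload_type})}"
```

For example, `upload_url(1024)` would target the simple upload, while a 50 MB file would be sent with `uploadType=resumable`.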
High-Level Design
upload flow:
- client sends request which reaches api gateway
- the gateway checks that the client is authenticated, authorized and within the rate limits
- block servers split the files into blocks (Dropbox uses a 4 MB block size). each block gets a hash and is then encrypted and stored. when a file is edited, we compare the block hashes and update only the edited blocks, not the entire file
- cloud storage contains the file split into multiple encrypted blocks
- cold storage keeps the least used files or file versions so they don't clutter the cloud storage, which keeps processing fast
- request reaches load balancer which evenly distributes the work across servers.
- api servers handle all the logic except for the file content, they handle user management, file metadata, details, etc
- metadata cache stores the metadata for faster retrieval
- metadata db stores the file details, file versions and metadata, everything except the file content. when a file accumulates too many versions, we remove the oldest ones and enforce limits on versioning
- notification service notifies the other users that a file has been uploaded
- offline backup queue stores the updates for a user who is offline until they come back online, and then the changes are committed
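The block server step above (split into blocks, hash each block) can be sketched roughly as follows. SHA-256 is an assumption, since the notes do not name a hash function, and the encryption step is left out; `split_into_blocks` is a hypothetical name.

```python
import hashlib
from typing import Iterator, Tuple

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB, the block size quoted in the notes

def split_into_blocks(
    data: bytes, block_size: int = BLOCK_SIZE
) -> Iterator[Tuple[int, str, bytes]]:
    """Yield (block index, hex digest, block bytes) for each fixed-size block."""
    for index, offset in enumerate(range(0, len(data), block_size)):
        block = data[offset:offset + block_size]
        # Assumption: SHA-256 as the per-block hash.
        yield index, hashlib.sha256(block).hexdigest(), block
```

The last block is simply shorter when the file size is not a multiple of the block size.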
to achieve consistency, data in the master db and its slave replicas must be kept in sync, and each db write invalidates the corresponding cache entry. cloud object storage (optionally fronted by a cdn) stores the file contents; the metadata db doesn't hold the contents, only details, metadata and file versions.
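A toy illustration of "each db write invalidates the cache", using in-memory dicts as stand-ins for the metadata db and the metadata cache. The class and method names are hypothetical, and replication between master and slaves is not modelled here.

```python
from typing import Dict, Optional

class MetadataStore:
    """In-memory stand-in for the metadata db plus its cache."""

    def __init__(self) -> None:
        self._db: Dict[str, dict] = {}
        self._cache: Dict[str, dict] = {}

    def write(self, file_id: str, metadata: dict) -> None:
        # Write to the db first, then drop the cached entry so the
        # next read cannot return a stale version.
        self._db[file_id] = dict(metadata)
        self._cache.pop(file_id, None)

    def read(self, file_id: str) -> Optional[dict]:
        if file_id in self._cache:
            return self._cache[file_id]
        record = self._db.get(file_id)
        if record is not None:
            self._cache[file_id] = record  # populate the cache on a miss
        return record
```

Invalidating rather than updating the cache on write keeps the write path simple; the cost is one extra db read after every write.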
flow for resuming a download/upload - we call the server to fetch the url and check the file status. if the upload or download is still in progress, we resume it by working out which blocks are missing and transferring only those.
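The "check the missing blocks" step could look roughly like this, assuming the server can report the total block count and the set of block indexes it already holds (both are assumptions; the notes do not define the status response).

```python
from typing import List, Set

def missing_blocks(total_blocks: int, received: Set[int]) -> List[int]:
    """Return the block indexes that still need to be transferred, in order."""
    return [i for i in range(total_blocks) if i not in received]
```

With 5 blocks in total and blocks 0, 1 and 3 already stored, the client would resume by sending blocks 2 and 4 only.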
on system failures:
load balancer failure - another available load balancer takes its place
block server failure - another block server picks up the work
cloud storage (s3) failure - blocks are replicated, so requests are served from another replica in the same region or, if a region is down, from a nearby region
api server failure - api servers are stateless, so a new instance can pick up the work
metadata cache failure - the metadata cache is replicated multiple times, so a replica can take over
metadata db failure - if the master is down, a slave is promoted; if a slave is down, a new instance takes its place
notification service failure - a new instance starts and picks up the ongoing requests
offline backup queue failure - if one queue fails, subscribers may need to resubscribe to a new one
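As one example of the fallbacks listed above, here is a hedged sketch of reading a block from the nearest region first and falling back to the next one on failure. The `fetch` callable, the region ordering and the use of `ConnectionError` to signal an unavailable region are assumptions for illustration, not a real storage client API.

```python
from typing import Callable, Optional, Sequence

def read_with_fallback(
    block_id: str,
    regions: Sequence[str],
    fetch: Callable[[str, str], bytes],
) -> bytes:
    """Try each region in order and return the first successful read."""
    last_error: Optional[Exception] = None
    for region in regions:
        try:
            return fetch(region, block_id)
        except ConnectionError as err:
            # Region unavailable: remember the error and try the next one.
            last_error = err
    raise RuntimeError(f"block {block_id} unavailable in all regions") from last_error
```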
Detailed Component Design
files are split into 4 MB blocks on upload; each block gets a hash value, is then encrypted and is stored in cloud storage. when a file is edited, we recompute the block hashes and compare them against the stored ones; where they differ, we upload only those blocks and write the updates to cloud storage. cold storage holds the least used files and file versions so they don't take up space in cloud storage (which needs to be as fast as possible for retrievals/writes)
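The hash comparison on edit can be sketched as a diff over two ordered lists of block hashes. The function name is hypothetical, and handling of files that shrink (stored blocks with no counterpart in the new version) is assumed to happen elsewhere.

```python
from typing import List

def changed_block_indexes(stored_hashes: List[str], new_hashes: List[str]) -> List[int]:
    """Return indexes of blocks that are new or whose hash differs from the stored one."""
    return [
        i
        for i, new_hash in enumerate(new_hashes)
        if i >= len(stored_hashes) or stored_hashes[i] != new_hash
    ]
```

With stored hashes `["a", "b", "c"]` and new hashes `["a", "x", "c", "d"]`, only blocks 1 and 3 are uploaded. Note that with fixed-size blocks, inserting bytes near the start of a file shifts every later block, so all of them will show up as changed.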