Requirements


Functional Requirements:


  • Allow users to upload files to the system.
  • Enable users to download uploaded files.
  • Ensure synchronization of files between local and server storage.



Non-Functional Requirements:


  • strong consistency - users mustnt see different versions of the same document by accessing it from different devices, everything must by consistent and synced and we cant afford data loss and missing documents
  • durability - data should be persisted for longer periods of time eg at least 10 years stored and data should not be lost
  • low latency - users will be annoyed if loading documents takes too much time, the responses should be as quick as possible, achieving this by using sharding and data replication accross multiple cdns and regions
  • scalability - the system should be able to scale out by increasing the number of servers in case of high demand
  • no single point of failure - the system should have multiple fallbacks in case of isolated failures
  • retryable/resumable uploads - in case an upload fails or the user stops it, the system should be able to pick it up and retry/resume it
  • security - all the stored data should be encrypted
  • availability - system should be available 99.99% (up time) during high spikes too
  • offline edits should be possible and should handle conflicts gracefully, eg allowing the first update and announcing the latter one that there's a conflict that must be resolved


API Design

POST /files/upload?uploadType=simple - used for simple uploads when the file size is small

uploadType=resumable - used for large resumable files

body: {file:File, fileDetails:JSON, workspaceId:string}


POST /files/resume/uploadId/{uploadId} - used for resuming an upload


POST /download/file/{fileId} - used for downloading a file content


GET /files/{fileId} - used for fetching the data of a file including its upload status


High-Level Design

upload flow:

  1. client sends request which reaches api gateway
  2. client is checked to be authenticated and authorized and within the rate limits
  3. block servers break the files in chunks, dropbox uses 4mb block size. each block gets a hash associated and is then encrypted and stored. when editing a file, we only check hash values of the blocks and update only the edited blocks, not the entire file
  4. cloud storage contains the file split in multiple encrypted blocks
  5. cold storage keeps the least used files or file versions in order not to aglomerate the cloud storage and to make the processing faster
  6. request reaches load balancer which evenly distributes the work across servers.
  7. api servers handle all the logic except for the file content, they handle user management, file metadata, details, etc
  8. metadata cache stores the metadata for faster retrieval
  9. metadata db stores the files details, file versions, metadata, everything except for the file content. here we will store file versions, when a file reaches too many versions, we will have to remove the old ones and add constrictions on versioning
  10. notification service notifies the other users that a file has been uploaded
  11. offline backup queue is used to store updates if a user is offline until it comes back online and then commits the changes

to achieve consistency, data from master db and slave replicas must be kept in sync, also each db write will invalidate the cache. a cdn will be used as cloud storage and it will store the file contents, metadata db wont have the contents, only details and metadata and file versions.


flow for resuming a download/upload - we make a call to fetch the url from server side and then we check the file status, if it is still uploading or downloading in progress, then we can resume it by checking the missing blocks and continuing to upload/ download the missing ones.


on system failures:

if a load balancer fails, another available one will take its place;

block server failure - other server will pick it up

cloud storage: s3 failure, a new s3 instance will pick it up, either from same region, or if a region is down then from a near region;

api server - they are stateless so a new api server instance can pick it up

metadata cache failure - metadata cache is replicated multiple times so a replica can pick it up

metadata db failure - if master is down then a slave will be prompoted, if a slave is down then a new instance will take its place

notification service failure - a new instance will start picking up from the ongoing requests

offline backup queue failure - if one queue fails then subscribers might need to resubscribe to a new one


Detailed Component Design

files are split into blocks of 4mb on upload and then each block gets a hash value, is then encrypted and stored in cloud storage. on editing a file, we compute the hash values and check against the stored ones, if we find different values then we update only those blocks and write in cloud the updates. cold storage will store the least used files and file versions so it wont occupy too much memory in cloud storage (which needs to be as fast as possible for retrieval/writes)


flow for resuming a download/upload - we make a call to fetch the url from server side and then we check the file status, if it is still uploading or downloading in progress, then we can resume it by checking the missing blocks and continuing to upload/ download the missing ones.


on system failures:

if a load balancer fails, another available one will take its place;

block server failure - other server will pick it up

cloud storage: s3 failure, a new s3 instance will pick it up, either from same region, or if a region is down then from a near region;

api server - they are stateless so a new api server instance can pick it up

metadata cache failure - metadata cache is replicated multiple times so a replica can pick it up

metadata db failure - if master is down then a slave will be prompoted, if a slave is down then a new instance will take its place

notification service failure - a new instance will start picking up from the ongoing requests

offline backup queue failure - if one queue fails then subscribers might need to resubscribe to a new one


conflicts handling: if 2 users edit the same file, then the first one that saves the update will work normally and will be let through but the second one will be anounced that there s a conflict that they must fix


A background job runs every hour to clean up orphaned data:

Stale upload sessions:

  • Sessions older than 24 hours with status "in_progress" are marked expired
  • Associated chunks in temporary storage are deleted
  • Upload record moved to "failed" state

Unattached chunks:

  • Chunks not linked to any file record after 48 hours are deleted
  • Identified by scanning temporary storage and cross-referencing with metadata DB

Soft delete cleanup:

  • Files in trash older than 30 days are permanently deleted
  • Blocks with zero references (no file versions point to them) are purged weekly

Background integrity checker runs continuously at low priority:

Detection:

  • Scans storage blocks on a rolling basis (full scan every 30 days)
  • Recomputes hash for each block, compares against stored hash
  • Corrupted blocks flagged for repair

Repair:

  • Corrupted blocks restored from replica in different AZ
  • If all replicas corrupted, restore from backup
  • Alert triggered if repair fails

Prevention:

  • All blocks stored with 3x replication across availability zones
  • S3 automatically verifies checksums on read/write
  • Annual full integrity audit with report