Design Dropbox - System Design

System requirements

Functional:

A user can upload a file
A user can download a file
All files sync automatically across a user's devices

Out of scope

Alerts and monitoring
Security
Client application

Non-Functional:

Low-latency uploads and downloads
High availability of file reads
Support for large files (up to 10 GB)
Eventual consistency

Capacity estimation

User count: 100 Million

DAU: 20 Million

QPS:
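The QPS figure is left open above. A back-of-envelope estimate follows from DAU times requests per user per day, divided by the 86,400 seconds in a day. The sketch below uses the 20 million DAU from above; the requests-per-user rate (50/day) and the peak factor (2x) are hypothetical assumptions, not figures from this design, and should be replaced with real traffic data.

```python
# Back-of-envelope QPS estimate. DAU comes from the capacity estimate above;
# REQUESTS_PER_USER_PER_DAY and PEAK_FACTOR are hypothetical assumptions.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

DAU = 20_000_000
REQUESTS_PER_USER_PER_DAY = 50  # hypothetical: metadata polls + uploads + downloads
PEAK_FACTOR = 2                 # hypothetical: peak traffic relative to the daily average


def average_qps(dau: int, requests_per_user_per_day: float) -> float:
    """Average requests per second, assuming traffic is spread evenly over a day."""
    return dau * requests_per_user_per_day / SECONDS_PER_DAY


def peak_qps(dau: int, requests_per_user_per_day: float, peak_factor: float) -> float:
    """Peak requests per second, modeled as a multiple of the average."""
    return average_qps(dau, requests_per_user_per_day) * peak_factor


if __name__ == "__main__":
    avg = average_qps(DAU, REQUESTS_PER_USER_PER_DAY)
    peak = peak_qps(DAU, REQUESTS_PER_USER_PER_DAY, PEAK_FACTOR)
    print(f"average ~{avg:,.0f} QPS, peak ~{peak:,.0f} QPS")
```

Under these assumed rates this comes to roughly 11,600 average QPS and 23,000 at peak; most of that would be lightweight metadata polling rather than file transfers.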

API design

POST: api/upload

Uploads file data to a pre-signed URL

POST: api/metaData

Submits file metadata (before the upload) and returns the stored metadata

GET: api/download/

Downloads a file

GET: api//metadata

Returns metadata

Database design

User

id
name

File

Metadata
ID
File_url
updated_time
created_time
chunk_id
upload_status
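As a rough sketch, the User and Metadata tables could be modeled as Python dataclasses. The field names come from the tables above; the concrete types, the UploadStatus values, and the owner_id and version fields are hypothetical additions (version is mentioned in the request flow but not in the table).

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class UploadStatus(Enum):
    # Hypothetical status values; the design only names an upload_status field.
    PENDING = "pending"
    COMPLETE = "complete"


@dataclass
class User:
    id: int
    name: str


@dataclass
class FileMetadata:
    # Fields from the Metadata table above (types are assumptions).
    id: int
    file_url: str
    updated_time: datetime
    created_time: datetime
    chunk_id: str
    # Hypothetical addition: the user who owns the file.
    owner_id: int
    upload_status: UploadStatus = UploadStatus.PENDING
    # Hypothetical addition: a version counter the sync flow can compare.
    version: int = 1
```

A new record starts as PENDING and is flipped to COMPLETE once the object store confirms the upload.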

High-level design

Request flows

The application works as follows: a copy of each file is kept on the client. The client is responsible for regularly checking with the server whether the file's metadata, such as its updated time and version, has changed. If the updated time on the server differs from the one on the client, the client either downloads the latest version from the server or uploads its local changes.
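Comparing updated times alone tells the client that something differs, but not which side changed. A minimal sketch of the decision, assuming the client remembers the server version it last synced to and a hypothetical "has local edits" flag, could look like this. The CONFLICT case is not covered by the design above; this sketch only flags it.

```python
from dataclasses import dataclass
from enum import Enum


class SyncAction(Enum):
    NOTHING = "nothing"
    DOWNLOAD = "download"
    UPLOAD = "upload"
    CONFLICT = "conflict"


@dataclass
class LocalFileState:
    last_synced_version: int  # server version the local copy was last synced to
    has_local_edits: bool     # hypothetical flag: edited locally since that sync


def decide_sync(local: LocalFileState, server_version: int) -> SyncAction:
    """Decide what one polling round should do for a single file."""
    server_changed = server_version != local.last_synced_version
    if server_changed and local.has_local_edits:
        # Both sides changed; resolution policy is outside this sketch.
        return SyncAction.CONFLICT
    if server_changed:
        return SyncAction.DOWNLOAD
    if local.has_local_edits:
        return SyncAction.UPLOAD
    return SyncAction.NOTHING
```

Using a monotonically increasing version rather than wall-clock timestamps avoids problems with clock skew between devices.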

For uploads, we will use an object store such as AWS S3, which supports multipart uploads. This gives the client the ability to upload large files to S3 in parts. Another benefit is that if the network connection is lost, the upload does not have to start over; it picks up where it left off, because the client knows which chunks were already uploaded.

After the upload is done, the upload service updates the file's metadata with the new file URL. When another client next polls for changes, it sees that the updated time has changed and downloads the latest version of the file.
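The completion step can be sketched with an in-memory stand-in for the metadata database: completing an upload records the new URL and updated time, and a polling client detects the change by comparing against the updated time it last saw. InMemoryMetadataService, has_changed, and the version counter are hypothetical names for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class Metadata:
    file_url: str
    updated_time: datetime
    version: int        # hypothetical counter, bumped on every completed upload
    upload_status: str


class InMemoryMetadataService:
    """Hypothetical stand-in for the metadata database."""

    def __init__(self) -> None:
        self._files: Dict[str, Metadata] = {}

    def complete_upload(self, file_id: str, new_url: str,
                        now: Optional[datetime] = None) -> Metadata:
        # Called once the object store confirms the upload finished.
        now = now or datetime.now(timezone.utc)
        current = self._files.get(file_id)
        version = current.version + 1 if current else 1
        meta = Metadata(new_url, now, version, "complete")
        self._files[file_id] = meta
        return meta

    def get_metadata(self, file_id: str) -> Metadata:
        return self._files[file_id]


def has_changed(service: InMemoryMetadataService, file_id: str,
                last_seen_time: datetime) -> bool:
    """What a polling client checks: did the updated time move since it last looked?"""
    return service.get_metadata(file_id).updated_time != last_seen_time
```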

Detailed component design

A part worth focusing on is the S3 upload. To make uploads faster and take load off the servers' bandwidth, we can have the client send the multipart (chunked) upload directly to S3. To do this, the client first submits the metadata, then asks for a pre-signed URL. We return this pre-signed URL to the client, which pushes the file up through it. This takes load off our servers and lets the client upload large files in chunks while tracking upload progress. After the upload is complete, the file's upload status is updated to complete.
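The idea behind a pre-signed URL is that the server signs a narrowly scoped, expiring permission (one PUT of one part of one object) so the storage layer can verify it without calling our servers. The sketch below illustrates that idea with a simplified HMAC-SHA256 scheme from Python's standard library; it is NOT AWS Signature Version 4, and SECRET_KEY and STORAGE_BASE_URL are hypothetical. With S3 itself, the server would instead call boto3's generate_presigned_url on an S3 client.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical values for illustration only.
SECRET_KEY = b"hypothetical-server-secret"
STORAGE_BASE_URL = "https://storage.example.com"


def _sign(object_key: str, part_number: int, expires: int) -> str:
    # Sign exactly what the URL grants: a PUT of one part of one object, until `expires`.
    message = f"PUT\n{object_key}\n{part_number}\n{expires}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def presign_part_upload(object_key: str, part_number: int,
                        expires_in: int = 900, now: Optional[int] = None) -> str:
    """Server side: issue an expiring URL for uploading one part."""
    now = int(time.time()) if now is None else now
    expires = now + expires_in
    query = urlencode({"partNumber": part_number, "expires": expires,
                       "signature": _sign(object_key, part_number, expires)})
    return f"{STORAGE_BASE_URL}/{object_key}?{query}"


def verify_part_upload(url: str, now: Optional[int] = None) -> bool:
    """Storage side: accept the PUT only if the URL is unexpired and untampered."""
    now = int(time.time()) if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        object_key = parsed.path.lstrip("/")
        part_number = int(params["partNumber"][0])
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False
    return hmac.compare_digest(_sign(object_key, part_number, expires), signature)
```

The client then PUTs each chunk to its own URL. Because the signature covers the part number and expiry, a client cannot reuse a URL for a different part or after it expires.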

Although we are dealing with files, I will opt out of using a CDN as a cache, for two reasons. First, files will be stored in the data center nearest to the user, so they will already be fairly close to the user. Second, when a client comes back online after an offline edit, the extra complexity of invalidating the CDN cache is unnecessary overhead at this stage.

A simple master-replica setup can be used for the ACID-compliant RDBMS, since we want the metadata to be strongly consistent. If this becomes a bottleneck, we can add more replicas or shard the data. Because the number of users an account can have is limited, the probability of a hot shard is low, but we can monitor for hot shards and develop strategies to mitigate them.
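If sharding becomes necessary, one option consistent with the reasoning above is to shard by account, so a single account's metadata lives on one shard. The sketch assumes account_id as the shard key and uses hypothetical host names; it routes writes to the shard's master and spreads reads across its replicas, falling back to the master when a shard has no replicas.

```python
import hashlib
import itertools
from dataclasses import dataclass
from typing import List


def shard_for_account(account_id: str, num_shards: int) -> int:
    # Stable hash: Python's built-in hash() is salted per process, so avoid it here.
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


@dataclass
class Shard:
    master: str          # hypothetical host name
    replicas: List[str]  # hypothetical host names


class ShardRouter:
    def __init__(self, shards: List[Shard]) -> None:
        self._shards = shards
        # Round-robin over each shard's replicas for reads.
        self._replica_cycles = [itertools.cycle(s.replicas) for s in shards]

    def host_for(self, account_id: str, is_write: bool) -> str:
        idx = shard_for_account(account_id, len(self._shards))
        shard = self._shards[idx]
        if is_write or not shard.replicas:
            return shard.master
        return next(self._replica_cycles[idx])
```

Two design notes: plain modulo hashing remaps most accounts when the shard count changes, so consistent hashing or a lookup table is often preferred; and since replicas may lag the master, reads that must see the latest write (for example, right after an upload completes) can be routed to the master.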

Trade offs/Tech choices

One trade-off of the RDBMS: if a company with a relatively large account edits a file many times, the constant reads and writes to the database can become a problem.
Right now the client polls continuously. We can have it poll only while the customer has the client app open, and make this part of the initial setup.

Failure scenarios/bottlenecks

Database instances can fail. If the master fails, one of the replicas is promoted to master and starts accepting writes; this works even with the simple master-replica setup.
We can horizontally scale out the servers for each service to handle load.
We can scale out the load balancers as well; in an active-active configuration, if one load balancer fails, another is always able to take over.
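The failover step above does not say which replica gets promoted. A common heuristic, sketched here as an assumption rather than the design's stated policy, is to promote the healthy replica that has applied the most of the master's log, which minimizes lost writes under asynchronous replication. The replicated_position field is a hypothetical measure of replication progress.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    healthy: bool
    replicated_position: int  # hypothetical: how far this replica has applied the master's log


def choose_new_master(replicas: List[Replica]) -> Optional[Replica]:
    """Pick the most up-to-date healthy replica, or None if none can be promoted."""
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.replicated_position)
```

Writes that never reached any replica before the master failed are still lost with asynchronous replication; only synchronous replication avoids that, at the cost of write latency.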

Future improvements

Polling for changes in the timestamp might be chatty. An alternative is a Pub/Sub service that queues change events and pushes them to temporary storage such as Redis, recording which files changed for each user. The client can then receive these notifications over a WebSocket or SSE connection and only then download the new file.
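The push-based flow can be sketched with an in-memory queue standing in for both the Pub/Sub service and the temporary Redis store. ChangeNotifier, ChangeEvent, and files_to_download are hypothetical names; drain() represents what a WebSocket or SSE handler would push to a connected client, and repeated events for the same file are collapsed so the client downloads each file once, at its latest version.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass(frozen=True)
class ChangeEvent:
    user_id: str
    file_id: str
    version: int


class ChangeNotifier:
    """In-memory stand-in for Pub/Sub plus a temporary store such as Redis."""

    def __init__(self) -> None:
        self._pending: Dict[str, Deque[ChangeEvent]] = defaultdict(deque)

    def publish(self, event: ChangeEvent) -> None:
        # Queue the change for the user; a real system would fan out to every
        # user and device with access to the file.
        self._pending[event.user_id].append(event)

    def drain(self, user_id: str) -> List[ChangeEvent]:
        # What a WebSocket/SSE handler would push: all pending events, then clear them.
        queue = self._pending[user_id]
        events = list(queue)
        queue.clear()
        return events


def files_to_download(events: List[ChangeEvent]) -> Dict[str, int]:
    """Collapse repeated events so each file is fetched once, at its latest version."""
    latest: Dict[str, int] = {}
    for e in events:
        latest[e.file_id] = max(e.version, latest.get(e.file_id, 0))
    return latest
```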