Design Dropbox - System Design

System requirements

Functional:

Web interface
Users have accounts
Can create folders
Upload any files into folders
Download file
Delete files / folders
Only the owning user can view and download the file
Handle parallel uploads with conflicting file names -- just rename (e.g. add a suffix)
Descoped
- Permission for sharing
- Versioning of files (upload same name file, keep older versions)

Non-Functional:

Durability -- uploaded files should not be lost
Security -- ensure files stay private and only owner can access
Not latency sensitive
Low cost to store and upload/download large amounts of files

Capacity estimation

Each user uses 50GB

10,000,000 users

500,000,000 GB total storage --> 500 Petabytes

10% of users active per day, reading and uploading --> 1,000,000 active users

1,000,000 uploads per day (about one per active user) --> ~41.7k per hour --> ~694 per minute --> ~12 per second

10,000,000 reads per day
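
The estimates above can be checked with a few lines of arithmetic. This is a back-of-envelope sketch that assumes decimal units (1 PB = 1,000,000 GB) and load spread evenly across the day; real peaks would be higher than these averages.

```python
# Back-of-envelope check of the capacity estimates (averages, not peaks).
GB_PER_USER = 50
USERS = 10_000_000
UPLOADS_PER_DAY = 1_000_000
READS_PER_DAY = 10_000_000
SECONDS_PER_DAY = 24 * 60 * 60

total_gb = GB_PER_USER * USERS
total_pb = total_gb / 1_000_000  # decimal units: 1 PB = 1,000,000 GB

uploads_per_hour = UPLOADS_PER_DAY / 24
uploads_per_minute = uploads_per_hour / 60
uploads_per_second = UPLOADS_PER_DAY / SECONDS_PER_DAY
reads_per_second = READS_PER_DAY / SECONDS_PER_DAY

print(f"storage: {total_pb:,.0f} PB")
print(f"uploads: {uploads_per_hour:,.0f}/hour, "
      f"{uploads_per_minute:,.0f}/min, {uploads_per_second:.1f}/s")
print(f"reads:   {reads_per_second:.0f}/s")
```

At these rates the API request volume is small; the storage footprint and upload bandwidth dominate, which is why the design pushes file bytes straight to the object store.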

API design

CreateFolder

parent_folder_id

ListFolder

UpdateFolder

DeleteFolder

/* RequestUpload could take a batch of files, or one file at a time. One at a time is fine b/c the limiting factor is the upload itself. The client can rate limit how many parallel uploads to support. */

RequestUpload

Returns signed S3 URLs for a multi-part upload of the file (assuming we don't need to pre-process the upload on our servers)

CreateFile

parent_folder_id
uploaded_s3_key

GetFile

Returned object includes a signed S3 URL that can be used to access the actual file
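
To illustrate what a signed URL gives us (time-limited access to one key for one operation, without the client holding credentials), here is a minimal stand-in using an HMAC over the method, key, and expiry. This is NOT S3's actual scheme: real presigned URLs use AWS Signature Version 4 and would be produced by the object store SDK. SECRET, sign_url, verify_url, and the files.example.com host are all hypothetical names for this sketch.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical signing key; load from a secret store in practice

def _signature(method: str, key: str, expires: int) -> str:
    # Bind the signature to the HTTP method, object key, and expiry time.
    msg = f"{method}\n{key}\n{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def sign_url(method: str, key: str, ttl_seconds: int,
             now: Optional[float] = None) -> str:
    """Build a URL that authorizes `method` on `key` until it expires."""
    now = time.time() if now is None else now
    expires = int(now) + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(method, key, expires)})
    return f"https://files.example.com/{key}?{query}"

def verify_url(method: str, url: str, now: Optional[float] = None) -> bool:
    """Check expiry and signature; what the storage side would do on each request."""
    now = time.time() if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False
    key = parsed.path.lstrip("/")
    return hmac.compare_digest(sig, _signature(method, key, expires))

if __name__ == "__main__":
    url = sign_url("GET", "file-key-1", ttl_seconds=300)
    print(url, verify_url("GET", url))
```

Because the method is part of the signed message, a URL issued by GetFile for download cannot be reused to overwrite the object, and a URL issued by RequestUpload cannot be used to read other keys.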

UpdateFile

DeleteFile

Database design

folders table

_id

parent_folder_id

user_id

name

metadata

primary key _id

items table

_id

parent_folder_id

user_id

name

other metadata

unique index on (parent_folder_id, name)

primary key _id

users table

_id

auth0_user_id

...

Database used for object metadata. Actual files stored in Cloud Object Store (something like S3).

Common queries

List folder and files in a folder
Get a file by id
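
A minimal sketch of the schema and the two common queries, using an in-memory SQLite database purely for illustration (the production database would be the sharded store described later). The s3_key column and the use of NULL parent_folder_id for a root folder are assumptions, as are the function names.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE folders (
        _id              INTEGER PRIMARY KEY,
        parent_folder_id INTEGER REFERENCES folders(_id),  -- assumed: NULL for a root folder
        user_id          INTEGER NOT NULL,
        name             TEXT NOT NULL
    );
    CREATE TABLE items (
        _id              INTEGER PRIMARY KEY,
        parent_folder_id INTEGER NOT NULL REFERENCES folders(_id),
        user_id          INTEGER NOT NULL,
        name             TEXT NOT NULL,
        s3_key           TEXT NOT NULL,  -- assumed column: pointer to the stored object
        UNIQUE (parent_folder_id, name)
    );
    """)
    return conn

def list_folder(conn, user_id, folder_id):
    """Return (subfolder names, file names) for a folder owned by user_id."""
    folders = [row[0] for row in conn.execute(
        "SELECT name FROM folders WHERE parent_folder_id = ? AND user_id = ? ORDER BY name",
        (folder_id, user_id))]
    files = [row[0] for row in conn.execute(
        "SELECT name FROM items WHERE parent_folder_id = ? AND user_id = ? ORDER BY name",
        (folder_id, user_id))]
    return folders, files

def get_file(conn, user_id, file_id):
    """Return (_id, name, s3_key), or None if missing or owned by someone else."""
    return conn.execute(
        "SELECT _id, name, s3_key FROM items WHERE _id = ? AND user_id = ?",
        (file_id, user_id)).fetchone()

if __name__ == "__main__":
    conn = make_db()
    conn.execute("INSERT INTO folders VALUES (1, NULL, 7, 'root')")
    conn.execute("INSERT INTO folders VALUES (2, 1, 7, 'photos')")
    conn.execute("INSERT INTO items (parent_folder_id, user_id, name, s3_key) "
                 "VALUES (1, 7, 'notes.txt', 'k1')")
    print(list_folder(conn, 7, 1))
    print(get_file(conn, 7, 1))
```

Filtering every query by user_id is one simple way to enforce the owner-only requirement at the data layer.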

High-level design

Use an authN/authZ service (like auth0). On successful client token creation, create a record in our own users table if one doesn't exist.

User clicks login in frontend
Redirect to auth0 for authN
User goes through auth flow and gets back some authorization token that can be used to authorize API requests
Our API endpoints accept the authorization token, validate it, and can load user information from DB or auth0

Load balancer for horizontal scaling and zero downtime deploys

In multiple AZs
Single region

(potentially separate service) API gateway to validate authorization token.

Separate if using services under the hood (e.g. Stripe has different services per API endpoint)
Also useful for multi-region routing

Application server

business logic
Single region, but in multiple AZs

Database

Sharded (shard by parent_folder_id; could also shard by user_id, but that is more at risk of a hot shard)
Each shard has its own replicaset for availability
- One primary, 2 or more secondaries
- Secondaries can also offload read load -- if we are ok with eventually consistent reads
- Writes go to the primary (for consistency, esp w/ network partition, can require acknowledgement from a majority -- sacrifices latency)
- Reads can be from single secondary (if ok with eventually consistent) or from majority to be strongly consistent
- Probably ok to be eventually consistent -- may have slight UI quirks if user refreshes page immediately after creating/deleting a file and before replication to secondary occurs
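
A minimal sketch of routing by shard key, assuming a fixed shard count and simple hash-modulo placement (NUM_SHARDS and shard_for are hypothetical; a real deployment might use range-based or consistent hashing so that adding shards moves less data).

```python
import hashlib

NUM_SHARDS = 16  # assumed fixed shard count for this sketch

def shard_for(parent_folder_id: str) -> int:
    """Map a folder id to a shard using a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so a cryptographic digest is used to get the same answer everywhere.
    """
    digest = hashlib.sha256(parent_folder_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

if __name__ == "__main__":
    for folder_id in ("folder-1", "folder-2", "folder-3"):
        print(folder_id, "-> shard", shard_for(folder_id))
```

With this key, everything directly inside one folder lives on one shard, so listing a folder is a single-shard query, while a heavy user's folders spread across shards. One consequence worth noting: a lookup by file _id alone does not carry the shard key, so the request would need to include parent_folder_id, or the shard would need to be encoded in the id.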

Cloud object store (e.g. S3) for cheap storage

If our servers don't need to process the files (maybe virus scanning?), then can have browser directly upload to S3 to avoid costs on our servers (network and storage).

No CDN for file uploads

But can use CDN for webapp resources

Request flows

CreateFile

Webapp makes api request to requestUpload
Load balancer sends it to available application server
Authorize API request with auth token
Application server will
- generate a unique key for S3 -- can store all files in a flat bucket
- create a signed S3 url that the frontend can upload to
Webapp starts multi-part upload to S3 with signed url and manages upload status / retries
Once upload completes, make api request to createFile
App server will
- create new record in DB
- if there is a clash in file name, rename by appending suffix and retrying
- Return new file metadata
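
The rename-on-clash step can be sketched as below. The idea is to let the unique index on (parent_folder_id, name) arbitrate, rather than checking for the name first and then inserting, since two parallel uploads could both pass a check. SQLite is used only so the sketch runs; the " (n)" suffix format and the function names are assumptions.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE items (
            _id              INTEGER PRIMARY KEY,
            parent_folder_id INTEGER NOT NULL,
            user_id          INTEGER NOT NULL,
            name             TEXT NOT NULL,
            s3_key           TEXT NOT NULL,
            UNIQUE (parent_folder_id, name)
        )""")
    return conn

def with_suffix(name: str, n: int) -> str:
    """Insert ' (n)' before the extension; n == 0 returns the name unchanged."""
    if n == 0:
        return name
    stem, dot, ext = name.rpartition(".")
    if not dot or not stem:  # no extension, or a dotfile like '.bashrc'
        return f"{name} ({n})"
    return f"{stem} ({n}){dot}{ext}"

def create_file(conn, parent_folder_id, user_id, name, s3_key, max_attempts=100):
    """Insert the file record, retrying with a new suffix on a name clash."""
    for n in range(max_attempts):
        candidate = with_suffix(name, n)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO items (parent_folder_id, user_id, name, s3_key) "
                    "VALUES (?, ?, ?, ?)",
                    (parent_folder_id, user_id, candidate, s3_key))
            return candidate
        except sqlite3.IntegrityError:
            continue  # name already taken in this folder; try the next suffix
    raise RuntimeError(f"no free name for {name!r} after {max_attempts} attempts")

if __name__ == "__main__":
    conn = make_db()
    print(create_file(conn, 1, 7, "report.pdf", "key-a"))
    print(create_file(conn, 1, 7, "report.pdf", "key-b"))
```

The attempt cap keeps a pathological folder (many copies of the same name) from looping indefinitely.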

What happens if createFile fails?

Orphaned S3 item.
Create a background job that occasionally scans through the S3 bucket and deletes orphaned items (not in DB)
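
A minimal sketch of the orphan detection logic, kept independent of any storage SDK: it takes a listing of (key, last_modified) pairs and the set of keys referenced in the DB. The grace period is an assumption added here: an object that was just uploaded, but whose createFile call has not happened yet, would otherwise look orphaned. find_orphans and GRACE are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Set, Tuple

GRACE = timedelta(hours=24)  # assumed: skip objects newer than this

def find_orphans(bucket_objects: Iterable[Tuple[str, datetime]],
                 referenced_keys: Set[str],
                 now: datetime,
                 grace: timedelta = GRACE) -> List[str]:
    """Return keys that are absent from the DB and older than the grace period."""
    cutoff = now - grace
    return [key for key, last_modified in bucket_objects
            if key not in referenced_keys and last_modified <= cutoff]

if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    listing = [
        ("key-in-db", now - timedelta(days=3)),
        ("key-orphaned", now - timedelta(days=3)),
        ("key-in-flight", now - timedelta(minutes=5)),
    ]
    print(find_orphans(listing, {"key-in-db"}, now))
```

At the scale estimated above, the job would page through the bucket listing and check keys against the DB in batches rather than loading everything into memory.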
