Design Dropbox - System Design

System requirements

Functional:

List functional requirements for the system (Ask the chat bot for hints if stuck.)...

User can upload contents
User can download contents
The uploaded contents are sync'd across connected devices

System Properties

Support a max file size of 50GB

Out Of Scope

Billing and Payment
We will talk about the conflict resolution but a managing teams and team edit of content is out. of scope. If we have enough time, will include those functionalities
Authentication/Authorisation

Non-Functional:

List non-functional requirements for the system...

Conflict Management - The system ensure integrity when updating content. The content updated in device-1 and updated in device-2 are handled gracefully. In case of conflict a separate copy is maintained to let user verify and merge manually.
System is lets devices to download, sync metadata across devices, and upload contents. Offline mode enabled files are downloaded to local system
Updates are eventually consistent - a new file uploaded may not be immediately visible in a different region for a couple of seconds. This makes the availability property of the system edge more than consistency
Low latency sync, content downloads. content metadata downloads < 200ms

Capacity estimation

Estimate the scale of the system you are going to design...

Total Users - 500M users, Daily Active Users (70%) - 350M users
Average file size - 100MB
Average size of uploads per day - 350M x 100MB = 350, 000, 000 x 100 = 35,00,00,00,000 i,e, 35PB per day
No of devices per user Average size of downloads per day - 4 devices
Average downloads per day - 350M people x 4 devices x 100MB average file size = approx 35PB x 4 = 175PB

API design

Define what APIs are expected from the system...

Get Metadata

Used by mobile/desktop/laptop apps and web browser to grab all content metadata from a user's account in dropbox server.

GET dropbox.api.com/1/contents
Request Header : [ 
  Authorization = Bearer <Oauth Token>
]
Response: 200 OK
Response Body: { an array of ContentMetadata }

Start Chunk Upload - Content upload

Used by mobile/desktop/laptop apps to indicate that a file upload is going to be initiated. The client will start using POST Chunk API for uploading chunks of file.

POST dropbox.api.com/1/contents
Request Header : [
Authorization = Bearer <Oauth Token>
]
Request Body: { ContentMetadata }
Response : 200
Response Header: [SessionId = ZZZwweee...]
Response body: { ContentMetadata }

Post Chunk - content upload

Used by mobile/desktop/laptop apps to upload file content in parts. Dropbox server assembles the chunk to a final content

POST dropbox.api.com/1/contents/chunks
Request Header : [
Content-Type = application/octetstream
Authorization = Bearer <Oauth Token>
SessionId = ZZZwweee...
]
Request Body: { ChunkMetadata + bytestream }
Response : 200
Response body: { ChunkMetadata }

Finish Chunk Upload - Content upload

Used by mobile/desktop/laptop apps to upload file content in parts. Dropbox server assembles the chunk to a final content

POST dropbox.api.com/1/contents/chunks
Request Header : [
Authorization = Bearer <Oauth Token>
SessionId = ZZZwweee...
]
Request Body: { ContentMetadata }
Response : 200
Response body: { ContentMetadata }

Get Chunks

Used by mobile/desktop/laptop apps to get all chunk metadata for a content from dropbox server

GET dropbox.api.com/1/contents?content={contentId}
Request Header : [
Authorization = Bearer <Oauth Token>
]
Response : 200 OK
Response Body: { A list of ChunkContent }

Upload a Content

TBD - or browsers to upload a file in its complete form

Database design

Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...

ContentMetadata
ChunkMetadata
User

High-level design

You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design. If you are unfamiliar with the tool, you can simply describe your design to the chat bot and ask it to generate a starter diagram for you to modify...

Chunk and Hashing file content - Block-Based Content-Addressed Storage

A file is size can be max 50GB. The file is sliced into chunks of 4MB to 16MB. The native app computes hash (XXH3 or BLAKE3) and prepares metadata for each chunk. The metadata. This metadata consists of

{ 
  chunkId: 1,
  sequence: 1,
  filename: "rocks.txt",
  offsetStart: 0,
  offsetend: 123456,
  createdOn: <timestamp>,
  lastUpdatedOn: <timestamp>,
  ownerId: "1234",
  checksum: ZZZeeewww...
  location: "path in dropbox file server"
}

Uploading Content from Mobile/Laptop/Desktop

New File upload - User creates a folder which is to be sync'd to Dropbox server. The files in this folder is sent in chunks to the server with its metadata.
File Changes - The native OS app hooks into the kernel events in order to listen to the file change events. For instance iNotify events in linux, FSEvents in iOS, windows file change events in windows OS. The native app picks the file and slices to chunk with predefined chunk size, metadata and its hashes. The hash of the chunk is compared with the hash in the local metadata cache. The chunks that differs are sent to server to limit the bandwidth usage. i.e. the delta is uploaded to server

File Sync Between Devices

When a file is changed in another device, the native app in the device prepares #1 or #2 above and pushes the full file or delta chunks of the file to the dropbox server. A Server Side Event is pushed from dropbox server to the devices to initiate a call home on changes. The native client reaches out to dropbox with a last sync timestamp. The changes post this timestamp are returned in the form of metadata. The client compares these metadata with local metadata cache and downloads the required chunks which are modified in the dropbox server and updates the local files accordingly. The updates might result in file delete, merge, add operations.

Web browser Downloading Content

Here the user login to account using a browser, the browser lists the home drive and recently accessed files. The dropbox web application lists files and folder and users can navigate/search the files

Modification of a file from Multiple Devices - Conflict Resolution

Merging of Changes

There are cases where a same file is modified in multiple devices which results in conflict and uploaded at once. For instance there are two devices modifying the file, Client A & Client B. Client A commits the file first and dropbox server updates the chunks modified by Client A and update the file version to Version + 1. That is now Client B has N - 1 version. In this case, if Client B updated different chunks in the file than the Client A, dropbox will merge the changes and produces a new version N + 1. However if both clients modifies the same content chunk then Client B will be notified that a version is updated and would require to download the latest & merge. A separate copy of Client B's file is made at dropbox server named `copy-of-xxxx` filename, by keeping Client B's changes.

Online Shared Updates - Conflict Resolution

The case here is both devices or two different users updating the same file at the same time. If both parties are updating different chunks then dropbox server will merge the file and produce a new version. However if the same chunk is modified by both parties then the last write wins. However the version is updated for each commit. This means the user can see the history of changes

Storage Layer

Files are stored in Data nodes spread across regions. This means the chunks of each file are stored in different physical server. For instance, a file is divided into 10 chunks, each chunk is stored in different storage servers in encrypted form.

Erasure Coding

When the file is divided into 10 block (chunks) a 6 parity blocks are added to it, making it 10, 6 erasure coding. This is better than replication where each block is copied to 3 or more storage servers (30 chunks altogether). This incurs huge storage for 400M users or 35PB of data.

Erasure coding helps in keeping 6 parities for 10 blocks of data making it 16 blocks of data. We can recover or reconstruct data even if lose 6 nodes (assuming each block is stored in different storage servers). The worst case is losing 7+ nodes, in which case the file is lost and cannot be recovered. Its important to store these chunks in different storage nodes. Altogether in this case atleast we need 16 server nodes.

Chunk Repairs

A storage monitoring service that looks for disk health, node health, partition healths and publishes a message. The message might indicate a disk failure or chunk failure

A chunk repair service picks up the message and acts on it by looking at degraded chunks and repair them by reconstructing and placing the chunks in same or different storage nodes. It reads the chunk metadata from Chunk coordinator service and tries to repair the chunks. This is part of health and wellness monitoring job that picks up the chunks to repair.

Request flows

Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...

General Flow

Client sends a request (upload, download, Get) to an edge load balancer which terminates SSL and forwards the request to API gateway at the nearest datacenter.
API Gateway authenticates, authorises, rate limits the requests and forwards to a DNS based Kubernetes service `<service-name>-prod.svc.cluster.local`. Kubernetes internally routes the request to a pod grouped under the service name like `upload`, `encoder`, `chunkmetadata` etc.
Kubernetes service, on receiving request will forward the request to a pod

Client Uploading a File

Client divides a 5 GB file into 320 chunks of 16MB each and sends each chunk to server, Load Balancer at the edge receives the request, terminate SSL, routes the request to an API gateway at nearest datacenter.
The API Gateway authenticates authorizes request and forwards to DNS location, which is the endpoint url of Kubernetes service for uploading chunks. The kubernetes service load balances request to a microservice running in a pod.
The microservice, chunk coordinator service, receives the request and asks the placement service to decide on where to place the chunk. The placement service returns the chunk location based on chunk metadata.
Chunk coordinator service asks the encoder service to encode chunk (reed-solomon encoder).
Erasure coding service computes parity of all chunks on finish API call from client.
the chunk coordinator asks chunk service to write the chunk to the location suggested by placement service, this includes chunks and

Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...

Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...

Chunk/Content metadata storage - Cassandra cluster or Spanner DB for faster and consistent writes.
Blob File Storage - A cluster of block storage servers with onboarded/attached storage disks. These rack servers with disk array.
Microservices - deployed in kubernetes, grouped by their functions - Chunk metadata service that reads the metadata, Chunk service that writes metadata, Chunk encoding service that encodes a chunk, chunk encrypt service that encrypts/decrypt a chunk etc
Load Balancer - an edge L7 load balancer that forwards request to a API gateway nearest to the client
API gateway - Kong, Apigee etc that routes traffic to Kubernetes service
Kubernets - all microservices are deployed in kubernetes, each pod consists of a stateless version of microservice and can scaled up/down based on the traffic

Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.

Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?

Security - Scan contents for malicious contents
Encrypt Data at Rest
Integrity - checksum verifications at client and server
Sharing content
Organising files into folders, classifying, grouping, tagging etc