My Solution for Design Dropbox with Score: 8/10

by utopia4715

System requirements


Functional:


  1. Users can upload files to this system.
  2. Users can update existing files.
  3. Users can organize files in directories.
  4. Users can share a file or directory with other users.
  5. File sharing options are read-only or read-write.
  6. Users can edit files while offline from the server.


Of course there are many other requirements, such as preview and collaboration, but I will start with these.



Non-Functional:


  1. Data durability. Once a file is uploaded, it should never get lost.
  2. Availability
  3. Scalability. Many users.
  4. Consistency. It would be desirable for files to be consistent across data centers.
  5. Security. Files can be accessed only by the owner and by users with explicit sharing permissions. By default, a file is not shared with anybody other than the owner.
  6. Response time: reasonable. We would like to show a home directory quickly (e.g. sub-second), but if upload and download take 1-10s depending on file size, that is acceptable.



Key Points:

  1. Data partitioning and backup.
  2. Tradeoff between consistency and scalability.
  3. Privacy concern w.r.t. deleting files.



Capacity estimation


200M DAU

Each person reads about twice a day; roughly one in ten active users uploads a new file each day.

Average file size: 1MB

20M new files generated per day


20TB of new file data per day.


In two years (~730 days), that accumulates to about 14.6PB of data (before replication).
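As a quick back-of-envelope check of these figures (the one-in-ten write ratio is my assumption, chosen to reconcile 200M DAU with 20M new files per day):

```python
# Back-of-envelope capacity math; figures mirror the estimates above.
DAU = 200_000_000
WRITE_RATIO = 0.1                # assumption: ~1 in 10 active users uploads per day
AVG_FILE_SIZE_MB = 1

new_files_per_day = DAU * WRITE_RATIO                               # 20,000,000 files/day
new_tb_per_day = new_files_per_day * AVG_FILE_SIZE_MB / 1_000_000   # ~20 TB/day
two_year_pb = new_tb_per_day * 730 / 1_000                          # ~14.6 PB over 2 years

print(new_files_per_day, new_tb_per_day, two_year_pb)               # 20000000.0 20.0 14.6
```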


Rough Idea:

  • Raw data storage, like Amazon S3, for the files themselves.
  • A relational database (RDB) for metadata.



API design


  1. upload(user_ID, file_content, file_name, parent_directory_ID) -> returns file_ID in JSON along with some metadata.
  2. update(user_ID, file_ID, file_content) -> replaces the whole file
  3. update_chunk(user_ID, file_ID, chunk_ID, chunk_content) -> replaces a specific chunk (e.g. 4MB chunk) of a file
  4. download(user_ID, file_ID)
  5. preview(user_ID, file_ID)
  6. mkdir(user_ID, parent_directory_ID, directory_name)
  7. share(file_ID, [user_IDs], sharing_type) -> a directory_ID can be used in place of file_ID, and group_IDs can be used in place of user_IDs. sharing_type has several options: read-only, read-write, can upload (but not download), can comment (but not update), etc.


All APIs return standard HTTP status codes on error.
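As a sketch of how one of these calls might be implemented, here is a minimal upload handler in Python. The persistence steps are elided, and the response field names are illustrative, not a committed wire format:

```python
# A minimal sketch of the upload API; storage calls are elided on purpose.
import json
import time
import uuid

def upload(user_id: str, file_content: bytes, file_name: str,
           parent_directory_id: str) -> str:
    """Store a new file and return its metadata as JSON."""
    file_id = str(uuid.uuid4())
    # ... write file_content to the blob store, insert metadata into the RDB ...
    return json.dumps({
        "file_id": file_id,
        "file_name": file_name,
        "parent_directory_id": parent_directory_id,
        "owner": user_id,
        "created_at": time.time(),
    })
```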


Database design


One approach is to store file contents in a block store, such as Amazon Elastic Block Store (EBS). It would be scalable.


A block store like EBS allows updating individual blocks (chunks) in place.


The metadata would include:

  • File name
  • Location (e.g. parent directory ID)
  • Sharing options
  • Groups of users
  • What kind of access rights (e.g. read/write, read-only, comment only ...) each group has
  • Owner
  • Created time
  • Last edited time
  • File checksum


I would store this in a relational database. The metadata size will be well within the comfort zone of an RDB, and having ACID consistency for metadata is highly desirable.
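A minimal sketch of what one metadata record could look like; the field names are illustrative, not a final schema:

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum

class AccessRight(Enum):
    READ_ONLY = "read_only"
    READ_WRITE = "read_write"
    COMMENT_ONLY = "comment_only"

@dataclass
class FileMetadata:
    file_id: str
    file_name: str
    parent_directory_id: str   # location in the directory tree
    owner_id: str
    created_at: datetime
    last_edited_at: datetime
    checksum: str              # checksum of the full file contents
    # Sharing options: which user or group holds which access right.
    shared_with: dict[str, AccessRight] = field(default_factory=dict)
```

In a relational database this would map naturally to a `files` table plus a `file_permissions` join table keyed by (file_id, user_or_group_id).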


High-level design


It would be smart to use a CDN. Not all, but some files will show "write once, read many times" characteristics. For example, a famous person uploads an announcement and shares it with many users. In such a case, the CDN would cache this file, and the many users who read it would access the copy in the CDN instead of hitting the origin service.


An important design aspect is splitting a big file into multiple chunks (e.g. 4MB each; a chunking sketch follows this list). This would:

  • Help upload and download efficiency, as we can parallelize upload & download, and retry when transmission fails.
  • Help deduplication in case the same chunk appears in multiple files.
  • Help de-fragment the storage system, as we can move chunks around for replication and partitioning.
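A minimal chunking sketch, assuming a fixed 4MB chunk size; the content hash doubles as the deduplication key, since identical chunks in different files map to the same ID:

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4MB per chunk

def split_into_chunks(path: str) -> list[dict]:
    """Split a file into fixed-size chunks, each identified by its content hash."""
    chunks = []
    with open(path, "rb") as f:
        index = 0
        while True:
            data = f.read(CHUNK_SIZE)
            if not data:
                break
            chunks.append({
                "index": index,
                "size": len(data),
                "chunk_id": hashlib.sha256(data).hexdigest(),  # dedup key
            })
            index += 1
    return chunks
```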



Request flows







Detailed component design

The write path consists of:

  • File upload. This has to provide enough upload bandwidth. We probably want to support SFTP-based upload, as I believe it is more efficient than HTTP-based upload.
  • Some metadata, e.g., user ID, creation time, etc., will be written to the RDB.
  • File upload will create a message in a Message Queue, which triggers several async jobs: screening for banned content, scanning for viruses and other malicious content, generating preview files, etc. The key is that the async jobs can work independently from the main upload path (see the sketch after this list).
  • Another async task is to replicate uploaded files to multiple locations.
  • File sharing, along with updating other metadata, is taken care of by the Metadata Service.
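A sketch of the hand-off from the upload path to the async jobs, assuming a generic message-queue client with a `publish(topic, message)` method rather than any specific library's API:

```python
import json
import time

def on_upload_complete(queue, file_id: str, user_id: str, checksum: str) -> None:
    """Publish one event; each async worker (virus scan, preview generation,
    content screening, replication) consumes it independently of the upload path."""
    event = {
        "type": "FILE_UPLOADED",
        "file_id": file_id,
        "user_id": user_id,
        "checksum": checksum,
        "uploaded_at": time.time(),
    }
    queue.publish(topic="file-events", message=json.dumps(event))
```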



The read path consists of:

  • Downloading is taken care of by the File Download Service.
  • Preview is served by the Preview Service.


Services are split as follows:

  • The Upload Service is the workhorse: a high-throughput service that takes care of the critical path of uploading files.
  • Async tasks can work independently from the Upload Service.
  • The read path is split between the Download Service and the Preview Service.


Data replication:

  • Each uploaded file should be replicated three ways: one copy in the same data center for quick recovery, one copy in another data center in the same region, and one copy in a data center in another region in case of a regional disaster (a placement sketch follows).
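A sketch of that placement policy, assuming a simple catalog of data centers tagged with a region (the first copy stays in the primary data center, ideally on separate hardware):

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class DataCenter:
    name: str
    region: str

def choose_replicas(primary: DataCenter, all_dcs: list[DataCenter]) -> list[DataCenter]:
    """Return placements: same DC (second copy on other hardware),
    same region, and a different region for disaster recovery."""
    same_region = [dc for dc in all_dcs if dc.region == primary.region and dc != primary]
    other_region = [dc for dc in all_dcs if dc.region != primary.region]
    return [primary, random.choice(same_region), random.choice(other_region)]
```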


Data partition:

  • Partitioning data so that clients are close to the data (e.g. files uploaded by a customer in Japan can be stored in a data center in Japan) would improve read performance.
  • Geo-location-based partitioning also helps with disaster recovery, so it is an idea worth exploring. We should do performance measurements and see whether this benefit outweighs the challenges (a more complex architecture, more cost for data centers).


The API Gateway authorizes users' access requests via OIDC. The client presents the OIDC scopes it holds, and the API Gateway makes sure the scopes are sufficient to grant access to the requested files.
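A minimal scope check at the gateway, assuming scope names like `files.read` / `files.write` carried in an already-validated OIDC token (both the names and the mapping are illustrative):

```python
# Map each API operation to the OIDC scope it requires (assumed names).
REQUIRED_SCOPE = {
    "download": "files.read",
    "preview": "files.read",
    "upload": "files.write",
    "update": "files.write",
    "share": "files.share",
}

def is_authorized(operation: str, token_scopes: set[str]) -> bool:
    """Grant the request only if the token carries the scope the operation needs."""
    required = REQUIRED_SCOPE.get(operation)
    return required is not None and required in token_scopes
```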


*** Algorithm:


One of the most important aspects is to make sure files are uploaded correctly and stay intact afterwards.


  1. Before a file is uploaded, the client should compute a checksum (e.g. SHA-1) of the file and send it to the server along with the file itself.
  2. One of the async jobs should read the file back from the Blob Store, calculate the checksum, and make sure it matches the checksum sent in (1) (see the verification sketch after this list).
  3. Periodically, another batch job should recalculate the checksum and make sure the data stays intact. This is particularly important when data moves (e.g. recovered from a backup copy, or moved to a different data center).
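A sketch of the server-side verification from step (2); SHA-256 is used here in place of SHA-1, and the streaming loop keeps large files out of memory:

```python
import hashlib

def compute_checksum(stream, block_size: int = 4 * 1024 * 1024) -> str:
    """Hash a file-like object incrementally."""
    h = hashlib.sha256()
    for block in iter(lambda: stream.read(block_size), b""):
        h.update(block)
    return h.hexdigest()

def verify_upload(blob_stream, client_checksum: str) -> bool:
    # A mismatch means corruption in transit or at rest: flag the file for re-upload.
    return compute_checksum(blob_stream) == client_checksum
```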


*** Offline work


Another important consideration is supporting offline work by the client.


If the client goes offline, say for 4 hours, the user still wants to keep working on files (e.g. writing a document on an airplane).


To support this, the client should store all the chunks that make up the whole file. Each chunk should carry a version number.


When the client detects it is back online, it connects to the server and checks:

  • Have any of the chunks been updated on the client?
  • Have any of the chunks been updated on the server?


If either the client or the server has updated a chunk while the other party has not, the first party sends the updated chunks (identified by version numbers) to the other party.


If both sides updated the document, this is a conflict, and it has to be resolved.


As a general-purpose file system, our service may not be in a great position to provide an automatic conflict-resolution algorithm; doing so would require domain-specific algorithms.


Therefore, by default it should let the user know a conflict has happened (e.g. by making the file name appear in bright red on the client) and give the user the option to adopt either the version on the server or the version on the client.
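A sketch of the per-chunk sync decision, assuming each side tracks a version number per chunk plus the last version both sides agreed on (`base_version` is my name for it):

```python
from enum import Enum

class SyncAction(Enum):
    UP_TO_DATE = "up_to_date"
    PUSH_TO_SERVER = "push_to_server"
    PULL_FROM_SERVER = "pull_from_server"
    CONFLICT = "conflict"  # both sides changed: surface to the user

def decide(client_version: int, server_version: int, base_version: int) -> SyncAction:
    """Compare both sides' chunk versions against the last synced version."""
    client_changed = client_version > base_version
    server_changed = server_version > base_version
    if client_changed and server_changed:
        return SyncAction.CONFLICT        # user picks the client or server copy
    if client_changed:
        return SyncAction.PUSH_TO_SERVER
    if server_changed:
        return SyncAction.PULL_FROM_SERVER
    return SyncAction.UP_TO_DATE
```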


Trade offs/Tech choices


Another interesting trade-off is privacy vs. data durability.


Privacy laws require that users can delete their data completely.

However, to be able to recover data from a disaster, it's important that the files are copied for backup in multiple places.


We need a consistent and accurate index so that we can find all the copies when a user asks for deletion. Such an index should be stored in an RDB for strong consistency (see the sketch below).
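A sketch of deletion against such an index, assuming the metadata DB keeps one row per physical copy (generic `db` and `blob_store` handles, not a specific library's API):

```python
def delete_file_completely(db, blob_store, file_id: str) -> None:
    """Remove every replica listed in the index, then the index rows themselves."""
    # The index must be written transactionally at replication time;
    # a copy missing from the index would survive the user's deletion request.
    replicas = db.query("SELECT location FROM file_replicas WHERE file_id = %s", file_id)
    for replica in replicas:
        blob_store.delete(replica.location)
    db.execute("DELETE FROM file_replicas WHERE file_id = %s", file_id)
    db.execute("DELETE FROM file_metadata WHERE file_id = %s", file_id)
```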



Failure scenarios/bottlenecks







Future improvements
