System requirements


Functional:

List functional requirements for the system...

  • file upload
  • file download
  • sync files and keep them in sync across devices
  • share files


Non-Functional:

List non-functional requirements for the system...

  • durable
  • consistent (same version of a file should reflect the same data)
  • highly available
  • secure (only authorized users can access a given file), supports encryption



Capacity estimation

Estimate the scale of the system you are going to design...

Average size of a file uploaded to the system = 5 MB


Average Daily Active Users (DAU) = 1 million

~20% upload files per day. Amount of data uploaded per day = 200K * 5 MB = 1 TB

Amount of data uploaded per year = 365 TB
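
The arithmetic above can be checked with a short script. This is only a sketch: the constant and function names are illustrative, and it assumes decimal units (1 TB = 1,000,000 MB) and one average-sized file per uploading user per day.

```python
# Back-of-the-envelope check of the upload volume estimate.
# Assumptions: decimal units (1 TB = 1,000,000 MB) and one 5 MB file
# per uploading user per day.
AVG_FILE_SIZE_MB = 5
DAILY_ACTIVE_USERS = 1_000_000
UPLOADER_PERCENT = 20  # ~20% of DAU upload on a given day
MB_PER_TB = 1_000_000


def daily_upload_tb() -> float:
    """Data uploaded per day, in TB."""
    uploaders = DAILY_ACTIVE_USERS * UPLOADER_PERCENT // 100
    return uploaders * AVG_FILE_SIZE_MB / MB_PER_TB


def yearly_upload_tb() -> float:
    """Data uploaded per year, in TB."""
    return daily_upload_tb() * 365


if __name__ == "__main__":
    print(f"per day:  {daily_upload_tb():.0f} TB")
    print(f"per year: {yearly_upload_tb():.0f} TB")
```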


Number of concurrent requests = approx. 1K per second




API design

Define what APIs are expected from the system...


  • Upload a file

PUT https:///file

{

data:

name:

owner:

}

Returns 201 Created


  • Download a file, sync a file

GET https:///file?name=f1


  • Share a file

POST https:///file?action=share

{

file_name:

share_with: {{}, {}..}

}

Returns 201 Created
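
The three calls above could be assembled on the client roughly as follows. This is a sketch under assumptions: the host stays omitted as in the endpoints, the file data is assumed to travel as a base64 string, and the shape of each share_with entry (a hypothetical "user_id" key) is not specified by the design.

```python
import json
from dataclasses import dataclass
from typing import Dict, List
from urllib.parse import quote


@dataclass
class ApiRequest:
    method: str
    path: str  # host is left out, as in the endpoints above
    body: str  # JSON-encoded payload, "" when there is none


def upload_request(name: str, owner: str, data_b64: str) -> ApiRequest:
    # Assumption: file bytes are sent base64-encoded in the "data" field.
    payload = {"data": data_b64, "name": name, "owner": owner}
    return ApiRequest("PUT", "/file", json.dumps(payload))


def download_request(name: str) -> ApiRequest:
    # The file name is percent-encoded so that spaces etc. survive the query string.
    return ApiRequest("GET", f"/file?name={quote(name)}", "")


def share_request(file_name: str, share_with: List[Dict[str, str]]) -> ApiRequest:
    # Assumption: each share_with entry is an object such as {"user_id": "u2"}.
    payload = {"file_name": file_name, "share_with": share_with}
    return ApiRequest("POST", "/file?action=share", json.dumps(payload))
```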



Database design

Defining the system data model early on will clarify how data will flow among different components of the system.


File data is stored in a cloud file store. This can be an object store such as Amazon S3.


Metadata Tables

==============


File Metadata

----------------

File Name:

Owner:

Device id:

Created At:

Last Modified:

sha:

status:

chunks: { id: status: sha: ...}

Shared With:


Access Table

---------------

User Id: ==========> Primary Key (composite, part 1)

File Name: =========> Primary Key (composite, part 2)

Permissions:


Given the number of users and amount of data, a NoSQL Database is suitable for Metadata store. A document store like MongoDB can be used.
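
The two tables can be sketched as plain Python types to make the fields and the composite key concrete. The field types, the example status and permission values, and the in-memory AccessTable class are assumptions for illustration, not a schema for any particular database.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Chunk:
    id: int
    status: str  # e.g. 'uploading' or 'uploaded'
    sha: str


@dataclass
class FileMetadata:
    file_name: str
    owner: str
    device_id: str
    created_at: str
    last_modified: str
    sha: str
    status: str
    chunks: List[Chunk] = field(default_factory=list)
    shared_with: List[str] = field(default_factory=list)


@dataclass
class AccessEntry:
    user_id: str
    file_name: str
    permissions: str  # value format (e.g. 'read', 'write') is an assumption


class AccessTable:
    """In-memory stand-in keyed by the composite (user_id, file_name) key."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], AccessEntry] = {}

    def put(self, entry: AccessEntry) -> None:
        # Writing the same (user_id, file_name) again overwrites the row.
        self._rows[(entry.user_id, entry.file_name)] = entry

    def get(self, user_id: str, file_name: str) -> Optional[AccessEntry]:
        return self._rows.get((user_id, file_name))
```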


High-level design

You should identify enough components that are needed to solve the actual problem from end to end.


User: End-user who uploads/downloads/shares/syncs files


API Gateway: Performs authorization, authentication, and request routing.


Metadata Service: Responsible for managing metadata about the files (size, owner, permissions, shared with, upload status etc.)


Metadata Cache: Cache of the metadata for fast access.


Metadata DB: Persistent Store containing the metadata. This can be a SQL or NoSQL database.


File Store: User files are stored in a flat, cloud-based object store such as Amazon S3. The Metadata Service interacts with the File Store to obtain pre-signed URLs for upload and to track upload status.


CDN: Content Distribution Network which stores frequently accessed files at the edge for fast user access.




Request flows

Explain how the request flows from end to end in your high level design.


Upload

=======

  • User authenticates with the API gateway and is directed to the Metadata Service.
  • Metadata Service requests a pre-signed URL from the cloud store, which the user can use to upload the file directly. Pre-signed URLs give secure access to a resource for a limited time, and they free the Metadata Service to process other requests instead of handling uploads/failures directly.
  • User uploads the file in chunks using multi-part upload.
  • Metadata Service updates the DB as 'uploading' for the particular chunk.
  • Once successfully uploaded, the cloud store invokes a lambda or callback on Metadata Service. The respective chunk is marked as 'uploaded'.
  • When all chunks are successfully uploaded, the file is marked as 'uploaded'.
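
The status bookkeeping in the steps above can be sketched as a small state tracker. This is an in-memory stand-in for what the Metadata Service would write to the DB; the class name, method names, and the initial 'pending' status are assumptions, while 'uploading' and 'uploaded' come from the flow itself.

```python
from typing import Dict, Iterable


class UploadTracker:
    """Tracks per-chunk and per-file upload status for one file."""

    def __init__(self, chunk_ids: Iterable[str]) -> None:
        # 'pending' is an assumed initial state before any upload starts.
        self.chunk_status: Dict[str, str] = {cid: "pending" for cid in chunk_ids}
        self.file_status = "pending"

    def start_chunk(self, chunk_id: str) -> None:
        """Record that the client has begun uploading this chunk."""
        self.chunk_status[chunk_id] = "uploading"
        self.file_status = "uploading"

    def on_chunk_uploaded(self, chunk_id: str) -> None:
        """Handler for the cloud store's callback after a successful chunk upload."""
        self.chunk_status[chunk_id] = "uploaded"
        # The file counts as uploaded only once every chunk has arrived.
        if all(s == "uploaded" for s in self.chunk_status.values()):
            self.file_status = "uploaded"
```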


Download

=========

  • User authenticates and requests a given file.
  • Metadata Service gives the chunk ids and their sha to the user and redirects to the cloud store.
  • User downloads the required chunks from the cloud store.


Sync

=======

Similar to the 'download' flow, except that a device only needs to download the chunks it does not already hold.
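
One way to picture sync on top of the download flow is a comparison of chunk hashes. The sketch below assumes the device keeps a set of the SHA values it already holds; the function name and data shapes are illustrative.

```python
from typing import Iterable, List, Set, Tuple


def chunks_to_download(
    remote_chunks: Iterable[Tuple[str, str]], local_shas: Set[str]
) -> List[str]:
    """Return ids of chunks whose content is not yet on this device.

    remote_chunks: (chunk_id, sha) pairs as returned by the Metadata Service.
    local_shas:    hashes of the chunks already stored locally (assumed to be tracked).
    """
    return [chunk_id for chunk_id, sha in remote_chunks if sha not in local_shas]
```

If nothing changed since the last sync, the returned list is empty and no chunk is fetched.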


Share

======

  • User authenticates with the API gateway.
  • The Metadata Service updates the permissions and share list of the file.
  • At the same time, the Access Table is updated so that access checks stay fast.
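
The share steps touch two places: the file's share list and the Access Table. A minimal sketch with plain dictionaries standing in for both stores; the function name, the default 'read' permission, and the dictionary layout are assumptions.

```python
from typing import Dict, Iterable, Tuple


def share_file(
    file_meta: dict,
    access_table: Dict[Tuple[str, str], str],
    file_name: str,
    user_ids: Iterable[str],
    permissions: str = "read",
) -> None:
    """Add users to the file's share list and mirror the grant in the Access Table."""
    for user_id in user_ids:
        # Keep the share list free of duplicates when a user is shared with again.
        if user_id not in file_meta["shared_with"]:
            file_meta["shared_with"].append(user_id)
        # The Access Table row is keyed by (user id, file name).
        access_table[(user_id, file_name)] = permissions
```

In a real store the two writes would need to succeed or fail together; this sketch does not model that.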



Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component?


Upload Directly to Cloud Store

=======================

Clients upload to and download from the cloud store directly, which eliminates the extra hop through the Metadata Server.


Use Pre-signed URLs

===================

Pre-signed URLs enable the Metadata Service to process other requests and not handle uploads/failures directly. It is a mechanism to give secure access to a resource for a limited time.
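
The idea behind a pre-signed URL, a signature plus an expiry that the store can verify without calling back to the issuer, can be sketched with an HMAC. This is a generic illustration, not the scheme a real cloud store uses (Amazon S3, for example, uses AWS Signature Version 4); the secret key, query parameter names, and function names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"  # hypothetical key shared by the signer and the store


def _signature(path: str, expires: int) -> str:
    message = f"{path}\n{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def presign(path: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Return path plus query parameters granting access until the expiry time."""
    now = time.time() if now is None else now
    expires = int(now) + ttl_seconds
    query = urlencode({"expires": expires, "signature": _signature(path, expires)})
    return f"{path}?{query}"


def verify(url: str, now: Optional[float] = None) -> bool:
    """Accept the URL only if it is unexpired and the signature matches the path."""
    now = time.time() if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(signature, _signature(parsed.path, expires))
```

Because the signature covers both the path and the expiry, changing either one invalidates the URL.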


Use Resumable or Multipart Uploads

============================

The file is split into chunks and uploaded with multi-part upload, following the 'Upload' request flow above, with the Metadata Service tracking the status of each chunk.


Each chunk is identified by a SHA-256 hash.


This helps a client resume upload of big files when there are network failures or API timeouts. Chunk size can be configured adaptively based on user bandwidth.


It also allows de-duplication and avoids downloading chunks which are already present.
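
Chunking, hashing, and skipping chunks that are already stored can be sketched as follows. The fixed chunk size and the function names are assumptions; an adaptive chunk size would only change the value passed in.

```python
import hashlib
from typing import Iterable, List, Tuple


def split_into_chunks(data: bytes, chunk_size: int) -> List[Tuple[str, bytes]]:
    """Split data into fixed-size pieces, each paired with its SHA-256 hex digest."""
    chunks = []
    for offset in range(0, len(data), chunk_size):
        piece = data[offset:offset + chunk_size]
        chunks.append((hashlib.sha256(piece).hexdigest(), piece))
    return chunks


def chunks_to_upload(
    chunks: List[Tuple[str, bytes]], stored_shas: Iterable[str]
) -> List[Tuple[str, bytes]]:
    """Skip chunks the store already has, and repeated chunks within the same file.

    Used for both resuming (stored_shas = chunks uploaded before the failure)
    and de-duplication (stored_shas = hashes already known to the store).
    """
    seen = set(stored_shas)
    todo = []
    for sha, piece in chunks:
        if sha not in seen:
            seen.add(sha)
            todo.append((sha, piece))
    return todo
```

After a network failure, the client would call chunks_to_upload again with the hashes already marked 'uploaded' and send only what is left.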


Use Encryption and Compression

==========================

To keep the data secure, it can be encrypted before sending over the network.


It can also be compressed for efficient storage. Note that text files compress well, while video files, which are usually already compressed, gain much less. Compressing video also takes time, so whether to compress should be decided per file type.


Also, compression must be done before encryption: encrypted data looks random, so compressing it afterwards achieves almost no reduction.
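
The ordering argument can be demonstrated with zlib and a toy cipher. The XOR keystream below is only a stand-in that makes data look random; it is NOT secure encryption, and the key, sample text, and function names are assumptions for the demonstration.

```python
import hashlib
import zlib
from typing import Tuple


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream. Toy only, NOT secure.

    Applying it twice with the same key restores the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        keystream = hashlib.sha256(key + counter).digest()
        block = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(block, keystream))
    return bytes(out)


def compare_orders(text: bytes, key: bytes) -> Tuple[int, int]:
    """Return (size when compressing first, size when encrypting first)."""
    compress_then_encrypt = toy_encrypt(zlib.compress(text), key)
    encrypt_then_compress = zlib.compress(toy_encrypt(text, key))
    return len(compress_then_encrypt), len(encrypt_then_compress)


if __name__ == "__main__":
    sample = b"the quick brown fox jumps over the lazy dog. " * 200
    good, bad = compare_orders(sample, b"demo-key")
    print(f"original: {len(sample)} bytes")
    print(f"compress then encrypt: {good} bytes")
    print(f"encrypt then compress: {bad} bytes")
```

With repetitive text, compressing first yields a small output, while encrypting first leaves the compressor with nothing to exploit.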


Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...


The tech choices are the ones detailed under 'Detailed component design'. In brief:

  • Upload directly to the cloud store: eliminates the extra hop of uploading/downloading files through the Metadata Server, so the Metadata Service relies on callbacks from the cloud store to learn the upload status.
  • Pre-signed URLs: give secure access to a resource for a limited time and let the Metadata Service process other requests instead of handling uploads/failures directly.
  • Resumable or multipart uploads: each chunk is identified by a SHA-256 hash, which lets a client resume big uploads after network failures or API timeouts and allows de-duplication. The per-chunk status and sha must be tracked in the metadata.
  • Encryption and compression: data is encrypted before it is sent over the network. Compression saves storage for text files but gains little for video and takes time, so it is applied per file type, and always before encryption.



Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.

  • Failed upload

If an upload fails, no success callback reaches the Metadata Server. Once a timeout expires, the Metadata Server marks the upload as 'failed'.
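
A timeout sweep of this kind might look like the sketch below. The record layout (a status plus a last_update timestamp), the function name, and the idea of running it periodically are assumptions.

```python
from typing import Dict, List


def mark_stale_uploads_failed(
    uploads: Dict[str, dict], now: float, timeout_seconds: float
) -> List[str]:
    """Mark uploads stuck in 'uploading' for longer than the timeout as 'failed'.

    uploads maps an upload id to {"status": str, "last_update": float}.
    Returns the ids that were changed.
    """
    failed = []
    for upload_id, record in uploads.items():
        stale = now - record["last_update"] > timeout_seconds
        if record["status"] == "uploading" and stale:
            record["status"] = "failed"
            failed.append(upload_id)
    return failed
```

A client that sees 'failed' can then resume by re-sending only the chunks that were never marked 'uploaded'.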



Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?


  • Add Quota Management
  • View a file on the server
  • Simultaneous edit of files by users
  • Versioning