Requirements
Functional Requirements:
- Allow users to upload files to the system.
- Enable users to download uploaded files.
- Enable users to resume interrupted uploads.
- Users should be able to view their files.
- Ensure synchronization of files between local and server storage.
- Users should be able to select a tier, which determines how much storage they get.
Non-Functional Requirements:
- System should be highly durable.
- System should be highly available and eventually consistent.
- System should be fault tolerant.
- System should retain files until their TTL (time to live) expires.
- A user can have at most 5 clients.
- Compression of files.
- Deduplication of files.
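One way to picture the compression and deduplication requirements together is a content-addressed store: each chunk is keyed by a hash of its bytes, so identical content is stored once, and the stored bytes are compressed. The sketch below is only an illustration under assumptions that are not in the requirements: the `ChunkStore` class, the choice of SHA-256 and zlib, and the in-memory dictionaries standing in for object storage are all hypothetical.

```python
import hashlib
import zlib


class ChunkStore:
    """Hypothetical in-memory stand-in for object storage, keyed by content hash."""

    def __init__(self):
        self.blobs = {}     # fingerprint -> compressed bytes
        self.refcount = {}  # fingerprint -> number of files referencing the chunk

    def put(self, data: bytes) -> str:
        # The fingerprint is computed over the uncompressed bytes, so identical
        # content always maps to the same key.
        fingerprint = hashlib.sha256(data).hexdigest()
        if fingerprint not in self.blobs:
            self.blobs[fingerprint] = zlib.compress(data)
        self.refcount[fingerprint] = self.refcount.get(fingerprint, 0) + 1
        return fingerprint

    def get(self, fingerprint: str) -> bytes:
        return zlib.decompress(self.blobs[fingerprint])


store = ChunkStore()
first = store.put(b"hello world" * 100)
second = store.put(b"hello world" * 100)  # duplicate content, not stored again
third = store.put(b"something else")
```

The reference count is there so that a chunk shared by several files is only deleted once the last file pointing at it is removed or expires.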
API Design
Define the APIs expected from the system. This is your chance to analyze and define the read and write paths so that you can come up with the high-level design...
POST /files/upload -> returns a presigned URL for uploading the file
POST /files/{chunkId} -> returns a presigned URL for uploading the given chunk
GET /files/download/{fileid} -> returns a presigned URL for downloading the file
GET /files/download/{chunkId} -> returns a presigned URL for downloading the given chunk
GET /files/ -> lists the user's files
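The endpoints above all hand back presigned URLs. To show the idea behind them, here is a minimal sketch of an expiring, HMAC-signed URL. It is a simplified stand-in and not the signing scheme S3 actually uses; the `sign_url` and `verify_url` helpers, the `SECRET_KEY` value, and the query parameter names are all assumptions made for illustration.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"  # hypothetical key known only to the server/storage side


def sign_url(path: str, method: str, expires_at: int) -> str:
    """Return the path with an expiry timestamp and a signature attached."""
    message = f"{method}\n{path}\n{expires_at}".encode()
    signature = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires_at, "signature": signature})
    return f"{path}?{query}"


def verify_url(url: str, method: str, now: int) -> bool:
    """Accept the request only if the URL is unexpired and the signature matches."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    message = f"{method}\n{parsed.path}\n{expires_at}".encode()
    expected = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature, expected)


upload_url = sign_url("/bucket/file-1/chunk-0", "PUT", expires_at=1000)
```

Because the HTTP method and path are part of the signed message, a URL issued for uploading one chunk cannot be reused to download it or to write to a different chunk.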
High-Level Design
Uploading:
- The user requests the server to upload a file.
- The server creates an entry in the filemetadata table, which is essentially a pointer to the S3 URL where the file will be located. Since we did not put a constraint on file size, a file can be in the TB range, so the natural place to store it is object storage.
- Once the user gets a presigned URL, the client uploads directly to the S3 bucket.
- Since we have to support resumable uploads, chunking happens at the client, and each chunk ID gets its own presigned URL to which that chunk's data is uploaded.
- S3 uses an S3 trigger to update the upload status in DynamoDB.
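The resumable part of the upload flow above can be sketched as follows: the client splits the file into chunks, the per-chunk status is tracked, and after a failure only the chunks not yet marked as uploaded are sent again. This is a toy model under stated assumptions: `UploadSession`, `split_chunks`, and `upload` are hypothetical names, a dictionary stands in for the S3 bucket, and the status update that the S3 trigger would write to DynamoDB is modeled as a set in memory.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 4  # bytes; unrealistically small so the demo stays readable


@dataclass
class UploadSession:
    """Hypothetical per-file status record (the role DynamoDB plays in the design)."""
    file_id: str
    total_chunks: int
    uploaded: set = field(default_factory=set)

    def pending(self):
        return [i for i in range(self.total_chunks) if i not in self.uploaded]


def split_chunks(data: bytes, size: int = CHUNK_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]


def upload(session, chunks, storage, fail_at=None):
    """Upload every pending chunk; fail_at simulates a dropped connection."""
    for index in session.pending():
        if index == fail_at:
            raise ConnectionError(f"connection lost before chunk {index}")
        storage[(session.file_id, index)] = chunks[index]
        session.uploaded.add(index)  # stands in for the trigger-driven status update


data = b"abcdefghij"
chunks = split_chunks(data)
storage = {}  # stands in for the S3 bucket
session = UploadSession(file_id="file-1", total_chunks=len(chunks))

try:
    upload(session, chunks, storage, fail_at=1)  # first attempt dies at chunk 1
except ConnectionError:
    pass

remaining_after_failure = session.pending()
upload(session, chunks, storage)  # resume: only the pending chunks are sent
```

The key design point is that the status lives on the server side, so any of the user's clients can ask which chunks are still pending and pick up where the interrupted upload stopped.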
Downloading:
- The server (the DocumentMetadata Service) provides a presigned URL to the user, along with a manifest file containing the chunk details. The user uses these to download the chunks directly from S3 in a resumable manner.
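A manifest-driven download could look roughly like the sketch below: the client walks the manifest, skips chunks it already holds, verifies each fetched chunk against its checksum, and reassembles the file in order. The manifest fields (`index`, `sha256`, `size`) and the `build_manifest` and `download` helpers are assumptions for illustration; the design above only says that the manifest contains chunk details.

```python
import hashlib


def build_manifest(chunks):
    """Hypothetical manifest: one entry per chunk with its index, checksum, and size."""
    return [
        {"index": i, "sha256": hashlib.sha256(c).hexdigest(), "size": len(c)}
        for i, c in enumerate(chunks)
    ]


def download(manifest, fetch_chunk, local):
    """Fetch the chunks missing from `local`, verify them, and reassemble the file."""
    for entry in manifest:
        index = entry["index"]
        if index in local:
            continue  # already downloaded in an earlier attempt
        data = fetch_chunk(index)
        if hashlib.sha256(data).hexdigest() != entry["sha256"]:
            raise ValueError(f"checksum mismatch for chunk {index}")
        local[index] = data
    return b"".join(local[entry["index"]] for entry in manifest)


remote_chunks = [b"abcd", b"efgh", b"ij"]  # stands in for the objects in S3
manifest = build_manifest(remote_chunks)
fetched = []


def fetch_chunk(index):
    # In the real flow this would be a GET against the chunk's presigned URL.
    fetched.append(index)
    return remote_chunks[index]


local = {0: b"abcd"}  # chunk 0 survived from an interrupted earlier download
result = download(manifest, fetch_chunk, local)
```

Checking each chunk against the manifest also catches corruption in transit, so a bad chunk can be fetched again on its own instead of restarting the whole file.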
Detailed Component Design
Deep dive into 2-3 key components. Explain how they work, how they scale, discuss tradeoffs, capacity, and any relevant algorithms or data structures.