System requirements


Functional:

  • Users can upload and download files.
  • Users can modify file content.
  • Users can share files with permissions (e.g. read-only or read-write).



Non-Functional:

  • Availability.
  • Durability.
  • Scalability.
  • Consistency.
  • Security.
  • Low latency.




Capacity estimation


Blob Storage (File Content)

  • Average file size: 10 MB
  • Uploads per user per day: 10
  • Active users: 100,000
  • Daily upload volume: 100,000 users × 10 uploads × 10 MB = 10 TB/day
  • Annual storage needed: 10 TB/day × 365 ≈ 3.65 PB/year
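
The arithmetic behind these figures can be checked with a short script. This is only a sketch: it assumes decimal units (1 TB = 10^6 MB) and counts a single stored copy, so replication or backups would multiply the result.

```python
# Back-of-the-envelope blob storage estimate.
# Assumption: decimal units (1 TB = 10**6 MB, 1 PB = 10**3 TB), one copy per file.
AVG_FILE_SIZE_MB = 10
UPLOADS_PER_USER_PER_DAY = 10
ACTIVE_USERS = 100_000

daily_uploads = UPLOADS_PER_USER_PER_DAY * ACTIVE_USERS       # files per day
daily_volume_tb = daily_uploads * AVG_FILE_SIZE_MB / 10**6    # TB per day
annual_volume_pb = daily_volume_tb * 365 / 10**3              # PB per year

print(f"{daily_uploads:,} uploads/day")
print(f"{daily_volume_tb:.2f} TB/day")
print(f"{annual_volume_pb:.2f} PB/year")
```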

Metadata Database

a. Files table:

  • Rows/year: ~365 million (1 million uploads/day × 365)
  • Row size: ~300 bytes
  • Total: ~110 GB/year

b. File versions table:

  • Rows/year: ~550 million (assuming 1.5 versions per file)
  • Row size: ~400 bytes
  • Total: ~220 GB/year

c. Other tables

  • Estimated size: ~100–150 GB
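
The metadata figures follow from the same upload rate (100,000 users × 10 uploads per day). The sketch below reproduces them under the stated row sizes; it assumes decimal gigabytes and ignores index and storage overhead, which a real Postgres deployment would add.

```python
# Metadata sizing sketch. Assumptions: 1 GB = 10**9 bytes, no index overhead.
FILES_PER_YEAR = 100_000 * 10 * 365      # one files row per upload
VERSIONS_PER_FILE = 1.5
FILE_ROW_BYTES = 300
VERSION_ROW_BYTES = 400
OTHER_TABLES_GB = (100, 150)             # range given in the estimate above

version_rows = FILES_PER_YEAR * VERSIONS_PER_FILE
files_gb = FILES_PER_YEAR * FILE_ROW_BYTES / 10**9
versions_gb = version_rows * VERSION_ROW_BYTES / 10**9

total_low_gb = files_gb + versions_gb + OTHER_TABLES_GB[0]
total_high_gb = files_gb + versions_gb + OTHER_TABLES_GB[1]

print(f"files table:         {files_gb:.1f} GB/year")
print(f"file versions table: {versions_gb:.1f} GB/year")
print(f"all metadata:        {total_low_gb:.1f} to {total_high_gb:.1f} GB/year")
```

Even at the upper end the metadata stays well under 1 TB per year, roughly four orders of magnitude smaller than the blob storage estimate.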






API design

File APIs:

POST /files – Create a new file upload (get upload URL)

PUT /uploads/{upload_id}/chunks/{part_number} – Upload a chunk

POST /uploads/{upload_id}/complete – Finalize chunked upload

GET /files – List user’s files

GET /files/{file_id}/download – Download a file

DELETE /files/{file_id} – Delete a file
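
The three upload endpoints imply a small state machine per upload: create, receive parts, finalize. The following in-memory sketch illustrates one way that could behave. The class and method names (UploadSession, UploadApi, create_file, put_chunk, complete) are hypothetical, as is the choice of SHA-256 for the final checksum; a real service would keep this state in the Metadata DB and the chunk bodies in the blob store.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class UploadSession:
    """Hypothetical server-side state behind one upload_id."""
    file_name: str
    total_parts: int
    parts: dict[int, bytes] = field(default_factory=dict)
    completed: bool = False


class UploadApi:
    """In-memory stand-in for the three upload endpoints."""

    def __init__(self) -> None:
        self.sessions: dict[str, UploadSession] = {}

    def create_file(self, file_name: str, total_parts: int) -> str:
        # POST /files: register the upload and hand back an upload_id.
        upload_id = uuid.uuid4().hex
        self.sessions[upload_id] = UploadSession(file_name, total_parts)
        return upload_id

    def put_chunk(self, upload_id: str, part_number: int, data: bytes) -> None:
        # PUT /uploads/{upload_id}/chunks/{part_number}: storing by part number
        # makes the call idempotent, so a client can retry a failed chunk.
        session = self.sessions[upload_id]
        if session.completed:
            raise ValueError("upload already completed")
        if not 1 <= part_number <= session.total_parts:
            raise ValueError("part_number out of range")
        session.parts[part_number] = data

    def complete(self, upload_id: str) -> str:
        # POST /uploads/{upload_id}/complete: refuse to finalize until every
        # part has arrived, then return a checksum of the assembled content.
        session = self.sessions[upload_id]
        expected = range(1, session.total_parts + 1)
        missing = [n for n in expected if n not in session.parts]
        if missing:
            raise ValueError(f"missing parts: {missing}")
        session.completed = True
        body = b"".join(session.parts[n] for n in expected)
        return hashlib.sha256(body).hexdigest()


api = UploadApi()
upload_id = api.create_file("notes.txt", total_parts=2)
api.put_chunk(upload_id, 2, b"world")    # parts may arrive out of order
api.put_chunk(upload_id, 1, b"hello ")
print(api.complete(upload_id))
```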


Versioning:

GET /files/{file_id}/versions – List versions of a file

GET /files/{file_id}/versions/{version_id} – Get a specific version


Sync & Devices:

POST /devices – Register a new device

GET /sync – Get list of files needing sync

POST /sync/ack – Acknowledge synced files
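
One possible model behind these endpoints is a server-side change log with a cursor per device: GET /sync returns what lies past the cursor, and POST /sync/ack advances it. The sketch below assumes that model; the change log, the sequence numbers, and the names SyncState, Change, pending, and ack are all illustrative and not part of the API above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    seq: int          # 1-based position in the server-side change log
    file_id: str
    version: int


class SyncState:
    def __init__(self) -> None:
        self.log: list[Change] = []
        self.cursors: dict[str, int] = {}   # device_id -> last acknowledged seq

    def register_device(self, device_id: str) -> None:
        # POST /devices: a new device starts from the beginning of the log.
        self.cursors.setdefault(device_id, 0)

    def record_change(self, file_id: str, version: int) -> Change:
        change = Change(len(self.log) + 1, file_id, version)
        self.log.append(change)
        return change

    def pending(self, device_id: str) -> list[Change]:
        # GET /sync: everything after the device's cursor, collapsed to the
        # newest change per file so intermediate versions are not re-fetched.
        cursor = self.cursors[device_id]
        latest: dict[str, Change] = {}
        for change in self.log[cursor:]:
            latest[change.file_id] = change
        return sorted(latest.values(), key=lambda c: c.seq)

    def ack(self, device_id: str, seq: int) -> None:
        # POST /sync/ack: the cursor only moves forward, so a late or
        # duplicate acknowledgement is harmless.
        bounded = min(seq, len(self.log))
        self.cursors[device_id] = max(self.cursors[device_id], bounded)


state = SyncState()
state.register_device("laptop")
state.record_change("f1", 1)
state.record_change("f2", 1)
state.record_change("f1", 2)
for change in state.pending("laptop"):
    print(change)
```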


Jobs:

GET /jobs/{job_id} – Get job (e.g. virus scan) status








Database design


Database tables:

Table users {

user_id <- key

email

}


Table files {

file_id <- key

user_id <- reference to users

name

type

size

version

created_at

updated_at

}


Table file_version {

version_id <- key

version

file_id <- reference to files

blob_path

status

checksum

uploaded_at

user_id <- reference to users

}


Table chunks {

chunk_id <- key

version_id <- reference to file_version

number

blob_path

created_at

size

}
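
To make the table definitions concrete, here is a sketch of the schema as DDL, run against an in-memory SQLite database so that it is self-contained. SQLite stands in for the Postgres metadata store named later in this design. The column types, the NOT NULL and UNIQUE constraints, and the timestamp defaults are assumptions, since the tables above list only column names and references.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    email   TEXT NOT NULL UNIQUE
);
CREATE TABLE files (
    file_id    INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    name       TEXT NOT NULL,
    type       TEXT,
    size       INTEGER,
    version    INTEGER NOT NULL DEFAULT 1,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE file_version (
    version_id  INTEGER PRIMARY KEY,
    version     INTEGER NOT NULL,
    file_id     INTEGER NOT NULL REFERENCES files(file_id),
    blob_path   TEXT,
    status      TEXT NOT NULL,
    checksum    TEXT,
    uploaded_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    user_id     INTEGER NOT NULL REFERENCES users(user_id),
    UNIQUE (file_id, version)           -- assumed: one row per file version
);
CREATE TABLE chunks (
    chunk_id   INTEGER PRIMARY KEY,
    version_id INTEGER NOT NULL REFERENCES file_version(version_id),
    number     INTEGER NOT NULL,
    blob_path  TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    size       INTEGER NOT NULL,
    UNIQUE (version_id, number)         -- assumed: part numbers unique per version
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite leaves FK checks off by default
conn.executescript(SCHEMA)

# Insert one file with a single version split into two chunks.
conn.execute("INSERT INTO users (user_id, email) VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO files (file_id, user_id, name) VALUES (1, 1, 'notes.txt')")
conn.execute(
    "INSERT INTO file_version (version_id, version, file_id, status, user_id) "
    "VALUES (1, 1, 1, 'uploading', 1)"
)
conn.executemany(
    "INSERT INTO chunks (version_id, number, blob_path, size) VALUES (1, ?, ?, ?)",
    [(2, "blobs/1/2", 2_000_000), (1, "blobs/1/1", 8_000_000)],
)
rows = conn.execute(
    "SELECT number, blob_path FROM chunks WHERE version_id = 1 ORDER BY number"
).fetchall()
print(rows)
```

The unique constraint on (version_id, number) lets a retried chunk upload be detected at the database level instead of producing a duplicate row.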







High-level design


  • CDN - caches static assets and file downloads close to the client.
  • API Gateway - load balancing, routing, authentication, rate limiting, and SSL/TLS termination.
  • File uploader - writes file metadata to the Metadata DB and returns the chunk uploader URL to the client.
  • Chunk uploader - writes chunk data to the blob store.
  • Blob store - stores user content (AWS S3).
  • Metadata DB - stores file metadata (Postgres).
  • Sync service - keeps file state on the server and on client devices in sync.
  • Async jobs - run background work, e.g. notifying all devices of a change.




Request flows

Entry Point

  • Clients reach the system through the CDN and the API Gateway.
  • The CDN (Content Delivery Network) caches and accelerates delivery of static assets and downloads.
  • The API Gateway is the main entry point and routes requests to the appropriate services.


Upload Flow

  1. Client initiates an upload request via the API Gateway.
  2. API Gateway routes the request to the File Upload Service.
  3. File Upload Service:
    • Registers the upload and generates an upload_id.
    • Chooses a chunking strategy (chunk size and number of parts) based on file size.
    • Returns the chunk upload URLs (or a single endpoint) to the client.
  4. Client uploads file chunks directly to the Chunk Uploader.
  5. Chunk Uploader:
    • Stores chunks in the Blob Store.
    • Records upload metadata (e.g., file name, size, parts) in the Metadata DB.
  6. Once the upload is complete:
    • The file metadata is finalized, and an upload event is sent to the Sync Service and/or published to the Message Queue.
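
Step 3 leaves the chunking strategy open. One simple option, sketched below, is a fixed base chunk size that doubles until the file fits within a maximum part count. The 8 MiB base and the 10,000-part cap are assumptions for illustration (the cap mirrors the S3 multipart upload limit), and the names ChunkPlan, pick_chunk_size, and plan_chunks are hypothetical.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass(frozen=True)
class ChunkPlan:
    part_number: int   # 1-based, matching {part_number} in the chunk URL
    offset: int        # byte offset of the chunk within the file
    length: int        # chunk length in bytes


def pick_chunk_size(file_size: int, base: int = 8 * MIB, max_parts: int = 10_000) -> int:
    """Double the chunk size until the file fits in at most max_parts chunks."""
    size = base
    while file_size > size * max_parts:
        size *= 2
    return size


def plan_chunks(file_size: int) -> list[ChunkPlan]:
    """Split [0, file_size) into consecutive chunks; only the last may be short."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    chunk_size = pick_chunk_size(file_size)
    plans: list[ChunkPlan] = []
    offset = 0
    while offset < file_size:
        length = min(chunk_size, file_size - offset)
        plans.append(ChunkPlan(len(plans) + 1, offset, length))
        offset += length
    return plans


# A 10 MiB file, close to the average size assumed in the capacity estimate.
for plan in plan_chunks(10 * MIB):
    print(plan)
```

Larger chunks mean fewer requests and fewer rows in the chunks table, while smaller chunks make a retry after a dropped connection cheaper.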


Download Flow

  1. Client sends a download request via the API Gateway.
  2. API Gateway routes the request to the File Download Service.
  3. File Download Service:
    • Fetches file metadata from the Metadata DB.
    • Coordinates with Chunk Downloader to retrieve individual file chunks from the Blob Store.
  4. Chunks are reassembled and streamed back to the client.
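
The reassembly in steps 3 and 4 can be sketched as a generator that reads chunk rows in part order and checks the content against the stored checksum. A plain dict stands in for the Blob Store, the row fields follow the chunks table, and SHA-256 is an assumed checksum algorithm; stream_file is a hypothetical name.

```python
import hashlib
from typing import Iterator


def stream_file(
    chunk_rows: list[dict],
    blob_store: dict[str, bytes],
    expected_sha256: str,
) -> Iterator[bytes]:
    """Yield chunk bodies in part order, then verify the whole-file checksum."""
    digest = hashlib.sha256()
    for row in sorted(chunk_rows, key=lambda r: r["number"]):
        data = blob_store[row["blob_path"]]
        if len(data) != row["size"]:
            raise IOError(f"chunk {row['number']} has unexpected size")
        digest.update(data)
        yield data
    # The check runs after the last chunk has been yielded, so the caller
    # must treat an exception here as a failed download.
    if digest.hexdigest() != expected_sha256:
        raise IOError("checksum mismatch")


content = b"hello world"
store = {"blobs/1/1": content[:6], "blobs/1/2": content[6:]}
rows = [
    {"number": 2, "blob_path": "blobs/1/2", "size": 5},   # rows may be unordered
    {"number": 1, "blob_path": "blobs/1/1", "size": 6},
]
checksum = hashlib.sha256(content).hexdigest()
print(b"".join(stream_file(rows, store, checksum)))
```

Streaming chunk by chunk keeps server memory bounded by the chunk size instead of the file size.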


Syncing & Background Processing

  • Sync Service maintains consistency across devices, triggering updates or syncing actions as needed when files change.
  • Async Jobs perform background processing such as:
    • Virus scanning
    • Preview/thumbnail generation
    • File indexing or OCR
  • These jobs are triggered via events published to the Message Queue, ensuring non-blocking and scalable task execution.
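
The event-driven job execution described above can be illustrated with a small in-process sketch. Python's queue.Queue stands in for the Message Queue and threads stand in for job workers. The event type "file.uploaded", the handler functions, and process_events are hypothetical names; a production system would use a durable broker with retries.

```python
import queue
import threading


def scan_for_viruses(event: dict) -> str:
    return f"scanned {event['file_id']}"


def generate_thumbnail(event: dict) -> str:
    return f"thumbnail {event['file_id']}"


def index_file(event: dict) -> str:
    return f"indexed {event['file_id']}"


# Each event type fans out to the background jobs interested in it.
HANDLERS = {"file.uploaded": [scan_for_viruses, generate_thumbnail, index_file]}


def process_events(events: list[dict], worker_count: int = 2) -> list[str]:
    """Publish events to a queue and let worker threads run the matching jobs."""
    pending: queue.Queue = queue.Queue()
    results: list[str] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            event = pending.get()
            try:
                if event is None:          # sentinel: no more work
                    return
                for handler in HANDLERS.get(event["type"], []):
                    outcome = handler(event)
                    with lock:
                        results.append(outcome)
            finally:
                pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for event in events:                   # the publisher never waits for a job
        pending.put(event)
    for _ in threads:
        pending.put(None)
    pending.join()
    for thread in threads:
        thread.join()
    return results


print(sorted(process_events([{"type": "file.uploaded", "file_id": "f1"}])))
```

Because the upload path only enqueues an event, a slow virus scan or OCR job delays the job result, not the upload response.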






