Requirements


Functional Requirements:


  • Allow users to upload files to the system.
  • Enable users to download uploaded files.
  • Ensure synchronization of files between local and server storage.
  • Batch upload & download



Non-Functional Requirements:


  • app availability 99.99%, 52min downtime in a year
  • file/data eventual consistency
  • scalability, when a server fails, the failover
  • security, only user with permission can upload/view. encryption for file object
  • durability, needs replication for each component.


world 8B, 10% users from pop = 0.8B

system DAU will be 10% of users = 80M

R:W = 10:1

W DAU


API Design

1x1

POST /v1/file {file info}-> return {status, error code, location url}


batch

POST /v1/files {files info}-> return {status, error code, location url}


download a file

GET /v1/file/id

GET /v1/files/id={id1, id2,..}


get file info, to finalize upload

Get /v1/fileinfo/id


all above API using JWT as user security token to access endpoints



High-Level Design

fileInfo: fileId, name, author, chunkIds:[], updated, created


chunkInfo: fileId, chunkId, chunkUrl


Client → API Gateway

  • Client calls:
upload(fileId)
  • API Gateway responsibilities:
    • Route request to available upload servers
    • Use least-connections load balancing
    • Handle failover when a node goes down

👉 Design intent

  • Prevent hotspot
  • Ensure high availability at entry point

2. Raw Object Storage (S3)

  • Upload server:
    • Directly streams raw file → S3 (raw bucket)

👉 Why first write to S3?

  • Avoid coupling with metadata or processing
  • Durable storage immediately (no data loss)

3. Metadata Registration (FileInfo Service)

  • Client sends:
POST upload(fileInfo)
  • FileInfo Service:
    • Stores metadata into DB
fileInfo { fileId, name, author, created, updated }

👉 Design intent

  • Metadata and file storage are decoupled
  • Enables independent scaling

4. Async Processing via Kafka

  • System publishes event:
topic: file-uploaded

👉 This triggers downstream processing pipeline

5. Object Processing Pipeline (Async Consumers)

A dedicated Object Handler Service consumes Kafka events:

Pipeline stages:

Raw S3 Object ↓ Split into chunks ↓ Fraud Detection ↓ Compression Service

👉 Why pipeline?

  • Each stage is independently scalable
  • Failures are isolated (retry per stage)

6. Chunk Storage

  • After processing:
    • Each chunk uploaded to Chunk S3

👉 Guarantees:

  • All chunks successfully uploaded
  • Idempotent writes (important for retries)

7. Chunk Notification (Kafka)

  • Publish event:
topic: chunk-created

8. Chunk Metadata Service

  • Chunk Service consumes event
  • Stores chunk metadata:
chunkInfo { fileId, chunkId, chunkUrl }

👉 Design intent

  • Enables parallel download later
  • Avoids scanning S3 during reads



Client → API Gateway

2. CDN Check

  • First check:
    • CDN cache (edge)

👉 If hit → return immediately

👉 If miss → go backend

3. File Metadata Lookup

  • Request goes to FileInfo Service
  • Service fetches:
    • fileInfo DB
    • chunkInfo DB

4. Return Chunk URLs

  • Response contains:
[ {chunkId, chunkUrl}, ... ]

5. Client Parallel Download

  • Frontend:
    • Downloads chunks in parallel
    • Reconstructs file locally

👉 Why client-side merge?

  • Reduces backend load
  • Maximizes bandwidth usage




Detailed Component Design


Fault Tolerance

  • Raw file already in S3 → no data loss
  • Kafka enables retry
  • Chunk processing is idempotent

Scalability

  • Chunking enables:
    • parallel processing
    • parallel download
  • Each service scales independently
  • Faster uploads




Failover / Replication Design

1. Local-region design

Write path

  • Writes go to the primary node
  • The primary replicates data to local replicas
  • The write is considered successful only after:
    • primary write succeeds, and
    • at least one replica acknowledges the write

Read path

  • Reads are served from local replicas whenever possible
  • This reduces read latency and offloads traffic from the primary

Why this design

  • Compared with waiting for all replicas, waiting for one replica ack gives a better balance:
    • better durability than primary-only ack
    • lower latency than full synchronous replication


2. Multi-region design

Replication strategy

  • Use quorum-based replication across regions

For example:

  • total replicas = 5
  • write quorum = 3

Then:

  • upload is considered successful once 3 out of 5 replicas acknowledge the write

Why quorum

  • It allows the system to tolerate node or region failures
  • As long as quorum is preserved, the system can continue serving writes safely

3. Failover strategy

During failover, the system promotes the most up-to-date replica as the new primary, typically the one with the smallest replication lag or the latest committed log position.



Performance Optimization

  • CDN for hot files
  • Chunk-based parallel download

Extensibility

  • Pipeline can easily add:
    • virus scan
    • AI tagging
    • preview generation