Requirements


Functional Requirements:


  • Allow users to upload files to the system.
  • Enable users to download uploaded files.
  • Ensure synchronization of files between local and server storage.
  • Support batch upload and download.



Non-Functional Requirements:


  • Availability: 99.99% uptime, about 52.6 minutes of downtime per year.
  • Consistency: eventual consistency for files and metadata.
  • Scalability and fault tolerance: when a server fails, traffic fails over to a healthy node.
  • Security: only users with permission can upload or view files; file objects are encrypted.
  • Durability: data is replicated in each component.
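
As a quick check on the availability target, the downtime budget can be computed directly. This is a minimal sketch; the function name is illustrative and a 365-day year is assumed.

```python
def downtime_minutes_per_year(availability: float, days_per_year: float = 365.0) -> float:
    """Return the allowed downtime in minutes per year for a given availability."""
    minutes_per_year = days_per_year * 24 * 60
    return (1.0 - availability) * minutes_per_year


budget = downtime_minutes_per_year(0.9999)
print(f"{budget:.2f} minutes per year")
```

Each extra "nine" divides this budget by ten, so the target directly constrains how long a failover is allowed to take.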


World population: 8B. Assume 10% of the population are users: 0.8B users.

Assume system DAU is 10% of users: 80M.

Read-to-write ratio (R:W) = 10:1

W DAU
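
The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch of the numbers already stated; the write share assumes R:W is a ratio of request counts, and per-user request rates are not stated here, so none are computed.

```python
WORLD_POPULATION = 8_000_000_000
USER_PERCENT = 10       # percent of the population that are users
DAU_PERCENT = 10        # percent of users active on a given day
READS_PER_WRITE = 10    # R:W = 10:1

# Integer arithmetic avoids floating-point rounding on large counts.
total_users = WORLD_POPULATION * USER_PERCENT // 100
dau = total_users * DAU_PERCENT // 100

# Assumption: R:W is a ratio of request counts, so writes are 1 of every 11 requests.
write_fraction = 1 / (READS_PER_WRITE + 1)

print(f"users={total_users:,} dau={dau:,} write share={write_fraction:.1%}")
```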



Durability

  • Once upload returns success, the raw file must already be persisted in durable object storage.
  • Metadata must survive single-node or single-AZ failures.
  • Chunk processing must be retryable without data loss.
  • The system should tolerate consumer crashes and retries under at-least-once delivery.
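
Under at-least-once delivery the same event can arrive more than once, so the consumer has to be safe to re-run. A minimal sketch of one way to do that, deduplicating by event ID, is shown below. The class and field names are hypothetical, and the in-memory set stands in for a durable store.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkProcessor:
    # Stand-in for a durable "already processed" table keyed by event ID.
    processed_event_ids: set = field(default_factory=set)
    # Stand-in for the processing result per file.
    results: dict = field(default_factory=dict)

    def handle(self, event_id: str, file_id: str) -> bool:
        """Process an event once; return False when it is a duplicate delivery."""
        if event_id in self.processed_event_ids:
            return False
        self.results[file_id] = "chunked"
        # Record the event only after the work is done, so a crash before this
        # line leads to a retry rather than a lost event.
        self.processed_event_ids.add(event_id)
        return True


processor = ChunkProcessor()
first = processor.handle("evt-1", "f123")
second = processor.handle("evt-1", "f123")  # redelivered after a retry
print(first, second)
```

Because the work itself may run twice if a crash happens between the two steps, the work also has to be idempotent; deduplication alone only reduces repeated effort.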


All APIs require JWT-based authentication. The token is validated by the API Gateway before requests are routed to backend services.

In addition to authentication, the system enforces authorization checks to ensure that users can only upload, download, or read metadata for files they own or are explicitly allowed to access.
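
The authorization check can be sketched as an owner-or-grant lookup. The `FileAcl` type, the action names, and the grant layout are assumptions for illustration, not a fixed schema.

```python
from dataclasses import dataclass, field


@dataclass
class FileAcl:
    owner_id: str
    # Maps user_id -> set of allowed actions, e.g. {"download", "read_metadata"}.
    grants: dict = field(default_factory=dict)


def is_allowed(user_id: str, acl: FileAcl, action: str) -> bool:
    """Owners may do anything; other users need an explicit grant for the action."""
    if user_id == acl.owner_id:
        return True
    return action in acl.grants.get(user_id, set())


acl = FileAcl(owner_id="alice", grants={"bob": {"download"}})
print(is_allowed("alice", acl, "upload"))    # owner
print(is_allowed("bob", acl, "download"))    # explicit grant
print(is_allowed("bob", acl, "upload"))      # no grant for this action
```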


I would separate security into authentication, authorization, and storage protection. JWT proves user identity, but the backend still needs to verify ownership before returning file metadata or chunk URLs. All traffic should use TLS, objects should stay in private buckets, and downloads should use signed URLs or authenticated CDN access. I’d also add malware scanning and audit logging.
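
To illustrate the signed-URL idea, here is a generic HMAC-based sketch using only the standard library. It is not how S3 presigned URLs are actually computed; the secret, URL layout, and function names are hypothetical.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"demo-secret"  # hypothetical; a real key would come from a secret store


def _signature(path: str, expires_at: int, key: bytes) -> str:
    message = f"{path}:{expires_at}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def sign_url(path: str, expires_at: int, key: bytes = SECRET_KEY) -> str:
    """Return the path with an expiry timestamp and an HMAC signature appended."""
    return f"{path}?expires={expires_at}&sig={_signature(path, expires_at, key)}"


def verify_url(path: str, expires_at: int, sig: str, now=None, key: bytes = SECRET_KEY) -> bool:
    """Accept the request only if the link has not expired and the signature matches."""
    now = int(time.time()) if now is None else now
    if now > expires_at:
        return False
    return hmac.compare_digest(_signature(path, expires_at, key), sig)


url = sign_url("/chunks/c1", expires_at=2000)
sig = url.split("sig=")[1]
print(verify_url("/chunks/c1", 2000, sig, now=1000))
```

The signature binds the path and the expiry together, so a client cannot reuse it for another chunk or extend its lifetime.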



API Design

Upload a single file

POST /v1/file {file info} -> returns {status, error code, location URL}


Batch upload

POST /v1/files {files info} -> returns {status, error code, location URL}


Download a file

GET /v1/file/{id}

Batch download

GET /v1/files?ids={id1,id2,...}


Finalize upload

POST /v1/files/{fileId}/finalize

Request example:

{
  "checksum": "sha256-xxx",
  "size": 1048576
}

Response example:

{
  "fileId": "f123",
  "status": "processing"
}

  • Called after the client finishes uploading the raw object
  • Verifies object existence and upload completeness
  • Updates file status from uploading to processing
  • Publishes a message to Kafka for asynchronous chunk processing

Get file info

GET /v1/files/{fileId}/metadata

  • Returns file metadata and current processing status



All of the APIs above require a JWT as the user's access token.




High-Level Design

fileInfo: fileId, name, author, chunkIds:[], updated, created


chunkInfo: fileId, chunkId, chunkUrl
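
The two records can be written down as plain data classes. The field names follow the lists above; the Python types and the UTC timestamp defaults are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class ChunkInfo:
    file_id: str
    chunk_id: str
    chunk_url: str


@dataclass
class FileInfo:
    file_id: str
    name: str
    author: str
    chunk_ids: list = field(default_factory=list)  # filled in as chunks are created
    updated: datetime = field(default_factory=_now)
    created: datetime = field(default_factory=_now)


info = FileInfo(file_id="f123", name="report.pdf", author="alice")
chunk = ChunkInfo(file_id="f123", chunk_id="c0", chunk_url="chunks/f123/c0")
info.chunk_ids.append(chunk.chunk_id)
print(info.chunk_ids)
```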


Upload Flow

1. Client → API Gateway

  • Client calls:
upload(fileId)
  • API Gateway responsibilities:
    • Route request to available upload servers
    • Use least-connections load balancing
    • Handle failover when a node goes down

👉 Design intent

  • Prevent hotspots on individual upload servers
  • Ensure high availability at entry point
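
Least-connections routing with failover can be sketched as picking the healthy server with the fewest active connections. The server names and the way health is tracked are hypothetical.

```python
def pick_server(active_connections: dict, healthy: set) -> str:
    """Return the healthy server with the fewest active connections."""
    candidates = {name: count for name, count in active_connections.items() if name in healthy}
    if not candidates:
        raise RuntimeError("no healthy upload servers")
    return min(candidates, key=candidates.get)


connections = {"upload-1": 5, "upload-2": 2, "upload-3": 1}

# upload-3 has the fewest connections but is down, so it is skipped (failover).
chosen = pick_server(connections, healthy={"upload-1", "upload-2"})
print(chosen)
```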

2. Raw Object Storage (S3)

  • Upload server:
    • Directly streams raw file → S3 (raw bucket)

👉 Why write to S3 first?

  • Avoids coupling the upload to metadata or processing
  • Makes the file durable immediately (no data loss if later steps fail)

3. Metadata Registration (FileInfo Service)

  • Client sends:
POST upload(fileInfo)
  • FileInfo Service:
    • Stores metadata into DB
fileInfo { fileId, name, author, created, updated }

👉 Design intent

  • Metadata and file storage are decoupled
  • Enables independent scaling

4. Async Processing via Kafka

  • System publishes event:
topic: file-uploaded

👉 This triggers downstream processing pipeline

5. Object Processing Pipeline (Async Consumers)

A dedicated Object Handler Service consumes Kafka events:

Pipeline stages:

Raw S3 Object
  ↓
Split into chunks
  ↓
Fraud Detection
  ↓
Compression Service

👉 Why pipeline?

  • Each stage is independently scalable
  • Failures are isolated (retry per stage)
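
The chunk-splitting stage can be sketched as fixed-size slicing. A fixed chunk size is an assumption here; the notes do not say how chunk boundaries are chosen.

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list:
    """Split data into consecutive chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


chunks = split_into_chunks(b"abcdefghij", chunk_size=4)
print(chunks)
```

Joining the chunks in order reproduces the original bytes, which is what the client relies on when it reassembles the file.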

6. Chunk Storage

  • After processing:
    • Each chunk uploaded to Chunk S3

👉 Guarantees:

  • All chunks successfully uploaded
  • Idempotent writes (important for retries)
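
One way to get idempotent chunk writes is to derive the object key deterministically from the file, the chunk index, and the chunk content, so a retry overwrites the same key with the same bytes. The key layout below is an assumption, and a dict stands in for the chunk bucket.

```python
import hashlib


def chunk_key(file_id: str, index: int, chunk: bytes) -> str:
    """Build a deterministic key: same file, index, and bytes give the same key."""
    digest = hashlib.sha256(chunk).hexdigest()
    return f"{file_id}/{index:08d}-{digest}"


def put_chunk(store: dict, file_id: str, index: int, chunk: bytes) -> str:
    key = chunk_key(file_id, index, chunk)
    store[key] = chunk  # re-writing identical bytes under the same key is harmless
    return key


store = {}
key_first = put_chunk(store, "f123", 0, b"abcd")
key_retry = put_chunk(store, "f123", 0, b"abcd")  # retry after a consumer crash
print(key_first == key_retry, len(store))
```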

7. Chunk Notification (Kafka)

  • Publish event:
topic: chunk-created

8. Chunk Metadata Service

  • Chunk Service consumes event
  • Stores chunk metadata:
chunkInfo { fileId, chunkId, chunkUrl }

👉 Design intent

  • Enables parallel download later
  • Avoids scanning S3 during reads



Download Flow

1. Client → API Gateway

2. CDN Check

  • First check:
    • CDN cache (edge)

👉 If hit → return immediately

👉 If miss → go to the backend

3. File Metadata Lookup

  • Request goes to FileInfo Service
  • Service fetches:
    • fileInfo DB
    • chunkInfo DB

4. Return Chunk URLs

  • Response contains:
[ {chunkId, chunkUrl}, ... ]

5. Client Parallel Download

  • Frontend:
    • Downloads chunks in parallel
    • Reconstructs file locally

👉 Why client-side merge?

  • Reduces backend load
  • Maximizes bandwidth usage
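
Client-side parallel download and merge can be sketched with a thread pool. The `fetch` callable is a stand-in for an HTTP GET, and the sketch assumes the chunk URL list is already in file order.

```python
from concurrent.futures import ThreadPoolExecutor


def download_file(chunk_urls: list, fetch, max_workers: int = 4) -> bytes:
    """Fetch all chunks concurrently and join them in the order of chunk_urls."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, even if fetches finish out of order.
        parts = list(pool.map(fetch, chunk_urls))
    return b"".join(parts)


# Stand-in for chunk storage behind the signed URLs.
fake_storage = {"u0": b"abcd", "u1": b"efgh", "u2": b"ij"}

rebuilt = download_file(["u0", "u1", "u2"], fetch=fake_storage.__getitem__)
print(rebuilt)
```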




Detailed Component Design


Fault Tolerance

  • Raw file already in S3 → no data loss
  • Kafka enables retry
  • Chunk processing is idempotent

Scalability

  • Chunking enables:
    • parallel processing
    • parallel download
  • Each service scales independently
  • Faster uploads




Failover / Replication Design

1. Local-region design

Write path

  • Writes go to the primary node
  • The primary replicates data to local replicas
  • The write is considered successful only after:
    • primary write succeeds, and
    • at least one replica acknowledges the write

Read path

  • Reads are served from local replicas whenever possible
  • This reduces read latency and offloads traffic from the primary

Why this design

  • Compared with waiting for all replicas, waiting for one replica ack gives a better balance:
    • better durability than primary-only ack
    • lower latency than full synchronous replication
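
The acknowledgement rule for the local-region write path is small enough to state as a predicate. The function and parameter names are illustrative.

```python
def write_succeeded(primary_ok: bool, replica_acks: int, required_replica_acks: int = 1) -> bool:
    """A write counts as successful once the primary and enough replicas have it."""
    return primary_ok and replica_acks >= required_replica_acks


print(write_succeeded(primary_ok=True, replica_acks=0))   # primary only: not yet durable enough
print(write_succeeded(primary_ok=True, replica_acks=1))   # primary plus one replica: success
```

Raising `required_replica_acks` to the full replica count gives fully synchronous replication, and setting it to 0 gives primary-only acknowledgement, which are the two extremes this design sits between.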


2. Multi-region design

Replication strategy

  • Use quorum-based replication across regions

For example:

  • total replicas = 5
  • write quorum = 3

Then:

  • upload is considered successful once 3 out of 5 replicas acknowledge the write

Why quorum

  • It allows the system to tolerate node or region failures
  • As long as quorum is preserved, the system can continue serving writes safely
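
The quorum numbers can be checked with a short sketch. The read quorum is not specified above; the `R + W > N` overlap rule used for it here is the standard condition and is included as an assumption about how reads would be kept consistent.

```python
def write_accepted(acks: int, write_quorum: int) -> bool:
    """A write is accepted once at least write_quorum replicas acknowledge it."""
    return acks >= write_quorum


def tolerated_write_failures(total_replicas: int, write_quorum: int) -> int:
    """Number of replicas that can be down while writes still reach quorum."""
    return total_replicas - write_quorum


def min_read_quorum(total_replicas: int, write_quorum: int) -> int:
    """Smallest R with R + W > N, so every read overlaps the latest write."""
    return total_replicas - write_quorum + 1


N, W = 5, 3
print(write_accepted(acks=3, write_quorum=W))
print(tolerated_write_failures(N, W))
print(min_read_quorum(N, W))
```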

3. Failover strategy

During failover, the system promotes the most up-to-date replica as the new primary, typically the one with the smallest replication lag or the latest committed log position.
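
Choosing the new primary can be sketched as picking the healthy replica with the highest committed log position. The replica names and the position values are hypothetical.

```python
def pick_new_primary(committed_positions: dict, healthy: set) -> str:
    """Return the healthy replica with the latest committed log position."""
    candidates = {name: pos for name, pos in committed_positions.items() if name in healthy}
    if not candidates:
        raise RuntimeError("no healthy replica available for promotion")
    return max(candidates, key=candidates.get)


positions = {"replica-a": 1040, "replica-b": 1057, "replica-c": 1012}
new_primary = pick_new_primary(positions, healthy={"replica-a", "replica-b", "replica-c"})
print(new_primary)
```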



Performance Optimization

  • CDN for hot files
  • Chunk-based parallel download

Extensibility

  • Pipeline can easily add:
    • virus scan
    • AI tagging
    • preview generation