Requirements
Functional Requirements:
- Allow users to upload files to the system.
- Enable users to download uploaded files.
- Ensure synchronization of files between local and server storage.
- Batch upload & download
Non-Functional Requirements:
- app availability 99.99%, 52min downtime in a year
- file/data eventual consistency
- scalability, when a server fails, the failover
- security, only user with permission can upload/view. encryption for file object
- durability, needs replication for each component.
world 8B, 10% users from pop = 0.8B
system DAU will be 10% of users = 80M
R:W = 10:1
W DAU
API Design
1x1
POST /v1/file {file info}-> return {status, error code, location url}
batch
POST /v1/files {files info}-> return {status, error code, location url}
download a file
GET /v1/file/id
GET /v1/files/id={id1, id2,..}
get file info, to finalize upload
Get /v1/fileinfo/id
all above API using JWT as user security token to access endpoints
High-Level Design
fileInfo: fileId, name, author, chunkIds:[], updated, created
chunkInfo: fileId, chunkId, chunkUrl
Client → API Gateway
- Client calls:
upload(fileId)
- API Gateway responsibilities:
- Route request to available upload servers
- Use least-connections load balancing
- Handle failover when a node goes down
👉 Design intent
- Prevent hotspot
- Ensure high availability at entry point
2. Raw Object Storage (S3)
- Upload server:
- Directly streams raw file → S3 (raw bucket)
👉 Why first write to S3?
- Avoid coupling with metadata or processing
- Durable storage immediately (no data loss)
3. Metadata Registration (FileInfo Service)
- Client sends:
POST upload(fileInfo)
- FileInfo Service:
- Stores metadata into DB
fileInfo {
fileId,
name,
author,
created,
updated
}
👉 Design intent
- Metadata and file storage are decoupled
- Enables independent scaling
4. Async Processing via Kafka
- System publishes event:
topic: file-uploaded
👉 This triggers downstream processing pipeline
5. Object Processing Pipeline (Async Consumers)
A dedicated Object Handler Service consumes Kafka events:
Pipeline stages:
Raw S3 Object
↓
Split into chunks
↓
Fraud Detection
↓
Compression Service
👉 Why pipeline?
- Each stage is independently scalable
- Failures are isolated (retry per stage)
6. Chunk Storage
- After processing:
- Each chunk uploaded to Chunk S3
👉 Guarantees:
- All chunks successfully uploaded
- Idempotent writes (important for retries)
7. Chunk Notification (Kafka)
- Publish event:
topic: chunk-created
8. Chunk Metadata Service
- Chunk Service consumes event
- Stores chunk metadata:
chunkInfo {
fileId,
chunkId,
chunkUrl
}
👉 Design intent
- Enables parallel download later
- Avoids scanning S3 during reads
Client → API Gateway
2. CDN Check
- First check:
- CDN cache (edge)
👉 If hit → return immediately
👉 If miss → go backend
3. File Metadata Lookup
- Request goes to FileInfo Service
- Service fetches:
- fileInfo DB
- chunkInfo DB
4. Return Chunk URLs
- Response contains:
[
{chunkId, chunkUrl},
...
]
5. Client Parallel Download
- Frontend:
- Downloads chunks in parallel
- Reconstructs file locally
👉 Why client-side merge?
- Reduces backend load
- Maximizes bandwidth usage
Detailed Component Design
Fault Tolerance
- Raw file already in S3 → no data loss
- Kafka enables retry
- Chunk processing is idempotent
Scalability
- Chunking enables:
- parallel processing
- parallel download
- Each service scales independently
- Faster uploads
Failover / Replication Design
1. Local-region design
Write path
- Writes go to the primary node
- The primary replicates data to local replicas
- The write is considered successful only after:
- primary write succeeds, and
- at least one replica acknowledges the write
Read path
- Reads are served from local replicas whenever possible
- This reduces read latency and offloads traffic from the primary
Why this design
- Compared with waiting for all replicas, waiting for one replica ack gives a better balance:
- better durability than primary-only ack
- lower latency than full synchronous replication
2. Multi-region design
Replication strategy
- Use quorum-based replication across regions
For example:
- total replicas = 5
- write quorum = 3
Then:
- upload is considered successful once 3 out of 5 replicas acknowledge the write
Why quorum
- It allows the system to tolerate node or region failures
- As long as quorum is preserved, the system can continue serving writes safely
3. Failover strategy
During failover, the system promotes the most up-to-date replica as the new primary, typically the one with the smallest replication lag or the latest committed log position.
Performance Optimization
- CDN for hot files
- Chunk-based parallel download
Extensibility
- Pipeline can easily add:
- virus scan
- AI tagging
- preview generation