System requirements
Functional:
- User can upload and download files.
- User can modify file content.
- User can share files with permissions (read-only, write-only, etc.).
Non-Functional:
- Availability.
- Durability.
- Scalability.
- Consistency.
- Security.
- Low latency.
Capacity estimation
Blob Storage (File Content)
- Average file size: 10 MB
- Uploads per user per day: 10
- Active users: 100,000
- Daily upload volume: 10 MB × 10 × 100,000 = 10 TB/day
- Annual storage needed: 10 TB × 365 ≈ 3.65 PB/year
Metadata Database
a. Files table:
- Rows/year: ~365 million
- Row size: ~300 bytes
- Total: ~110 GB/year
b. File versions table:
- Rows/year: ~550 million (assuming 1.5 versions per file)
- Row size: ~400 bytes
- Total: ~220 GB/year
c. Other tables:
- Estimated size: ~100–150 GB
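The estimates above can be reproduced with a few lines of arithmetic. This is only a sanity check of the stated figures, assuming decimal units (1 TB = 10^12 bytes) and ignoring replication and storage overhead, which the estimate does not mention.

```python
# Back-of-the-envelope check of the capacity figures stated above.
# Assumption: decimal units; replication factor and indexes are not included.
MB, GB, TB, PB = 10**6, 10**9, 10**12, 10**15

avg_file_size = 10 * MB
uploads_per_user_per_day = 10
active_users = 100_000

# Blob storage
daily_upload_bytes = avg_file_size * uploads_per_user_per_day * active_users
annual_upload_bytes = daily_upload_bytes * 365

# Metadata: one files row per upload, 1.5 version rows per file
files_rows_per_year = uploads_per_user_per_day * active_users * 365
files_table_bytes = files_rows_per_year * 300
versions_rows_per_year = files_rows_per_year * 3 // 2
versions_table_bytes = versions_rows_per_year * 400

print(f"daily uploads:  {daily_upload_bytes / TB:.2f} TB")
print(f"annual uploads: {annual_upload_bytes / PB:.2f} PB")
print(f"files table:    {files_table_bytes / GB:.1f} GB/year")
print(f"versions table: {versions_table_bytes / GB:.1f} GB/year")
```

The exact products are slightly below the rounded figures in the list (109.5 GB and 219 GB), which is well within the precision of this kind of estimate.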
API design
File APIs:
POST /files – Create a new file upload (get upload URL)
PUT /uploads/{upload_id}/chunks/{part_number} – Upload a chunk
POST /uploads/{upload_id}/complete – Finalize chunked upload
GET /files – List user’s files
GET /files/{file_id}/download – Download a file
DELETE /files/{file_id} – Delete a file
Versioning:
GET /files/{file_id}/versions – List versions of a file
GET /files/{file_id}/versions/{version_id} – Get a specific version
Sync & Devices:
POST /devices – Register a new device
GET /sync – Get list of files needing sync
POST /sync/ack – Acknowledge synced files
Jobs:
GET /jobs/{job_id} – Get job (e.g. virus scan) status
Database design
Database tables:
Table users {
user_id <- key
}
Table files {
file_id <- key
user_id <- reference to user
name
type
size
version
created_at
updated_at
}
Table file_version {
version_id <- key
version
file_id <- reference to files
blob_path
status
checksum
uploaded_at
user_id <- reference to users
}
Table chunks {
chunk_id <- key
version_id <- reference to file_version
number
blob_path
created_at
size
}
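As a rough illustration, the tables above could be expressed as DDL like the following. The sketch uses Python's built-in sqlite3 module only so that it runs without a server (the design names Postgres for the Metadata DB); the column types, NOT NULL choices, defaults, and the two UNIQUE constraints are assumptions, not part of the original schema.

```python
import sqlite3

# Assumed column types and constraints; only the table and column names
# come from the schema above.
SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY
);
CREATE TABLE files (
    file_id    INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    name       TEXT NOT NULL,
    type       TEXT,
    size       INTEGER,
    version    INTEGER NOT NULL DEFAULT 1,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE file_version (
    version_id  INTEGER PRIMARY KEY,
    version     INTEGER NOT NULL,
    file_id     INTEGER NOT NULL REFERENCES files(file_id),
    blob_path   TEXT,
    status      TEXT NOT NULL,
    checksum    TEXT,
    uploaded_at TEXT,
    user_id     INTEGER NOT NULL REFERENCES users(user_id),
    UNIQUE (file_id, version)        -- assumed: one row per file version
);
CREATE TABLE chunks (
    chunk_id   INTEGER PRIMARY KEY,
    version_id INTEGER NOT NULL REFERENCES file_version(version_id),
    number     INTEGER NOT NULL,
    blob_path  TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    size       INTEGER NOT NULL,
    UNIQUE (version_id, number)      -- assumed: part numbers are unique per version
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the four metadata tables with foreign keys enforced."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
```

The UNIQUE (version_id, number) constraint is one way to make a retried chunk upload fail loudly (or be turned into an upsert) instead of silently creating a duplicate row.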
High-level design
- CDN - caches and accelerates delivery of static assets and downloads.
- API Gateway - load balancing, routing, authentication, rate limiting, and SSL/TLS termination.
- File uploader - writes file metadata to the Metadata DB and returns chunk upload URLs to the client.
- Chunk uploader - uploads file data to the blob store.
- Blob store - stores user content (AWS S3).
- Metadata DB - stores file metadata (Postgres).
- Sync service - keeps data on the server and on client devices in sync.
- Async jobs - run background tasks and notify all devices of changes.
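The design does not say how the API Gateway's rate limiter works. One common option is a token bucket per user or per device; the sketch below is a minimal version of that idea, with the class name and parameters chosen for illustration only. The clock is passed in explicitly so the behaviour is easy to reason about; a real gateway would likely pass time.monotonic().

```python
class TokenBucket:
    """Token-bucket limiter: `rate` tokens are added per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)   # start full so an initial burst is allowed
        self.updated = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Return True and consume `cost` tokens if the request may proceed."""
        elapsed = max(0.0, now - self.updated)   # ignore clocks that go backwards
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = max(self.updated, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# One bucket per client key (user_id or device_id are both plausible choices).
bucket = TokenBucket(rate=1.0, capacity=2.0)
decisions = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(1.0)]
```

With a refill rate of 1 token per second and a capacity of 2, the first two requests at t=0 pass, the third is rejected, and a request one second later passes again.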
Request flows
Entry Point
- Client interacts with the system.
- CDN (Content Delivery Network) is used for caching and accelerating delivery of static assets or downloads.
- API Gateway serves as the main entry point, routing requests to appropriate services.
Upload Flow
- Client initiates an upload request via the API Gateway.
- API Gateway routes the request to the File Upload Service.
- File Upload Service:
  - Registers the upload and generates an upload_id.
  - Calculates the chunking strategy based on file size.
  - Returns chunk upload URLs (or an endpoint) to the client.
- Client uploads file chunks directly to the Chunk Uploader.
- Chunk Uploader:
  - Stores chunks in the Blob Store.
  - Records upload metadata (e.g., file name, size, parts) in the Metadata DB.
- Once the upload is complete:
  - Metadata and event notifications are sent to the Sync Service and/or published to the Message Queue.
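The upload flow above can be sketched in memory as follows. This is an illustration under several assumptions: the File Upload Service and Chunk Uploader are collapsed into one hypothetical class, a dict stands in for the Blob Store, the chunk size is fixed at 8 MiB, and the completion step returns a SHA-256 of the whole file (the design does not name a checksum algorithm).

```python
import hashlib
import uuid
from dataclasses import dataclass, field

CHUNK_SIZE = 8 * 1024 * 1024  # assumed default; the design does not fix a chunk size


def plan_chunks(file_size: int, chunk_size: int = CHUNK_SIZE):
    """Return (part_number, offset, length) per chunk; part numbers are 1-based."""
    if file_size < 0 or chunk_size <= 0:
        raise ValueError("file_size must be >= 0 and chunk_size > 0")
    parts, offset, part_number = [], 0, 1
    while offset < file_size:
        length = min(chunk_size, file_size - offset)
        parts.append((part_number, offset, length))
        offset += length
        part_number += 1
    return parts


@dataclass
class UploadSession:
    upload_id: str
    expected_parts: int
    received: dict = field(default_factory=dict)  # part_number -> bytes (Blob Store stand-in)


class FileUploadService:
    """Hypothetical in-memory stand-in for POST /files, PUT .../chunks, POST .../complete."""

    def __init__(self):
        self.sessions = {}

    def create_upload(self, file_size: int, chunk_size: int = CHUNK_SIZE):
        plan = plan_chunks(file_size, chunk_size)
        upload_id = uuid.uuid4().hex
        self.sessions[upload_id] = UploadSession(upload_id, len(plan))
        return upload_id, plan

    def put_chunk(self, upload_id: str, part_number: int, data: bytes) -> None:
        session = self.sessions[upload_id]
        if not 1 <= part_number <= session.expected_parts:
            raise ValueError(f"unknown part number {part_number}")
        session.received[part_number] = data  # a retried part simply overwrites itself

    def complete(self, upload_id: str) -> str:
        """Check that every part arrived and return the SHA-256 of the whole file."""
        session = self.sessions[upload_id]
        wanted = range(1, session.expected_parts + 1)
        missing = [n for n in wanted if n not in session.received]
        if missing:
            raise ValueError(f"missing parts: {missing}")
        digest = hashlib.sha256()
        for n in wanted:
            digest.update(session.received[n])
        return digest.hexdigest()
```

Because parts are keyed by number, the client can upload them in any order or in parallel and retry a failed part without restarting the whole upload; only complete() requires that all parts be present.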
Download Flow
- Client sends a download request via the API Gateway.
- API Gateway routes the request to the File Download Service.
- File Download Service:
  - Fetches file metadata from the Metadata DB.
  - Coordinates with the Chunk Downloader to retrieve individual file chunks from the Blob Store.
- Chunks are reassembled and streamed back to the client.
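A minimal sketch of the reassembly step, assuming the chunk rows carry the number, blob_path, and size columns from the chunks table and that the stored checksum is a SHA-256 over the whole file (an assumption; the design does not say which hash is used). A plain dict stands in for the Blob Store, and stream_file is a hypothetical name.

```python
import hashlib
from typing import Dict, Iterable, Iterator


def stream_file(chunk_rows: Iterable[dict], blob_store: Dict[str, bytes],
                expected_checksum: str) -> Iterator[bytes]:
    """Yield a file's chunks in part order, verifying sizes and the final checksum."""
    digest = hashlib.sha256()
    expected_number = 1
    for row in sorted(chunk_rows, key=lambda r: r["number"]):
        if row["number"] != expected_number:
            raise ValueError(f"gap or duplicate at part {expected_number}")
        data = blob_store[row["blob_path"]]
        if len(data) != row["size"]:
            raise ValueError(f"part {row['number']} has unexpected size")
        digest.update(data)
        yield data
        expected_number += 1
    # The checksum can only be checked once every chunk has been read.
    if digest.hexdigest() != expected_checksum:
        raise ValueError("checksum mismatch")
```

Since the checksum is verified only after the last chunk has been yielded, a client consuming this stream would have to treat the download as invalid if the final error is raised. Per-chunk checksums would catch corruption earlier, at the cost of an extra column.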
Syncing & Background Processing
- Sync Service maintains consistency across devices, triggering updates or syncing actions as needed when files change.
- Async Jobs perform background processing such as:
  - Virus scanning
  - Preview/thumbnail generation
  - File indexing or OCR
- These jobs are triggered via events published to the Message Queue, ensuring non-blocking and scalable task execution.
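One way the Sync Service could back GET /sync and POST /sync/ack is an append-only change log with a per-device cursor. The sketch below assumes that approach; the class, its method names, and the coalescing of repeated changes to the same file are illustrative choices, not something the design specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    seq: int        # position in the change log, starting at 1
    file_id: str
    version: int


class SyncService:
    def __init__(self):
        self._log = []        # append-only list of Change
        self._cursors = {}    # device_id -> highest acknowledged seq

    def register_device(self, device_id: str) -> None:
        self._cursors.setdefault(device_id, 0)

    def record_change(self, file_id: str, version: int) -> Change:
        """Called when an upload completes (e.g. from a queue consumer)."""
        change = Change(len(self._log) + 1, file_id, version)
        self._log.append(change)
        return change

    def pending(self, device_id: str):
        """Changes the device has not acknowledged, newest entry per file only."""
        cursor = self._cursors[device_id]
        latest = {}
        for change in self._log[cursor:]:   # seq == index + 1, so this skips acked entries
            latest[change.file_id] = change
        return sorted(latest.values(), key=lambda c: c.seq)

    def ack(self, device_id: str, seq: int) -> None:
        """Advance the device cursor; it never moves backwards."""
        if seq > len(self._log):
            raise ValueError("cannot acknowledge a change that does not exist")
        self._cursors[device_id] = max(self._cursors[device_id], seq)
```

A device would apply the returned changes in seq order and acknowledge the highest seq it has fully applied, so an interrupted sync resumes from the last acknowledged point rather than from scratch.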
Detailed component design
- API Gateway – Routes traffic and handles SSL/TLS termination and rate limiting.
- Auth Service – Verifies users/devices via JWT.
- File Upload Service – Manages file creation, versions, metadata.
- Chunk Uploader – Handles actual file uploads to blob storage.
- Blob Store (S3) – Stores raw file data (blobs).
- Metadata DB – Stores file info, versions, sync state, jobs.
- Sync Service – Tracks device sync state, lists deltas.
- Message Queue (Kafka/SQS) – Handles async jobs (scan, sync).
- Worker Services – Process background tasks like virus scanning.
- Device Manager (optional) – Manages user devices.
- Notification Service (optional) – Pushes updates to devices.
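To make the Message Queue and Worker Services roles concrete, here is a small in-process sketch. queue.Queue stands in for Kafka/SQS and a single thread stands in for the worker fleet; the JobRunner class, the status strings, and the handler registry are assumptions for illustration. The status lookup corresponds loosely to GET /jobs/{job_id}.

```python
import queue
import threading
import uuid


class JobRunner:
    """In-process stand-in for a message queue plus one worker."""

    def __init__(self, handlers):
        self._handlers = handlers        # job type -> callable(payload)
        self._queue = queue.Queue()
        self._status = {}                # job_id -> queued | running | done | failed
        self._lock = threading.Lock()
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, job_type: str, payload: dict) -> str:
        """Enqueue a job and return immediately with its id (non-blocking for the caller)."""
        job_id = uuid.uuid4().hex
        self._set(job_id, "queued")
        self._queue.put((job_id, job_type, payload))
        return job_id

    def status(self, job_id: str) -> str:
        with self._lock:
            return self._status[job_id]

    def wait_idle(self) -> None:
        """Block until every submitted job has been processed."""
        self._queue.join()

    def _set(self, job_id: str, state: str) -> None:
        with self._lock:
            self._status[job_id] = state

    def _run(self) -> None:
        while True:
            job_id, job_type, payload = self._queue.get()
            self._set(job_id, "running")
            try:
                self._handlers[job_type](payload)   # unknown types raise KeyError
                self._set(job_id, "done")
            except Exception:
                self._set(job_id, "failed")
            finally:
                self._queue.task_done()
```

A real deployment would also need retries, a dead-letter queue, and idempotent handlers, because a queue such as SQS can deliver the same message more than once.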
Trade-offs/Tech choices
Explain any trade offs you have made and why you made certain tech choices...
Failure scenarios/bottlenecks
Try to discuss as many failure scenarios/bottlenecks as possible.
Future improvements
1. Delta Sync (Block-Level)
- Sync only the changed parts of large files (e.g., using rsync or a binary diff).
- Reduces upload/download bandwidth when only a small part of a large file changes.
2. End-to-End Encryption
- Encrypt files on client side before upload.
- Improves privacy even from backend access.
3. Content Deduplication
- Avoid storing duplicate file content using hash-based checks (e.g. SHA-256).
- Saves storage and bandwidth.
4. Cold Storage Tiering
- Move old versions to cheaper, slower storage (e.g., S3 Glacier).
- Reduces cost for long-term retention.
5. Preview & Thumbnail Generation
- Auto-generate image previews, PDF pages, video snapshots.
- Improves UX on web/mobile clients.
6. Real-Time Sync with WebSockets
- Use WebSockets or gRPC streams for instant device updates instead of polling.
7. Multi-Region Sync Support
- Replicate blob data and metadata across regions for better latency and availability.
8. User & Team Sharing
- Add sharing permissions, folder collaboration, and public links.
9. Audit Logging
- Log user activity: uploads, downloads, deletes, syncs.
- Useful for enterprise use cases.
10. Admin Dashboard & Analytics
- File stats, usage trends, sync failures, storage consumption.
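Items 1 (delta sync) and 3 (content deduplication) can share one mechanism: split files into blocks, address each block by its SHA-256, and upload only blocks the store has not seen. The sketch below assumes fixed-size blocks and an in-memory dict as the store; BlockStore and its methods are hypothetical names.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # assumed block size


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Cut data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class BlockStore:
    """Content-addressed store: identical blocks are kept only once."""

    def __init__(self):
        self.blocks = {}  # SHA-256 hex digest -> block bytes

    def put_file(self, data: bytes, block_size: int = BLOCK_SIZE):
        """Store a file; return (manifest, number of blocks actually uploaded)."""
        manifest, uploaded = [], 0
        for block in split_blocks(data, block_size):
            key = hashlib.sha256(block).hexdigest()
            if key not in self.blocks:      # new content: store it
                self.blocks[key] = block
                uploaded += 1
            manifest.append(key)            # known content: reference only
        return manifest, uploaded

    def get_file(self, manifest) -> bytes:
        """Rebuild a file from its ordered list of block hashes."""
        return b"".join(self.blocks[key] for key in manifest)


store = BlockStore()
v1 = b"a" * 4 + b"b" * 4 + b"c" * 4
v2 = b"a" * 4 + b"X" * 4 + b"c" * 4      # only the middle block changed
manifest1, uploaded1 = store.put_file(v1, block_size=4)
manifest2, uploaded2 = store.put_file(v2, block_size=4)
```

Here the second version uploads a single block instead of three. Fixed-size blocks only help when edits do not shift the rest of the file; inserting a byte near the start changes every later block, which is the case rsync-style rolling checksums or content-defined chunking are meant to handle.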