System requirements
Functional:
- Users can upload and download files.
- Users can modify file content.
- Users can share files with permissions (read-only, write-only, etc.).
Non-Functional:
- Availability.
- Durability.
- Scalability.
- Consistency.
- Security.
- Low latency.
Capacity estimation
Blob Storage (File Content)
- Average file size: 10 MB
- Uploads per user per day: 10
- Active users: 100,000
- Daily upload volume: 10 TB/day
- Annual storage needed: ~3.65 PB/year
Metadata Database
a. Files table:
- Rows/year: ~365 million
- Row size: ~300 bytes
- Total: ~110 GB/year
b. File versions table:
- Rows/year: ~550 million (assuming 1.5 versions per file)
- Row size: ~400 bytes
- Total: ~220 GB/year
c. Other tables
- Estimated size: ~100–150 GB
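The estimates above can be re-derived with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 TB = 10^12 bytes) and uses only the figures listed in this section. The variable names are illustrative.

```python
# Back-of-the-envelope capacity check, decimal units assumed.
MB, GB, TB, PB = 10**6, 10**9, 10**12, 10**15

avg_file_size = 10 * MB
uploads_per_user_per_day = 10
active_users = 100_000

# Blob storage
daily_upload_bytes = avg_file_size * uploads_per_user_per_day * active_users
annual_upload_bytes = daily_upload_bytes * 365

# Metadata database
files_per_year = uploads_per_user_per_day * active_users * 365
files_table_bytes = files_per_year * 300          # ~300 bytes per row
versions_per_year = int(files_per_year * 1.5)     # 1.5 versions per file
versions_table_bytes = versions_per_year * 400    # ~400 bytes per row

print(f"daily uploads:        {daily_upload_bytes / TB:.2f} TB")
print(f"annual blob storage:  {annual_upload_bytes / PB:.2f} PB")
print(f"files table:          {files_table_bytes / GB:.1f} GB/year")
print(f"file versions table:  {versions_table_bytes / GB:.1f} GB/year")
```

The exact version count is 547.5 million rows per year (about 219 GB), which the list above rounds to ~550 million and ~220 GB.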
API design
File APIs:
POST /files – Create a new file upload (get upload URL)
PUT /uploads/{upload_id}/chunks/{part_number} – Upload a chunk
POST /uploads/{upload_id}/complete – Finalize chunked upload
GET /files – List user’s files
GET /files/{file_id}/download – Download a file
DELETE /files/{file_id} – Delete a file
Versioning:
GET /files/{file_id}/versions – List versions of a file
GET /files/{file_id}/versions/{version_id} – Get a specific version
Sync & Devices:
POST /devices – Register a new device
GET /sync – Get list of files needing sync
POST /sync/ack – Acknowledge synced files
Jobs:
GET /jobs/{job_id} – Get job (e.g. virus scan) status
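To illustrate how the file APIs fit together, the sketch below lists the calls a client might make for one chunked upload. It makes no network requests. The helper name `plan_upload_requests`, the 1-based part numbering, and the idea that `upload_id` comes back from `POST /files` are assumptions for illustration, not part of the API above.

```python
from typing import List, Tuple


def plan_upload_requests(upload_id: str, file_size: int, chunk_size: int) -> List[Tuple[str, str]]:
    """Return the ordered (method, path) calls for one chunked upload.

    Assumes upload_id was returned by POST /files and parts are numbered from 1.
    """
    if file_size <= 0 or chunk_size <= 0:
        raise ValueError("file_size and chunk_size must be positive")
    part_count = -(-file_size // chunk_size)  # ceiling division without floats
    calls = [("POST", "/files")]
    calls += [
        ("PUT", f"/uploads/{upload_id}/chunks/{part_number}")
        for part_number in range(1, part_count + 1)
    ]
    calls.append(("POST", f"/uploads/{upload_id}/complete"))
    return calls


if __name__ == "__main__":
    for method, path in plan_upload_requests("u1", file_size=25, chunk_size=10):
        print(method, path)
```

A 25-byte file with 10-byte chunks yields three PUT calls between the initial POST and the final complete call.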
Database design
Database tables:
Table users {
  user_id <- key
}
Table files {
  file_id <- key
  user_id <- reference to users
  name
  type
  size
  version
  created_at
  updated_at
}
Table file_version {
  version_id <- key
  version
  file_id <- reference to files
  blob_path
  status
  checksum
  uploaded_at
  user_id <- reference to users
}
Table chunks {
  chunk_id <- key
  version_id <- reference to file_version
  number
  blob_path
  created_at
  size
}
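The tables above could be expressed as SQL roughly as follows. This is a sketch that runs against in-memory SQLite so it is self-contained; the design itself targets Postgres, where the types would differ. The column types, the NOT NULL and UNIQUE constraints, and the `create_schema` helper are assumptions that the outline does not specify.

```python
import sqlite3

# Column types and constraints are illustrative assumptions.
SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY
);
CREATE TABLE files (
    file_id    INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    name       TEXT NOT NULL,
    type       TEXT,
    size       INTEGER,
    version    INTEGER NOT NULL DEFAULT 1,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE file_version (
    version_id  INTEGER PRIMARY KEY,
    version     INTEGER NOT NULL,
    file_id     INTEGER NOT NULL REFERENCES files(file_id),
    blob_path   TEXT,
    status      TEXT,
    checksum    TEXT,
    uploaded_at TEXT DEFAULT CURRENT_TIMESTAMP,
    user_id     INTEGER NOT NULL REFERENCES users(user_id),
    UNIQUE (file_id, version)
);
CREATE TABLE chunks (
    chunk_id   INTEGER PRIMARY KEY,
    version_id INTEGER NOT NULL REFERENCES file_version(version_id),
    number     INTEGER NOT NULL,
    blob_path  TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    size       INTEGER NOT NULL,
    UNIQUE (version_id, number)
);
"""


def create_schema() -> sqlite3.Connection:
    """Create the four tables in an in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    return conn
```

The UNIQUE constraint on (version_id, number) would reject a second row for the same part of the same version, so a retried chunk upload would need to update the existing row.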
High-level design
- CDN
- API Gateway - load balancing, routing, authentication, rate limiting, and TLS/SSL termination.
- File uploader - writes file metadata to the Metadata DB and returns the chunk uploader URL to the client.
- Chunk uploader - writes chunk data to the blob store.
- Blob store - stores user content (AWS S3).
- Metadata DB - stores file metadata (Postgres).
- Sync service - keeps data on the server and on client devices in sync.
- Async jobs - notify all devices of changes.
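One way the sync service could decide what `GET /sync` returns is to compare the current version of each file with the last version a device acknowledged through `POST /sync/ack`. The sketch below assumes integer versions that only increase and per-device state kept on the server. Both function names are hypothetical.

```python
from typing import Dict, Iterable, List


def files_needing_sync(server_versions: Dict[str, int], device_versions: Dict[str, int]) -> List[str]:
    """Return file_ids whose server version is newer than the device's acknowledged version."""
    return sorted(
        file_id
        for file_id, version in server_versions.items()
        if device_versions.get(file_id, 0) < version
    )


def acknowledge(device_versions: Dict[str, int], server_versions: Dict[str, int],
                synced_ids: Iterable[str]) -> None:
    """Record that the device now holds the current server version of each synced file."""
    for file_id in synced_ids:
        device_versions[file_id] = server_versions[file_id]


if __name__ == "__main__":
    server = {"f1": 3, "f2": 1, "f3": 2}
    device = {"f1": 2, "f2": 1}
    pending = files_needing_sync(server, device)
    print(pending)
    acknowledge(device, server, pending)
    print(files_needing_sync(server, device))
```

This sketch ignores deletions and conflicting edits from two devices, which a real sync protocol would have to handle.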
Request flows
Entry Point
- Client interacts with the system.
- CDN (Content Delivery Network) is used for caching and accelerating delivery of static assets or downloads.
- API Gateway serves as the main entry point, routing requests to appropriate services.
Upload Flow
- Client initiates an upload request via the API Gateway.
- API Gateway routes the request to the File Upload Service.
- File Upload Service:
  - Registers the upload and generates an uploadID.
  - Calculates chunking strategy based on file size.
  - Returns chunk upload URLs (or endpoint) to the client.
- Client uploads file chunks directly to the Chunk Uploader.
- Chunk Uploader:
  - Stores chunks in the Blob Store.
  - Records upload metadata (e.g., file name, size, parts) in the Metadata DB.
- Once upload is complete:
  - Metadata and event notifications are sent to the Sync Service and/or published to the Message Queue.
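The upload flow above can be sketched as a small in-memory model: pick a chunk size from the file size, accept parts in any order, and finalize only when every part is present. The size thresholds, the `UploadSession` class, and the SHA-256 checksum are illustrative assumptions. A dictionary stands in for the blob store.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict

MB = 1024 * 1024


def choose_chunk_size(file_size: int) -> int:
    """Pick a chunk size from the file size (thresholds are assumptions)."""
    if file_size <= 100 * MB:
        return 5 * MB
    if file_size <= 10 * 1024 * MB:
        return 16 * MB
    return 64 * MB


@dataclass
class UploadSession:
    upload_id: str
    file_size: int
    chunk_size: int
    parts: Dict[int, bytes] = field(default_factory=dict)  # stand-in for the blob store

    @property
    def expected_parts(self) -> int:
        return -(-self.file_size // self.chunk_size)  # ceiling division

    def put_chunk(self, part_number: int, data: bytes) -> None:
        if not 1 <= part_number <= self.expected_parts:
            raise ValueError("part number out of range")
        self.parts[part_number] = data  # a retried part overwrites the earlier copy

    def complete(self) -> str:
        """Verify all parts arrived and return the checksum of the whole file."""
        missing = [n for n in range(1, self.expected_parts + 1) if n not in self.parts]
        if missing:
            raise ValueError(f"missing parts: {missing}")
        digest = hashlib.sha256()
        total = 0
        for n in range(1, self.expected_parts + 1):
            digest.update(self.parts[n])
            total += len(self.parts[n])
        if total != self.file_size:
            raise ValueError("uploaded size does not match declared file size")
        return digest.hexdigest()
```

Because parts are keyed by number, a client can retry a failed chunk or upload chunks in parallel without affecting the final checksum.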
Download Flow
- Client sends a download request via the API Gateway.
- API Gateway routes the request to the File Download Service.
- File Download Service:
  - Fetches file metadata from the Metadata DB.
  - Coordinates with Chunk Downloader to retrieve individual file chunks from the Blob Store.
- Chunks are reassembled and streamed back to the client.
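Reassembly on download can be sketched as a generator that yields chunks in part order and checks the result against the stored checksum. The function name `stream_file`, the SHA-256 checksum, and the dictionary standing in for the blob store are assumptions for illustration.

```python
import hashlib
from typing import Dict, Iterable, Iterator, Tuple


def stream_file(chunk_rows: Iterable[Tuple[int, str]], blob_store: Dict[str, bytes],
                expected_checksum: str) -> Iterator[bytes]:
    """Yield chunk bytes in part order; raise if the full content fails the checksum.

    chunk_rows holds (number, blob_path) pairs as stored in the chunks table.
    """
    digest = hashlib.sha256()
    for _number, blob_path in sorted(chunk_rows):
        data = blob_store[blob_path]
        digest.update(data)
        yield data
    if digest.hexdigest() != expected_checksum:
        raise ValueError("checksum mismatch after reassembly")


if __name__ == "__main__":
    store = {"blobs/v1/1": b"hello ", "blobs/v1/2": b"world"}
    rows = [(2, "blobs/v1/2"), (1, "blobs/v1/1")]
    checksum = hashlib.sha256(b"hello world").hexdigest()
    print(b"".join(stream_file(rows, store, checksum)))
```

Note that the mismatch is only detected after the last chunk has been yielded, so a streaming client would also need to verify the checksum on its side or discard the partial download.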
Syncing & Background Processing
- Sync Service maintains consistency across devices, triggering updates or syncing actions as needed when files change.
- Async Jobs perform background processing such as:
  - Virus scanning
  - Preview/thumbnail generation
  - File indexing or OCR
- These jobs are triggered via events published to the Message Queue, ensuring non-blocking and scalable task execution.
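The event-driven job dispatch described above can be sketched with the standard library's `queue.Queue` standing in for the message queue. The handler names, the event shape, and the `publish_upload_completed` and `drain` helpers are hypothetical. A real deployment would use a durable broker and separate worker processes.

```python
import queue
from typing import Callable, Dict, List

# Each handler is a placeholder for real background work.
JOB_HANDLERS: Dict[str, Callable[[str], str]] = {
    "virus_scan": lambda file_id: f"scanned {file_id}",
    "thumbnail": lambda file_id: f"thumbnail for {file_id}",
    "index": lambda file_id: f"indexed {file_id}",
}


def publish_upload_completed(mq: "queue.Queue[dict]", file_id: str) -> None:
    """Publish one event per job type so the upload request itself never blocks on them."""
    for job_type in JOB_HANDLERS:
        mq.put({"job_type": job_type, "file_id": file_id})


def drain(mq: "queue.Queue[dict]") -> List[str]:
    """Worker loop: process events until the queue is empty and return the results."""
    results = []
    while True:
        try:
            event = mq.get_nowait()
        except queue.Empty:
            return results
        results.append(JOB_HANDLERS[event["job_type"]](event["file_id"]))
        mq.task_done()


if __name__ == "__main__":
    mq: "queue.Queue[dict]" = queue.Queue()
    publish_upload_completed(mq, "f1")
    print(drain(mq))
```

Scaling out then means running more workers against the same queue, since the publisher does not know or care how many consumers exist.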