System requirements


Functional:

User authentication and authorization

File upload and download

Synchronization across multiple devices

File sharing

Version control and history

Folder creation

Search

Offline access

Storage quota management

Notifications


Non-Functional:

Scalability

Horizontal scaling for file storage

Sharding to distribute data across multiple databases

Microservices architecture

Asynchronous processing for file uploads


Availability

Multi-region support

Robust monitoring and alerting

Retry mechanism with exponential backoff
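
A minimal sketch of the retry requirement above, assuming transient failures surface as exceptions. The function name, the default delays, and the use of full jitter are illustrative choices, not something the requirements specify.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5,
                       max_delay=30.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is exhausted.

    All parameter names and defaults are assumptions for illustration.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == max_attempts - 1:
                raise
            # The delay ceiling doubles on each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: sleep a random time in [0, ceiling] so that many
            # clients retrying at once do not synchronize.
            sleep(random.uniform(0, ceiling))
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting. A production version would normally retry only on errors known to be transient.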


Durability

Replication

Data integrity checks


Performance

Chunked transfers for large files

Asynchronous IO operations
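
Chunked transfer and the data integrity checks listed under Durability fit together naturally: each chunk can carry its own checksum. The sketch below assumes a 4 MiB chunk size and SHA-256 digests; neither is fixed by the requirements, and the function names are hypothetical.

```python
import hashlib
import io

CHUNK_SIZE = 4 * 1024 * 1024  # assumed chunk size (4 MiB)


def iter_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield (index, sha256_hex, data) for each chunk read from stream."""
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield index, hashlib.sha256(data).hexdigest(), data
        index += 1


def verify_chunk(data, expected_digest):
    """Return True if the received bytes match the digest sent with them."""
    return hashlib.sha256(data).hexdigest() == expected_digest


if __name__ == "__main__":
    # Tiny chunk size so the example is easy to follow.
    for index, digest, data in iter_chunks(io.BytesIO(b"abcdefghij"), 4):
        print(index, len(data), verify_chunk(data, digest))
```

Because each chunk is verified independently, a failed or corrupted chunk can be re-sent on its own instead of restarting the whole upload.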



Capacity estimation

Estimate the scale of the system you are going to design...






API design


1. Authentication
   POST /api/v1/auth/register
   POST /api/v1/auth/login
   POST /api/v1/auth/logout
   POST /api/v1/auth/refresh-token

2. File Operations
   POST /api/v1/files/upload
   GET /api/v1/files/{fileId}
   PUT /api/v1/files/{fileId}
   DELETE /api/v1/files/{fileId}
   GET /api/v1/files/{fileId}/versions
   POST /api/v1/files/{fileId}/restore

3. Folder Operations
   POST /api/v1/folders
   GET /api/v1/folders/{folderId}
   PUT /api/v1/folders/{folderId}
   DELETE /api/v1/folders/{folderId}

4. Sharing
   POST /api/v1/files/{fileId}/share
   DELETE /api/v1/files/{fileId}/share/{shareId}

5. Search
   GET /api/v1/search?q={query}

6. User
   GET /api/v1/user/quota
   GET /api/v1/user/recent-activity

7. Sync
   GET /api/v1/sync/changes?since={timestamp}
   POST /api/v1/sync/upload-batch
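
The sync endpoint listed above, GET /api/v1/sync/changes?since={timestamp}, implies a server-side change log filtered by time. A minimal sketch of that filter follows; the `Change` record and its fields are hypothetical, since the response shape is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    # Hypothetical change-log entry; field names are assumptions.
    file_id: str
    action: str        # e.g. "created", "updated", "deleted"
    timestamp: float   # seconds since the epoch


def changes_since(log, since):
    """Return changes strictly newer than `since`, oldest first."""
    newer = (change for change in log if change.timestamp > since)
    return sorted(newer, key=lambda change: change.timestamp)
```

A client would store the timestamp of the last change it applied and pass it as `since` on the next poll. Using a strict comparison avoids re-delivering the change the client has already seen.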

Authentication: Use OAuth 2.0 with JWT access tokens. Send the token as a bearer token in the Authorization header of every authenticated request.
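
As an illustration of how a client might attach the token, here is a small sketch using Python's standard library. The host `api.example.com` and the helper name are placeholders, and the request is only built, not sent.

```python
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host, not from the design


def build_authenticated_request(path, access_token, method="GET"):
    """Build (but do not send) a request carrying a bearer token."""
    request = urllib.request.Request(BASE_URL + path, method=method)
    # OAuth 2.0 bearer token usage: "Authorization: Bearer <token>".
    request.add_header("Authorization", f"Bearer {access_token}")
    # The API uses JSON for response bodies.
    request.add_header("Accept", "application/json")
    return request


if __name__ == "__main__":
    req = build_authenticated_request("/api/v1/user/quota", "example-token")
    print(req.get_method(), req.full_url)
```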

Request/Response Format: Use JSON for request and response bodies. Use standard HTTP status codes for responses.




Database design

User

{ id: UUID, email: String, passwordHash: String, name: String, createdAt: Timestamp, updatedAt: Timestamp, storageUsed: Long, storageQuota: Long }
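
The storageUsed and storageQuota fields support the storage quota requirement. A possible pre-upload check is sketched below, assuming both values are byte counts; the function name is hypothetical.

```python
def can_store(storage_used, storage_quota, upload_size):
    """Return True if an upload of upload_size bytes fits within the quota.

    Assumes all three values are byte counts (an assumption; the schema
    only declares them as Long).
    """
    if upload_size < 0:
        return False
    return storage_used + upload_size <= storage_quota
```

In a concurrent system the check and the update of storageUsed would need to happen atomically, for example in one database transaction, so that two parallel uploads cannot both pass the check and together exceed the quota.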

File

{ id: UUID, ownerId: UUID, name: String, type: String, size: Long, path: String, parentFolderId: UUID, createdAt: Timestamp, updatedAt: Timestamp, lastModifiedBy: UUID, isDeleted: Boolean, deletedAt: Timestamp }

FileVersion

{ id: UUID, fileId: UUID, versionNumber: Integer, size: Long, createdAt: Timestamp, createdBy: UUID }

Folder

{ id: UUID, ownerId: UUID, name: String, parentFolderId: UUID, path: String, createdAt: Timestamp, updatedAt: Timestamp }

Share

{ id: UUID, fileId: UUID, sharedBy: UUID, sharedWith: UUID, permissions: String, createdAt: Timestamp, expiresAt: Timestamp }
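
To show how the Share record might be used at access time, here is a sketch of an authorization check. It assumes `permissions` holds a single value such as "read" or "write" and that a missing `expiresAt` means the share never expires; both are assumptions, as the schema only declares the field types.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Share:
    # Subset of the Share record above, with Python-style field names.
    file_id: str
    shared_with: str
    permissions: str                       # assumed: "read" or "write"
    expires_at: Optional[datetime] = None  # assumed: None means no expiry


def can_access(share, user_id, file_id, wants_write=False, now=None):
    """Return True if this share grants user_id the requested access."""
    now = now or datetime.now(timezone.utc)
    if share.shared_with != user_id or share.file_id != file_id:
        return False
    if share.expires_at is not None and now >= share.expires_at:
        return False
    # Assumed permission model: "write" implies "read".
    return share.permissions == "write" or not wants_write
```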

Storage: Use a combination of SQL and NoSQL databases.
- SQL (e.g., PostgreSQL): For user data, file metadata, and relationships
- NoSQL (e.g., Cassandra): For file versions and sync logs
- Object Storage (e.g., Amazon S3): For actual file content

Data Transportation: Use a message queue (e.g., Apache Kafka) for asynchronous operations like file uploads, downloads, and sync events.
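
The producer/consumer pattern behind this choice can be sketched with an in-process queue. This is only a stand-in: it uses Python's `queue.Queue` rather than a Kafka client, and the event fields (`type`, `fileId`) are hypothetical.

```python
import json
import queue


def publish(event_queue, event_type, file_id):
    """Serialize an event as JSON and enqueue it (stand-in for a producer)."""
    event_queue.put(json.dumps({"type": event_type, "fileId": file_id}))


def drain(event_queue, handler):
    """Process every queued event with handler; return how many were handled."""
    handled = 0
    while True:
        try:
            message = event_queue.get_nowait()
        except queue.Empty:
            return handled
        handler(json.loads(message))
        handled += 1


if __name__ == "__main__":
    events = queue.Queue()
    publish(events, "file.uploaded", "file-123")
    publish(events, "sync.changed", "file-123")
    print(drain(events, print))
```

The upload endpoint can return as soon as the event is enqueued, while workers handle slower steps such as indexing or notifying other devices. A real broker adds durability, partitioning, and consumer groups, which this sketch does not model.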

Encryption: Use AES-256 for encrypting files at rest and TLS 1.3 for data in transit.
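
For the in-transit half of this requirement, a client can refuse anything older than TLS 1.3. The sketch below uses Python's standard `ssl` module and assumes the local OpenSSL build supports TLS 1.3; the helper name is hypothetical. Encryption at rest with AES-256 needs a cryptography library or the storage provider's server-side encryption and is not shown.

```python
import ssl


def make_tls13_client_context():
    """Return a client SSL context that only negotiates TLS 1.3 or newer."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Reject TLS 1.2 and older during the handshake.
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context
```

The returned context can be passed to `urllib.request.urlopen(..., context=context)` or used to wrap a socket.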



High-level design

You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design. If you are unfamiliar with the tool, you can simply describe your design to the chat bot and ask it to generate a starter diagram for you to modify...






Request flows

Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...






Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...






Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...






Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.






Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?