System requirements


Functional:

  1. User Management:
    • Create an account with a unique username and a valid email address.
    • Log in securely with proper authentication mechanisms.
    • Log out from the system to terminate the session.
  2. File Operations:
    • Upload files to the user's account.
    • Download files from the user's account.
    • Create, move, rename, and delete folders.
    • Move, rename, and delete files.
    • Access and manage previous versions of files.
  3. Synchronization:
    • Automatically synchronize files across multiple devices in real time.
    • Ensure changes made on one device are reflected on all other connected devices.
  4. Sharing and Collaboration:
    • Share files/folders securely with other users.
    • Collaborate in real time on shared files.
    • Set permissions for shared items (view-only, edit, etc.).
  5. File Search:
    • Search for files/folders based on keywords.
    • Provide accurate and fast search results.


Non-Functional:

  1. Security:
    • Implement robust encryption for data transmission and storage.
    • Regularly update security protocols to protect against emerging threats.
    • Monitor and log user activities for auditing and security purposes.
  2. Scalability:
    • Design the system to handle a growing number of users and files.
    • Scale the infrastructure horizontally to accommodate increased load.
  3. Performance:
    • Ensure low-latency file uploads and downloads.
    • Optimize search algorithms for quick and efficient results.
    • Minimize synchronization delay between devices.
  4. Reliability:
    • Implement regular backups and data recovery mechanisms.
    • Provide system availability with minimal downtime for maintenance.
  5. Compatibility:
    • Support a variety of file types and sizes for uploading and downloading.
    • Ensure compatibility with popular operating systems and browsers.
  6. Compliance:
    • Comply with data protection regulations and privacy laws.
    • Maintain transparency in terms of data usage and storage policies.
  7. Availability:
    • Design the system with high availability to minimize service downtime.
    • Implement redundant systems and failover mechanisms to ensure continuous service.



Capacity estimation


Total number of users = 500 million

Total number of daily active users = 100 million

The average number of files stored by each user = 200

The average size of each file = 1 MB

Total number of active connections per minute = 1 million


Storage Estimations:


Total number of files = 500 million * 200 = 100 billion

Total storage required = 100 billion * 1 MB = 100 PB

Server Estimations:


Assuming one server can handle 1,000 concurrent connections, serving 1 million active connections requires 1 million / 1,000 = 1,000 servers.
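
The estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes); the constant names are illustrative and not part of the design.

```python
# Back-of-the-envelope capacity estimate using the figures stated above.
# Assumption: decimal units, so 1 MB = 10**6 bytes and 1 PB = 10**15 bytes.

TOTAL_USERS = 500_000_000
FILES_PER_USER = 200
AVG_FILE_SIZE_BYTES = 10**6        # 1 MB
ACTIVE_CONNECTIONS = 1_000_000
CONNECTIONS_PER_SERVER = 1_000

total_files = TOTAL_USERS * FILES_PER_USER
total_storage_pb = total_files * AVG_FILE_SIZE_BYTES / 10**15
# Ceiling division, so a partially loaded server still counts as one server.
servers_needed = -(-ACTIVE_CONNECTIONS // CONNECTIONS_PER_SERVER)

print(f"files: {total_files:,}")              # files: 100,000,000,000
print(f"storage: {total_storage_pb:.0f} PB")  # storage: 100 PB
print(f"servers: {servers_needed:,}")         # servers: 1,000
```

These figures cover raw file bytes only. Replication, previous file versions, and metadata would push the storage requirement higher.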



API design

1. User Authentication API:

  • Description: This API handles user authentication, allowing users to securely log in and obtain access tokens.
  • Input: User credentials (username, password).
  • Output: Access token or an error message.

2. File Upload API:

  • Description: Enables users to upload files to their accounts.
  • Input: File data, user authentication token.
  • Output: Confirmation of successful upload or an error message.

3. File Download API:

  • Description: Allows users to download files from their accounts.
  • Input: File identifier, user authentication token.
  • Output: Downloaded file data or an error message.
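
The upload and download APIs can be illustrated as a simple round trip. This is a sketch only: `VALID_TOKENS` and `_files` are hypothetical stand-ins for the authentication service and a blob store, and the SHA-256 checksum in the confirmation is an assumed extra that lets a client verify the transfer.

```python
import hashlib
import uuid

# Hypothetical stand-ins for the auth service and the blob store.
VALID_TOKENS = {"token-alice": "alice", "token-bob": "bob"}
_files = {}  # file_id -> {"owner", "name", "data", "sha256"}


def upload_file(token: str, name: str, data: bytes) -> dict:
    """Store the file and return a confirmation, or an error message."""
    owner = VALID_TOKENS.get(token)
    if owner is None:
        return {"error": "unauthorized"}
    file_id = uuid.uuid4().hex
    digest = hashlib.sha256(data).hexdigest()
    _files[file_id] = {"owner": owner, "name": name, "data": data, "sha256": digest}
    return {"file_id": file_id, "size": len(data), "sha256": digest}


def download_file(token: str, file_id: str) -> dict:
    """Return the file data, or an error message."""
    owner = VALID_TOKENS.get(token)
    if owner is None:
        return {"error": "unauthorized"}
    record = _files.get(file_id)
    # Files owned by someone else are reported as missing, not forbidden,
    # so file ids cannot be probed by other users.
    if record is None or record["owner"] != owner:
        return {"error": "not found"}
    return {"name": record["name"], "data": record["data"]}
```

Large files would normally be split into chunks and uploaded in parts; the sketch keeps the whole payload in memory for brevity.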

4. File Management API:

  • Description: Provides functionality to manage files and folders (create, move, rename, delete).
  • Input: File/folder details, user authentication token.
  • Output: Confirmation of the operation or an error message.
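
The create, move, rename, and delete operations can be modeled as edits to a path index. The sketch below assumes a flat dictionary keyed by absolute path; the `FileTree` class is hypothetical, and a production system would more likely store parent references in a metadata database.

```python
class FileTree:
    """Flat path index mapping absolute paths to 'folder' or 'file'."""

    def __init__(self):
        self.entries = {"/": "folder"}

    @staticmethod
    def _parent(path: str) -> str:
        return path.rsplit("/", 1)[0] or "/"

    def _subtree(self, path: str) -> list:
        # The entry itself plus everything nested under it.
        return [p for p in self.entries if p == path or p.startswith(path + "/")]

    def create(self, path: str, kind: str = "folder") -> None:
        if path in self.entries:
            raise FileExistsError(path)
        if self.entries.get(self._parent(path)) != "folder":
            raise FileNotFoundError(self._parent(path))
        self.entries[path] = kind

    def move(self, src: str, dst: str) -> None:
        """Move or rename src to dst, carrying along everything under src."""
        if src == "/" or src not in self.entries:
            raise FileNotFoundError(src)
        if dst in self.entries:
            raise FileExistsError(dst)
        if dst.startswith(src + "/"):
            raise ValueError("cannot move a folder into itself")
        if self.entries.get(self._parent(dst)) != "folder":
            raise FileNotFoundError(self._parent(dst))
        for path in self._subtree(src):
            self.entries[dst + path[len(src):]] = self.entries.pop(path)

    def delete(self, path: str) -> None:
        if path == "/" or path not in self.entries:
            raise FileNotFoundError(path)
        for p in self._subtree(path):
            del self.entries[p]
```

Rename is treated as a move within the same parent folder, so one operation covers both cases.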

5. File Synchronization API:

  • Description: Ensures synchronization of files across multiple devices in real-time.
  • Input: User authentication token, device identifier, file changes.
  • Output: Confirmation of synchronization status or an error message.
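
One common way to implement this kind of synchronization is an append-only change log with a per-device cursor. That technique is an assumption here, not something the design above specifies, and the `SyncServer` class and its method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SyncServer:
    """Append-only change log; each device keeps a cursor into it."""

    log: list = field(default_factory=list)      # (seq, device_id, path, action)
    cursors: dict = field(default_factory=dict)  # device_id -> last seq seen

    def push(self, device_id: str, path: str, action: str) -> int:
        """Record a change made on one device and return its sequence number."""
        seq = len(self.log) + 1
        self.log.append((seq, device_id, path, action))
        return seq

    def pull(self, device_id: str) -> list:
        """Return changes made by other devices since this device's cursor."""
        since = self.cursors.get(device_id, 0)
        changes = [
            (seq, path, action)
            for seq, origin, path, action in self.log
            if seq > since and origin != device_id
        ]
        self.cursors[device_id] = len(self.log)
        return changes


server = SyncServer()
server.push("laptop", "/a.txt", "upload")
server.push("phone", "/b.txt", "upload")
print(server.pull("phone"))   # [(1, '/a.txt', 'upload')]
print(server.pull("phone"))   # []
```

A device that was offline simply pulls everything after its last cursor when it reconnects. Real-time delivery would add a push channel (for example long polling or WebSockets) on top of the same log, and conflicting edits to the same file would need a separate resolution policy.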

6. Sharing and Collaboration API:

  • Description: Facilitates secure sharing of files/folders and collaboration between users.
  • Input: Shared item details, user authentication token.
  • Output: Confirmation of successful sharing or an error message.
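
The permission levels mentioned in the requirements (view-only, edit) can be checked with an ordered enum. This is a minimal sketch that assumes only the owner may share an item and that edit access implies view access; `OWNERS`, `share_item`, and `can_access` are hypothetical names.

```python
from enum import IntEnum


class Permission(IntEnum):
    VIEW = 1
    EDIT = 2  # higher value implies the lower permission as well


# Hypothetical ownership table and share grants.
OWNERS = {"report.docx": "alice"}
_shares = {}  # (item_id, user) -> Permission


def share_item(item_id: str, acting_user: str, target_user: str,
               permission: Permission) -> dict:
    """Grant access to target_user, or return an error message."""
    if OWNERS.get(item_id) != acting_user:
        return {"error": "only the owner can share this item"}
    _shares[(item_id, target_user)] = permission
    return {"shared": item_id, "with": target_user, "permission": permission.name}


def can_access(user: str, item_id: str, needed: Permission) -> bool:
    """True if the user owns the item or holds at least the needed permission."""
    if OWNERS.get(item_id) == user:
        return True
    granted = _shares.get((item_id, user))
    return granted is not None and granted >= needed
```

Storing grants per (item, user) pair keeps the check to a single lookup. Folder-level sharing would additionally need to walk up the parent chain.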

7. Version Control API:

  • Description: Manages access to previous versions of files.
  • Input: File identifier, version details, user authentication token.
  • Output: Previous version of the file or an error message.
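
Version access can be sketched as an append-only list of snapshots per file. Storing full copies is an assumption made for brevity (a real system might store deltas or deduplicated chunks), and the `VersionedFile` class is hypothetical.

```python
class VersionedFile:
    """Keeps every saved snapshot; versions are numbered from 1."""

    def __init__(self):
        self._versions = []

    def save(self, data: bytes) -> int:
        """Store a new snapshot and return its version number."""
        self._versions.append(data)
        return len(self._versions)

    def get(self, version=None) -> bytes:
        """Return the given version, or the latest one if none is specified."""
        if not self._versions:
            raise KeyError("file has no versions")
        if version is None:
            return self._versions[-1]
        if not 1 <= version <= len(self._versions):
            raise KeyError(f"unknown version {version}")
        return self._versions[version - 1]

    def restore(self, version: int) -> int:
        """Make an old version current by saving it again as a new version."""
        return self.save(self.get(version))
```

Restoring appends a new version instead of truncating history, so a restore can itself be undone.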

8. File Search API:

  • Description: Allows users to search for files/folders based on keywords.
  • Input: Search query, user authentication token.
  • Output: List of search results or an empty result set.
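
Keyword search over file and folder names is often backed by an inverted index. The sketch below assumes names are tokenized on non-alphanumeric characters and that a query matches only when every keyword is present; the `SearchIndex` class is hypothetical.

```python
import re
from collections import defaultdict


class SearchIndex:
    """Inverted index from lowercase tokens to the paths that contain them."""

    def __init__(self):
        self._index = defaultdict(set)

    @staticmethod
    def _tokens(text: str) -> list:
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, path: str) -> None:
        for token in self._tokens(path):
            self._index[token].add(path)

    def search(self, query: str) -> list:
        """Return paths containing every keyword, or an empty list."""
        tokens = self._tokens(query)
        if not tokens:
            return []
        # .get() avoids creating empty index entries for unknown tokens.
        matches = set.intersection(*(self._index.get(t, set()) for t in tokens))
        return sorted(matches)


index = SearchIndex()
index.add("/docs/tax_report_2023.pdf")
index.add("/docs/travel_report.docx")
index.add("/photos/beach.png")
print(index.search("report"))      # ['/docs/tax_report_2023.pdf', '/docs/travel_report.docx']
print(index.search("Tax Report"))  # ['/docs/tax_report_2023.pdf']
```

Lookups cost time proportional to the number of query tokens and matching paths, not the total number of files, which is what keeps results fast as the index grows. Each user's index would need to be scoped to the files that user can access.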




Database design

Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...






High-level design

You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design...






Request flows

Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...








Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Which algorithms or data structures would you use for each component? Also you could draw a diagram using the diagramming tool to enhance your design...






Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...






Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.






Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?