System requirements
Functional:
1) Users should be able to upload files from any device.
2) Users should be able to download files from any device.
3) Users should be able to share files with other users and view the files shared with them.
4) Files should sync automatically across a user's devices.
Non-Functional:
1) The system should favor availability over consistency; strong (immediate) consistency is not a requirement here.
2) The system should support large file uploads, for example 50 GB.
3) The system should be as secure as possible for file sharing and should be able to recover corrupt or lost files.
4) Upload, download, and sync latency should be as low as possible.
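To make the large-file requirement concrete, here is a minimal sketch of splitting a file into fixed-size parts, as a chunked or multipart upload would. The name `plan_parts`, the 64 MiB default part size, and the use of 50 GiB in the usage line are illustrative assumptions, not part of the design above. The 10,000-part cap mirrors the documented S3 multipart limit and would differ for other stores.

```python
MAX_PARTS = 10_000  # assumption: part-count cap modeled on S3 multipart uploads


def plan_parts(file_size: int, part_size: int = 64 * 1024 * 1024) -> list[tuple[int, int]]:
    """Return (offset, length) pairs that cover file_size bytes in order."""
    if file_size <= 0 or part_size <= 0:
        raise ValueError("file_size and part_size must be positive")
    # Integer ceiling division avoids float rounding on very large sizes.
    part_count = (file_size + part_size - 1) // part_size
    if part_count > MAX_PARTS:
        raise ValueError("part_size is too small for this file")
    return [
        (offset, min(part_size, file_size - offset))
        for offset in range(0, file_size, part_size)
    ]


# Usage: a 50 GiB file with the default part size.
parts = plan_parts(50 * 1024**3)
print(len(parts), parts[-1])
```

Uploading in parts also means a failed part can be retried on its own instead of restarting the whole transfer, which helps with the latency and recovery requirements.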
Capacity estimation
Estimate the scale of the system you are going to design...
API design
Initial API design (subject to change as the design evolves):
POST /v1/files/upload
Request :
{
File,
FileMetadata
}
GET /v1/files/download/{file_id} : File
POST /v1/files/share/{file_id}
Request:
{
User[]
}
Fetching changes to a file
GET /v1/files/changes/{file_id}: FileMetadata[]
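One way a sync client could use the changes endpoint is sketched below: given the FileMetadata entries returned for a file and the time of the last successful sync, pick the newest entry that the device has not seen yet. The function name `latest_change` and the dict shape of each entry are assumptions; only the `uploaded_at` field comes from the data model in this design.

```python
from datetime import datetime, timezone
from typing import Optional


def latest_change(changes: list[dict], last_synced_at: datetime) -> Optional[dict]:
    """Return the newest entry uploaded after last_synced_at, or None if up to date."""
    newer = [c for c in changes if c["uploaded_at"] > last_synced_at]
    if not newer:
        return None
    return max(newer, key=lambda c: c["uploaded_at"])


# Usage: the device last synced at 10:00 UTC and two newer versions exist.
changes = [
    {"file_id": "f1", "uploaded_at": datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)},
    {"file_id": "f1", "uploaded_at": datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc)},
    {"file_id": "f1", "uploaded_at": datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)},
]
to_fetch = latest_change(changes, datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc))
```

Comparing timestamps is the simplest option; a version counter would avoid clock-skew issues but is not part of the current data model.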
Improved API Design
POST /v1/files/generate_presigned_url : PresignedURL
Request:
{
FileMetadata
}
The backend obtains a presigned URL for S3 and marks the status of the FileMetadata as UPLOAD_IN_PROGRESS.
Response:
The presigned URL is returned to the upload client.
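The step above can be sketched as follows. This is not the S3 signing scheme: an HMAC-signed URL stands in for a real presigned URL so the example runs without cloud credentials. `SECRET`, `BLOB_HOST`, `metadata_db`, `sign`, and `verify_upload` are all hypothetical names, and the 15-minute expiry is an assumed default.

```python
import hashlib
import hmac
import time
import uuid
from typing import Optional
from urllib.parse import urlencode

SECRET = b"demo-secret"                 # hypothetical signing key
BLOB_HOST = "https://blob.example.com"  # hypothetical blob store host
metadata_db: dict = {}                  # in-memory stand-in for the FileMetadata DB


def sign(file_id: str, expires: int) -> str:
    """Sign the upload permission for one file until the expiry time."""
    message = f"PUT:{file_id}:{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def generate_presigned_url(file_metadata: dict, ttl_seconds: int = 900,
                           now: Optional[float] = None) -> dict:
    """Record the file as UPLOAD_IN_PROGRESS and return a time-limited upload URL."""
    file_id = str(uuid.uuid4())
    expires = int((time.time() if now is None else now) + ttl_seconds)
    metadata_db[file_id] = {
        **file_metadata,
        "file_id": file_id,
        "status": "UPLOAD_IN_PROGRESS",
        "s3_url": None,
    }
    query = urlencode({"expires": expires, "signature": sign(file_id, expires)})
    return {"file_id": file_id, "presigned_url": f"{BLOB_HOST}/{file_id}?{query}"}


def verify_upload(file_id: str, expires: int, signature: str,
                  now: Optional[float] = None) -> bool:
    """What the blob store would check before accepting the PUT."""
    current = time.time() if now is None else now
    return current <= expires and hmac.compare_digest(signature, sign(file_id, expires))
```

With a real S3 bucket, the signing would be delegated to the AWS SDK rather than written by hand; the point here is only that the file bytes never pass through the file server.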
PUT /v1/files/upload/{pre_signed_url} :
Request:
{
File
}
The client uploads the file directly to the presigned URL.
Once the upload completes, S3 notifies the backend, which marks the FileMetadata status as UPLOADED and records the S3 URL in the metadata database.
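A minimal sketch of the completion step, assuming the notification is delivered to a handler as a small dict. The event shape (`file_id`, `bucket`, `key`), the handler name `on_upload_completed`, and the in-memory `metadata_db` are hypothetical; a real S3 event has a different, larger payload. The handler is written to be idempotent because such notifications can arrive more than once.

```python
# In-memory stand-in for the FileMetadata DB, seeded with one pending upload.
metadata_db = {
    "f1": {"file_id": "f1", "status": "UPLOAD_IN_PROGRESS", "s3_url": None},
}


def on_upload_completed(event: dict) -> dict:
    """Mark the file as UPLOADED and record where the blob lives."""
    record = metadata_db.get(event["file_id"])
    if record is None:
        raise KeyError(f"unknown file_id: {event['file_id']}")
    if record["status"] == "UPLOADED":
        return record  # duplicate notification: nothing to change
    record["status"] = "UPLOADED"
    record["s3_url"] = f"s3://{event['bucket']}/{event['key']}"
    return record


# Usage: the blob store reports that f1 has landed.
on_upload_completed({"file_id": "f1", "bucket": "user-files", "key": "f1"})
```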
GET /v1/files/download/{file_id} : File
The request goes to the CDN first. If the file is cached there, the CDN returns it directly; otherwise the request is redirected to the file server via edge computing.
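The download path described above is a cache-aside lookup, sketched here with an in-memory dict standing in for the CDN. `EdgeCache` and `fetch_from_origin` are hypothetical names; eviction and expiry are left out to keep the example short.

```python
from typing import Callable


class EdgeCache:
    """Serve cached files; fall back to the origin (file server) on a miss."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: dict = {}
        self.hits = 0
        self.misses = 0

    def get(self, file_id: str) -> bytes:
        if file_id in self._store:
            self.hits += 1
            return self._store[file_id]
        # Cache miss: go to the origin once, then keep the result at the edge.
        self.misses += 1
        data = self._fetch(file_id)
        self._store[file_id] = data
        return data


# Usage: the second request for the same file never reaches the origin.
cache = EdgeCache(lambda file_id: f"contents of {file_id}".encode())
cache.get("f1")
cache.get("f1")
```

For shared or private files, a real deployment would also need the edge to check access rights before serving a cached copy.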
POST /v1/files/share/{file_id}
Request:
{
User[]
}
Database design
3 core entities:
- File
- User
- FileMetadata
FileMetadata -
file_id
size_in_bytes
file_name
uploaded_at
uploaded_by(user)
mime_type
status
s3_url
SharedFiles -
user_id
file_id
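The two tables above can be sketched as plain records, together with the share operation from the API section. The Python types, the `size_in_bytes` spelling, the in-memory `shared_files` set, and the helper names `share_file` and `files_shared_with` are assumptions for illustration; the field list itself comes from the entities above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class FileMetadata:
    file_id: str
    size_in_bytes: int
    file_name: str
    uploaded_at: datetime
    uploaded_by: str               # user_id of the uploader
    mime_type: str
    status: str                    # e.g. UPLOAD_IN_PROGRESS or UPLOADED
    s3_url: Optional[str] = None   # set once the upload has completed


@dataclass(frozen=True)
class SharedFile:
    user_id: str
    file_id: str


# In-memory stand-in for the SharedFiles table; a set keeps (user, file) pairs unique.
shared_files: set = set()


def share_file(file_id: str, user_ids: list) -> None:
    """POST /v1/files/share/{file_id}: grant each listed user access to the file."""
    for user_id in user_ids:
        shared_files.add(SharedFile(user_id=user_id, file_id=file_id))


def files_shared_with(user_id: str) -> list:
    """Return the ids of files shared with the given user."""
    return sorted(s.file_id for s in shared_files if s.user_id == user_id)
```

In a relational store this maps to a composite primary key on (user_id, file_id), so sharing the same file twice with the same user stays a no-op.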
High-level design
1) UploadClient -> Calls the upload API to upload files to the backend.
2) DownloadClient -> Calls the download API to download files from the backend.
3) API Gateway and Load Balancer -> A load balancer such as AWS Elastic Load Balancing routes each request to a healthy API gateway instance.
The API gateway is responsible for auth, rate limiting, and routing requests to a file server instance.
4) File Server -> Horizontally scalable service responsible for uploading, downloading, and syncing files.
5) FileMetadata DB -> Stores the metadata for each uploaded file, as described in the entities above.
6) BlobStorage DB -> Object storage such as AWS S3 or Google Cloud Storage, where the actual file contents are stored.
7) CDN -> A content delivery network such as Amazon CloudFront that caches recently fetched files so they can be served from a location close to the user with low latency.
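The rate limiting mentioned for the API gateway in item 3 could be done with a token bucket, sketched below. The design does not name an algorithm, so the token bucket is an assumed choice, and `TokenBucket`, `rate`, and `capacity` are illustrative names. The optional `now` argument exists only to make the behavior easy to check without sleeping.

```python
import time
from typing import Optional


class TokenBucket:
    """Allow bursts up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        current = time.monotonic() if now is None else now
        elapsed = max(0.0, current - self.updated)
        # Refill for the time that has passed, never beyond the bucket size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = current
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Usage: one bucket per user, 5 requests per second with bursts of up to 10.
bucket = TokenBucket(rate=5, capacity=10)
if not bucket.allow():
    pass  # the gateway would answer with HTTP 429 here
```

With several gateway instances, the bucket state would have to live in a shared store rather than in process memory.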
Request flows
Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...
Upload, Download, Share Flows
Detailed component design
Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...
Trade offs/Tech choices
Explain any trade offs you have made and why you made certain tech choices...
Failure scenarios/bottlenecks
Try to discuss as many failure scenarios/bottlenecks as possible.
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?