System requirements
Functional:
- User accounts
- Dashboard with a file-system layout
- Upload a file or folder
- Share a link to a file or folder
- File viewer that supports multiple file formats, e.g. jpg, pdf, png, txt, json
- Download a file or folder
Non-functional:
- Availability is essential: 99.99% availability is required for accessing and sharing files stored in Dropbox.
- Security is desirable but not mandatory.
- Performance is important: users should not have to wait long for uploads to finish.
- Scalability is a major concern: the databases and storage must keep growing with the endless stream of uploaded files.
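To make the 99.99% target concrete, the downtime it allows can be worked out directly. This is a minimal sketch assuming a 365-day year; `downtime_budget_minutes` is a hypothetical helper. At 99.99% the budget comes to roughly 52.6 minutes per year, or about 4.3 minutes per 30-day month.

```python
# Downtime allowed over a period for a given availability target.
# Assumes a 365-day year; the 99.99% target comes from the requirements above.
def downtime_budget_minutes(availability: float, days: float = 365.0) -> float:
    """Return the minutes of downtime allowed over `days` at `availability` (0..1)."""
    total_minutes = days * 24 * 60
    return total_minutes * (1.0 - availability)


if __name__ == "__main__":
    yearly = downtime_budget_minutes(0.9999)
    monthly = downtime_budget_minutes(0.9999, days=30)
    print(f"99.99% allows ~{yearly:.1f} min/year, ~{monthly:.1f} min per 30 days")
```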
Capacity estimation
Total user data: 600 million users × 1 KB per user (name, email, payment, billing) = 600,000,000 × 1,000 bytes = 600 GB
New user data: 6 million new users/day × 1 KB = 6 GB/day
File data: 200 million uploads/day × 1 MB (average file size) = 200 TB/day
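The estimates can be checked with a short script. It assumes decimal units (1 KB = 1,000 bytes, 1 MB = 10^6 bytes) and reads the 6 million and 200 million figures as daily new users and daily uploads; the per-second ingest rate and yearly total are derived from those same numbers.

```python
# Back-of-the-envelope storage estimates.
# Assumes decimal units and that 6M / 200M are per-day figures.
KB, MB, GB, TB, PB = 10**3, 10**6, 10**9, 10**12, 10**15
SECONDS_PER_DAY = 86_400

total_users = 600_000_000
user_record_bytes = 1 * KB          # name, email, payment, billing
new_users_per_day = 6_000_000
uploads_per_day = 200_000_000
avg_upload_bytes = 1 * MB

user_data_total = total_users * user_record_bytes          # bytes
user_data_daily = new_users_per_day * user_record_bytes    # bytes/day
file_data_daily = uploads_per_day * avg_upload_bytes       # bytes/day
file_data_yearly = file_data_daily * 365                   # bytes/year
ingest_rate = file_data_daily / SECONDS_PER_DAY            # average bytes/second

print(f"User data total: {user_data_total / GB:.0f} GB")
print(f"New user data:   {user_data_daily / GB:.0f} GB/day")
print(f"File data:       {file_data_daily / TB:.0f} TB/day, {file_data_yearly / PB:.0f} PB/year")
print(f"Average ingest:  {ingest_rate / GB:.2f} GB/s")
```

At an average of about 2.3 GB/s (higher at peak), uploads clearly have to be spread across many Upload service instances.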
API design
/login(email, password) {
  jwtToken
}
/register(email, password) {
  jwtToken
}
/getFolder(folderId, jwtToken) {
  folder: {
    name: 'folder',
    folders: [
      {
        name: 'subfolder',
        files: [file1, file2]
      }
    ],
    files: [file1, file2]
  }
}
/getFile(fileId, jwtToken) {
  name,
  type,
  size,
  link
}
/upload(bytes, jwtToken) {
  success
}
/downloadFolder(folderId, jwtToken) {
  success
}
/downloadFile(fileId, jwtToken) {
  success
}
The /share endpoint flips a shareable flag on the file in S3.
/share(fileId, jwtToken) {
  success
}
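One way to back /share is sketched below: flipping the flag also issues an unguessable token that can be embedded in the share link, and resolving the link only succeeds while the flag is set. This is a hypothetical sketch; an in-memory store stands in for wherever the flag is actually persisted, and `FileRecord`, `MetadataStore`, and the token scheme are assumptions, not part of the design.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional


# Hypothetical in-memory stand-in for the store that holds the shareable flag.
@dataclass
class FileRecord:
    file_id: str
    owner_id: str
    s3_key: str
    shareable: bool = False
    share_token: Optional[str] = None


@dataclass
class MetadataStore:
    files: Dict[str, FileRecord] = field(default_factory=dict)
    by_token: Dict[str, str] = field(default_factory=dict)  # share token -> file_id

    def share(self, file_id: str, user_id: str) -> str:
        """Flip the shareable flag and return a random token for the share link."""
        record = self.files[file_id]
        if record.owner_id != user_id:
            raise PermissionError("only the owner can share this file")
        record.shareable = True
        if record.share_token is None:
            record.share_token = secrets.token_urlsafe(16)
            self.by_token[record.share_token] = file_id
        return record.share_token

    def resolve(self, token: str) -> Optional[str]:
        """Return the S3 key behind a share link, or None if it is not shareable."""
        file_id = self.by_token.get(token)
        if file_id is None:
            return None
        record = self.files[file_id]
        return record.s3_key if record.shareable else None
```

Usage: after `store.files["f1"] = FileRecord("f1", "u1", "u1/photos/cat.jpg")`, calling `store.share("f1", "u1")` returns a token, and `store.resolve(token)` returns `"u1/photos/cat.jpg"`.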
Database design
File DB: S3 stores the file contents. It is highly scalable but lacks data-analysis tooling.
User DB: Postgres, which is ACID-compliant and supports advanced queries such as finding files by user. It will be sharded by user ID for scaling.
File Metadata DB: Postgres, containing the S3 links to all files and folders associated with a user, with the user ID as a foreign key. It will also be sharded by user ID for scaling and will use optimistic locking.
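Optimistic locking can be sketched with a version column: a writer remembers the version it read and the UPDATE only matches if that version is still current, so a concurrent edit makes the second write affect zero rows instead of silently overwriting the first. In this sketch sqlite3 stands in for Postgres, and the table and column names are hypothetical.

```python
import sqlite3

# sqlite3 stands in for Postgres; table/column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file_metadata ("
    " file_id TEXT PRIMARY KEY, user_id TEXT NOT NULL,"
    " name TEXT NOT NULL, s3_key TEXT NOT NULL, version INTEGER NOT NULL)"
)
conn.execute(
    "INSERT INTO file_metadata VALUES (?, ?, ?, ?, ?)",
    ("f1", "u1", "report.pdf", "u1/report.pdf", 1),
)
conn.commit()


def rename_file(conn, file_id, new_name, expected_version):
    """Rename a file only if its version is unchanged; return True on success."""
    cur = conn.execute(
        "UPDATE file_metadata SET name = ?, version = version + 1"
        " WHERE file_id = ? AND version = ?",
        (new_name, file_id, expected_version),
    )
    conn.commit()
    # 0 rows updated means another writer got there first: re-read and retry.
    return cur.rowcount == 1


# Two clients both read version 1, then both try to write.
print(rename_file(conn, "f1", "q1-report.pdf", expected_version=1))  # True
print(rename_file(conn, "f1", "final.pdf", expected_version=1))      # False (stale)
```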
High-level design
Request flows
A user registers with the User Service, which stores the user data along with the hashed password in the User DB. When the user logs in, we check whether the hash of the submitted password matches the stored hash and, if so, return a JWT that expires after 12 hours.
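The register/login flow can be sketched with the standard library alone. The design only says "hash" and "JWT", so PBKDF2-SHA256 for passwords and an HS256-signed token are assumptions here, as are the iteration count and the randomly generated `SECRET`; a production service would use a vetted JWT library and a key from a secret store.

```python
import base64
import hashlib
import hmac
import json
import os
import time
from typing import Optional, Tuple

SECRET = os.urandom(32)               # hypothetical signing key
TOKEN_TTL_SECONDS = 12 * 60 * 60      # 12-hour expiry from the design
ITERATIONS = 200_000                  # illustrative PBKDF2 work factor


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); store both in the User DB."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)  # constant-time comparison


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _sign(signing_input: str) -> str:
    return _b64url(hmac.new(SECRET, signing_input.encode(), hashlib.sha256).digest())


def issue_jwt(user_id: str, now: Optional[float] = None) -> str:
    now = time.time() if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "iat": int(now), "exp": int(now) + TOKEN_TTL_SECONDS}
    signing_input = _b64url(json.dumps(header).encode()) + "." + _b64url(json.dumps(payload).encode())
    return signing_input + "." + _sign(signing_input)


def verify_jwt(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user id if the signature is valid and unexpired, else None."""
    now = time.time() if now is None else now
    parts = token.split(".")
    if len(parts) != 3:
        return None
    header_b64, payload_b64, sig_b64 = parts
    if not hmac.compare_digest(_sign(f"{header_b64}.{payload_b64}"), sig_b64):
        return None
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(padded))
    return payload["sub"] if now < payload["exp"] else None
```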
The API gateway authenticates the user by validating the JWT and then forwards the request to the load balancer, which uses a least-connections strategy.
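Least-connections routing sends each new request to the backend with the fewest in-flight requests, which suits this workload because uploads and downloads vary widely in duration. A minimal single-process sketch follows; the backend names are hypothetical, and a real load balancer would track counts concurrently.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LeastConnectionsBalancer:
    active: Dict[str, int] = field(default_factory=dict)  # backend -> in-flight requests

    def add_backend(self, name: str) -> None:
        self.active.setdefault(name, 0)

    def acquire(self) -> str:
        """Pick the backend with the fewest active connections (ties go to the first added)."""
        name = min(self.active, key=self.active.__getitem__)
        self.active[name] += 1
        return name

    def release(self, name: str) -> None:
        """Call when a request finishes so the counts reflect live connections."""
        self.active[name] -= 1
```

Usage: with backends `app-1`, `app-2`, `app-3`, three `acquire()` calls spread one request to each; if `app-2` finishes first and calls `release`, the next request goes to `app-2`.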
Upon login, the user is taken to a homepage with a file-system view similar to the file browser on Windows or macOS. The user's root folder is loaded with all of its file metadata and the links to the S3 objects where the file data is stored.
A user can click a file to either view or download it. Either way, the data has to be fetched from the S3 bucket that contains it.
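Loading the root folder means turning flat metadata rows into a nested folder tree like the /getFolder response. The sketch below assumes each row has hypothetical `id`, `parent_id`, `name`, `is_folder`, and `s3_link` fields; the design does not specify the metadata schema.

```python
from typing import Dict, List, Optional


def build_folder(rows: List[dict], folder_id: str) -> dict:
    """Nest flat metadata rows under `folder_id` as {name, folders, files}."""
    by_id: Dict[str, dict] = {}
    children: Dict[Optional[str], List[dict]] = {}
    for row in rows:
        by_id[row["id"]] = row
        children.setdefault(row["parent_id"], []).append(row)

    def build(fid: str) -> dict:
        node = {"name": by_id[fid]["name"], "folders": [], "files": []}
        for child in children.get(fid, []):
            if child["is_folder"]:
                node["folders"].append(build(child["id"]))
            else:
                node["files"].append({"name": child["name"], "link": child["s3_link"]})
        return node

    return build(folder_id)


# Hypothetical rows for one user; the S3 links are illustrative.
rows = [
    {"id": "root", "parent_id": None, "name": "/", "is_folder": True, "s3_link": None},
    {"id": "d1", "parent_id": "root", "name": "photos", "is_folder": True, "s3_link": None},
    {"id": "f1", "parent_id": "root", "name": "notes.txt", "is_folder": False,
     "s3_link": "s3://bucket/u1/notes.txt"},
    {"id": "f2", "parent_id": "d1", "name": "cat.jpg", "is_folder": False,
     "s3_link": "s3://bucket/u1/photos/cat.jpg"},
]
tree = build_folder(rows, "root")
print(tree)
```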
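Before fetching from S3, the client has to decide whether it can render the file at all. A minimal sketch, assuming the viewer is chosen by file extension: the format list comes from the functional requirements, while the viewer names and the download-only fallback are hypothetical.

```python
from pathlib import PurePosixPath

# Supported formats from the requirements; viewer names are hypothetical.
VIEWERS = {
    ".jpg": "image",
    ".jpeg": "image",
    ".png": "image",
    ".pdf": "pdf",
    ".txt": "text",
    ".json": "json",
}


def pick_viewer(filename: str) -> str:
    """Map a file name to a viewer; unsupported formats can only be downloaded."""
    suffix = PurePosixPath(filename).suffix.lower()
    return VIEWERS.get(suffix, "download-only")
```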
A user can select a folder or file to upload.
Detailed component design
The Upload service receives the file as a stream of data and stores it in S3. It then pushes a message to a Kafka queue so that the file's metadata can be generated separately. The metadata is stored with the user ID.
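Reading the upload stream in fixed-size chunks keeps memory bounded and lets a failed part be retried without re-sending the whole file. The 8 MiB chunk size and per-chunk SHA-256 are assumptions rather than part of the design; the size is chosen to stay above the 5 MiB minimum part size of S3 multipart uploads.

```python
import hashlib
import io
from typing import BinaryIO, Iterator, Tuple

CHUNK_SIZE = 8 * 1024 * 1024  # assumed part size


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, bytes, str]]:
    """Yield (index, data, sha256 hex digest) for each chunk until the stream ends."""
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield index, data, hashlib.sha256(data).hexdigest()
        index += 1


if __name__ == "__main__":
    upload = io.BytesIO(b"x" * (CHUNK_SIZE + 10))
    for index, data, digest in iter_chunks(upload):
        print(index, len(data), digest[:12])
```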
The Metadata service then consumes these messages from the Kafka stream and generates metadata for each new S3 object.
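The decoupling between the two services can be sketched in-process, with `queue.Queue` standing in for the Kafka topic and a thread standing in for the Metadata service. The message fields, the `None` stop sentinel, and the extension-based type detection are assumptions for illustration only.

```python
import queue
import threading
from typing import Dict

events = queue.Queue()              # stand-in for the Kafka topic
metadata_db: Dict[str, dict] = {}   # s3_key -> generated metadata


def upload_service(user_id: str, s3_key: str, size: int) -> None:
    """After the bytes are in S3, publish an event instead of building metadata inline."""
    events.put({"user_id": user_id, "s3_key": s3_key, "size": size})


def metadata_service() -> None:
    """Consume events and store metadata keyed by S3 key, tagged with the user id."""
    while True:
        event = events.get()
        if event is None:  # sentinel used in this sketch to stop the worker
            events.task_done()
            break
        name = event["s3_key"].rsplit("/", 1)[-1]
        metadata_db[event["s3_key"]] = {
            "user_id": event["user_id"],
            "name": name,
            "type": name.rsplit(".", 1)[-1] if "." in name else "unknown",
            "size": event["size"],
        }
        events.task_done()


worker = threading.Thread(target=metadata_service)
worker.start()
upload_service("u1", "u1/docs/report.pdf", 52_311)
upload_service("u1", "u1/photos/cat.jpg", 1_048_576)
events.put(None)
worker.join()
print(metadata_db["u1/docs/report.pdf"])
```

Between `upload_service` returning and the worker processing the event, the file exists in storage but has no metadata yet, which is the display lag described in the trade-offs.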
Trade offs/Tech choices
S3 lets us scale easily and is very good at simply dropping files into a bucket. However, we cannot collect much analytical data on the files or run queries against them to get more details.
The Upload service can upload files faster because it does not have to generate the metadata itself. However, files may not display correctly at first if their metadata has not yet been generated.
Failure scenarios/bottlenecks
The File Metadata DB will have trouble scaling. If a group of users is sharded to the same database, that one File Metadata DB in the cluster can easily be overwhelmed.
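The hot-shard problem follows from how requests are routed. A minimal sketch, assuming hash-based routing on user ID with a fixed shard count (both hypothetical): a stable hash keeps the mapping identical on every server, unlike Python's per-process randomized `hash()` for strings, but it still sends all of one user's traffic to one shard. For simplicity the "group" of heavy users is reduced to a single very active user.

```python
import hashlib
from collections import Counter
from typing import Iterable

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user id to a shard with a stable hash."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def load_per_shard(requests: Iterable[str]) -> Counter:
    """Count how many requests land on each shard for a stream of user ids."""
    return Counter(shard_for(uid) for uid in requests)


# 900 of 1,000 requests come from one user, so one shard absorbs at least 90% of the load.
traffic = ["heavy-user"] * 900 + [f"user-{i}" for i in range(100)]
print(load_per_shard(traffic))
```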
Future improvements
Share files only with specified users. We could create a sharing DB that lists the user IDs or emails allowed to view each S3 object.
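The proposed sharing DB can be sketched as one row per (file, grantee) pair, checked before any S3 object is handed out. sqlite3 stands in for the real database here, and the table name, columns, and `grant`/`can_view` helpers are hypothetical.

```python
import sqlite3

# sqlite3 stands in for the sharing DB; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file_shares ("
    " file_id TEXT NOT NULL, grantee TEXT NOT NULL,"  # grantee: user id or email
    " PRIMARY KEY (file_id, grantee))"
)


def grant(file_id: str, grantee: str) -> None:
    """Allow a user id or email to view the file; granting twice is a no-op."""
    conn.execute("INSERT OR IGNORE INTO file_shares VALUES (?, ?)", (file_id, grantee))
    conn.commit()


def can_view(file_id: str, user_id: str, email: str) -> bool:
    """Check the ACL by either the caller's user id or email before serving the file."""
    row = conn.execute(
        "SELECT 1 FROM file_shares WHERE file_id = ? AND grantee IN (?, ?)",
        (file_id, user_id, email),
    ).fetchone()
    return row is not None
```

Usage: after `grant("f1", "alice@example.com")`, `can_view("f1", "u2", "alice@example.com")` is `True`, while a user who appears neither by ID nor by email is refused.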