System requirements


Functional:

A user can upload a file

A user can share a file

A share recipient can download the file from a link



Non-Functional:

High availability; eventual consistency is acceptable

Scales to support the full user base (1 million users, per the capacity estimate)




Capacity estimation


1 million users

1 GB/week of traffic per user


1 million GB/week total (1 PB/week)

~143,000 GB/day

143,000 GB / 86,400 s ≈ 1.7 GB/s average


assuming 20% of bandwidth is upload

storage grows at ~28,600 GB/day * 7 = 200 TB/week
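The estimate can be checked with a few lines of arithmetic. This is a sketch using only the figures stated in the notes (1 million users, 1 GB/week each, 20% upload); the variable names are illustrative. Without intermediate rounding it gives about 1.65 GB/s and 200 TB/week; rounding the daily figure down to 100,000 GB gives the coarser 1 GB/s and 140 TB/week.

```python
# Back-of-the-envelope capacity estimate from the figures in the notes.
USERS = 1_000_000
GB_PER_USER_PER_WEEK = 1          # total traffic per user (upload + download)
UPLOAD_FRACTION = 0.20            # assumed share of traffic that is upload
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400

weekly_traffic_gb = USERS * GB_PER_USER_PER_WEEK
daily_traffic_gb = weekly_traffic_gb / 7
avg_throughput_gb_s = daily_traffic_gb / SECONDS_PER_DAY
# Only uploaded bytes add to stored data (ignoring deletes and replication).
weekly_storage_growth_tb = weekly_traffic_gb * UPLOAD_FRACTION / 1000

print(f"weekly traffic:   {weekly_traffic_gb:,} GB")
print(f"daily traffic:    {daily_traffic_gb:,.0f} GB")
print(f"avg throughput:   {avg_throughput_gb_s:.2f} GB/s")
print(f"storage growth:   {weekly_storage_growth_tb:.0f} TB/week")
```

These are averages; peak traffic would be higher, and replication would multiply the storage figure.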



API design



POST /files/ {
  meta: {
    id
    filesize
    chunks: []
    name
  }
  url
}


PATCH /files/id {
  status:
}


GET /files/id


GET /files/id/share



Database design



A file entry in the database will look like:

FileMeta {
  id:
  bloburl:
  status:
  chunks: []
}

where chunks is the list of per-chunk hashes.
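As an illustration of how the chunk list might be produced on the client, here is a minimal sketch. The 4 MiB chunk size, the choice of SHA-256, the example URL, and the helper names `split_chunks` and `build_meta` are all assumptions; the notes do not specify them.

```python
import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 4 * 1024 * 1024  # assumed 4 MiB; the notes do not fix a chunk size


@dataclass
class FileMeta:
    id: str
    bloburl: str
    chunks: list[str] = field(default_factory=list)  # one hex digest per chunk


def split_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split the file contents into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def build_meta(file_id: str, bloburl: str, data: bytes,
               chunk_size: int = CHUNK_SIZE) -> FileMeta:
    """Hash every chunk with SHA-256 (an assumed choice) and record the digests."""
    hashes = [hashlib.sha256(c).hexdigest() for c in split_chunks(data, chunk_size)]
    return FileMeta(id=file_id, bloburl=bloburl, chunks=hashes)


# Tiny demo: 10 bytes with a 4-byte chunk size gives chunks of 4, 4, and 2 bytes.
meta = build_meta("file-1", "https://blob.example.invalid/file-1",
                  b"a" * 10, chunk_size=4)
print(len(meta.chunks))
```

Identical chunks produce identical hashes, which lets the server detect corrupted or repeated chunks by comparing digests.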


High-level design



Authentication and load balancing are handled by the API gateway.


Blob storage holds the files.

The database contains metadata about the files.

Files are chunked and uploaded. Chunk hashes are stored in the DB.


Shared-file downloads and uploads to blob storage both go through preauthorized URLs.
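To make the idea of a preauthorized URL concrete, the sketch below signs a path, HTTP method, and expiry time with an HMAC so that whoever holds the URL can act without further authentication until it expires. This is an illustrative stand-in, not the scheme of any particular blob store (managed stores provide their own presigned URL mechanisms); `SECRET`, `sign_url`, and `verify_url` are hypothetical names.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical signing key held by the storage side


def _signature(method: str, path: str, expires: int) -> str:
    payload = f"{method}\n{path}\n{expires}".encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def sign_url(path: str, method: str, ttl_s: int, now: Optional[float] = None) -> str:
    """Return path plus query parameters authorizing `method` for ttl_s seconds."""
    current = time.time() if now is None else now
    expires = int(current + ttl_s)
    query = urlencode({"expires": expires, "sig": _signature(method, path, expires)})
    return f"{path}?{query}"


def verify_url(url: str, method: str, now: Optional[float] = None) -> bool:
    """Check that the URL is unexpired and was signed for this method and path."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires = int(query["expires"][0])
        sig = query["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    expected = _signature(method, parsed.path, expires)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sig.encode(), expected.encode())


upload_url = sign_url("/blobs/file-1", "PUT", ttl_s=60, now=1000)
print(verify_url(upload_url, "PUT", now=1030))
```

Binding the method into the signature means an upload URL cannot be reused to download, and the expiry bounds the damage if a link leaks.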



Request flows


The server responds to a user's upload request with a preauthorized blob storage URL.

The client chunks the file and uploads the chunks to blob storage. The client provides the metadata to the server.

Once all chunk uploads have completed, the client updates the file's status in the metadata DB to "complete".


On share, the server generates a preauthorized URL.
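The flow above can be walked through with an in-memory stand-in for the server and metadata DB. This is a sketch only: the `Server` class, its method names, the "pending" initial status, the placeholder token, and the rule that only completed files can be shared are assumptions made for illustration.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    id: str
    name: str
    status: str = "pending"  # assumed initial status before upload completes
    chunks: list[str] = field(default_factory=list)


class Server:
    """Hypothetical in-memory stand-in for the API server and metadata DB."""

    def __init__(self) -> None:
        self.files: dict[str, FileRecord] = {}

    def _preauth_url(self, file_id: str, action: str) -> str:
        # Placeholder token; a real blob store would issue a signed, expiring URL.
        token = secrets.token_urlsafe(8)
        return f"https://blob.example.invalid/{file_id}?action={action}&token={token}"

    def create_file(self, name: str, chunks: list[str]) -> tuple[FileRecord, str]:
        """POST /files/: store metadata, return the record and an upload URL."""
        record = FileRecord(id=secrets.token_hex(4), name=name, chunks=chunks)
        self.files[record.id] = record
        return record, self._preauth_url(record.id, "upload")

    def set_status(self, file_id: str, status: str) -> None:
        """PATCH /files/id: the client reports upload progress."""
        self.files[file_id].status = status

    def share(self, file_id: str) -> str:
        """GET /files/id/share: return a download URL for a completed file."""
        if self.files[file_id].status != "complete":
            raise ValueError("file upload is not complete")
        return self._preauth_url(file_id, "download")


server = Server()
record, upload_url = server.create_file("report.pdf", ["hash-a", "hash-b"])
# ... the client would upload each chunk to upload_url here ...
server.set_status(record.id, "complete")
share_url = server.share(record.id)
print(record.status, share_url)
```

Because the file bytes never pass through `Server`, the only state it touches is metadata, which is what keeps it cheap to scale.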



Detailed component design



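The server is stateless, so it scales horizontally behind the API gateway.

The database is an OLTP-style key-value store (not OLAP: the workload is point lookups and small updates on file metadata), partitioned by region with the user as sort key. Chosen because it is simple to scale.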
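The partition-by-region, sort-by-user layout can be illustrated with a small in-memory model. This is a sketch under assumptions: `MetadataStore` and its methods are hypothetical, and a real store would spread partitions across nodes rather than keep them in one dictionary.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class MetadataStore:
    """Toy model: one sorted list of (user_id, file_id) rows per region partition."""

    def __init__(self) -> None:
        self.partitions: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def put(self, region: str, user_id: str, file_id: str) -> None:
        # insort keeps each partition ordered by the sort key (user_id, file_id).
        insort(self.partitions[region], (user_id, file_id))

    def files_for_user(self, region: str, user_id: str) -> list[str]:
        rows = self.partitions[region]
        # Jump to the first row for this user, then scan the contiguous range.
        start = bisect_left(rows, (user_id, ""))
        result = []
        for uid, fid in rows[start:]:
            if uid != user_id:
                break
            result.append(fid)
        return result


store = MetadataStore()
store.put("eu", "bob", "f2")
store.put("eu", "alice", "f3")
store.put("eu", "alice", "f1")
store.put("us", "alice", "f9")
print(store.files_for_user("eu", "alice"))
```

A query touches a single partition and reads one contiguous key range, which is why this layout scales by adding partitions.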


Trade offs/Tech choices



Chose preauthorized URLs so that file data does not pass through the server, saving server bandwidth.


Chose blob storage for its scalability and to avoid storing file bytes in the DB.


Failure scenarios/bottlenecks



Blob storage could become a bottleneck if we support publicly shared files. A CDN can then serve such files.





Future improvements



Use a CDN to improve throughput for users across regions. On share access, the CDN pulls the file from blob storage and the user downloads it from the CDN.

Cached copies would have a limited lifetime to avoid costly CDN storage.
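The pull-through behavior with a limited lifetime can be sketched as a small TTL cache in front of an origin fetch. The `TtlCache` class, the 60-second TTL, and the injected clock are assumptions for illustration; a real CDN is configured rather than coded this way.

```python
import time
from typing import Callable


class TtlCache:
    """Toy pull-through cache: fetch from origin on a miss, expire after ttl_s."""

    def __init__(self, fetch_origin: Callable[[str], bytes], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch_origin = fetch_origin
        self.ttl_s = ttl_s
        self.clock = clock
        self.entries: dict[str, tuple[float, bytes]] = {}  # key -> (expiry, data)
        self.origin_hits = 0

    def get(self, key: str) -> bytes:
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                      # still fresh: serve from cache
        self.origin_hits += 1                    # miss or expired: go to origin
        data = self.fetch_origin(key)
        self.entries[key] = (now + self.ttl_s, data)
        return data


# Demo with a fake clock so expiry can be shown without sleeping.
fake_now = [0.0]
cache = TtlCache(lambda key: f"bytes of {key}".encode(), ttl_s=60,
                 clock=lambda: fake_now[0])
cache.get("file-1")      # miss: pulled from origin (blob storage)
cache.get("file-1")      # hit: served from the cache
fake_now[0] = 61.0
cache.get("file-1")      # expired: pulled from origin again
print(cache.origin_hits)
```

A shorter TTL lowers CDN storage cost at the price of more origin fetches, which is the trade-off named above.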