System requirements


Functional:

User can upload (post) a file

User can manage file permissions

User can download a file

User authentication

User can update a file



Non-Functional:

High Availability - Users need to fetch and share files with as little downtime as possible

Low Latency - Users should be able to download files quickly







Capacity estimation

User DAU - 100M

File upload per user per day - 1

File size average - 100MB

100MB * 100M users * 365 days * 10 years - expected storage required for 10 years of service

About 36.5EB in total (roughly 10PB of new data per day)
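
Working the numbers through explicitly helps catch unit slips. The short sketch below multiplies out the figures listed in this section, assuming decimal units (1 MB = 10^6 bytes); the variable names are only illustrative.

```python
# Back-of-the-envelope storage estimate using the figures from this section.
# Assumption: decimal units (1 MB = 10**6 bytes, 1 PB = 10**15, 1 EB = 10**18).
DAU = 100_000_000                    # daily active users
UPLOADS_PER_USER_PER_DAY = 1
AVG_FILE_SIZE_BYTES = 100 * 10**6    # 100 MB
DAYS_PER_YEAR = 365
YEARS = 10

PB = 10**15
EB = 10**18

bytes_per_day = DAU * UPLOADS_PER_USER_PER_DAY * AVG_FILE_SIZE_BYTES
total_bytes = bytes_per_day * DAYS_PER_YEAR * YEARS
total_files = DAU * UPLOADS_PER_USER_PER_DAY * DAYS_PER_YEAR * YEARS

print(f"new data per day : {bytes_per_day / PB:.1f} PB")
print(f"10-year storage  : {total_bytes / EB:.1f} EB")
print(f"10-year file count: {total_files:,}")
```

The file count (about 365 billion over 10 years) is the number that matters when sizing the key space for file identifiers.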





API design

/api/postfile

Req Body:

{

raw file

UID

}

Res Body :

{

URL for storing file

}


/api/getfile

Req Body:

{

URL for storing file

UID

}

Res Body :

{

raw file

}



/api/grantPermission

Req Body:

{

UID

URL

granted Uid

}

Res Body:

{

status

}


/api/updateonFile

Req Body:

{

UID

URL

Raw File

}

Res Body:

{

status

}
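
The request and response bodies above could be written down as typed shapes, as in the sketch below. This is only one possible reading of the notes: the class and field names are assumptions chosen to mirror the fields listed for each endpoint, and no particular web framework is implied.

```python
from dataclasses import dataclass

# Hypothetical typed shapes for the four endpoints; names are illustrative.

@dataclass(frozen=True)
class PostFileRequest:          # /api/postfile
    uid: str
    raw_file: bytes

@dataclass(frozen=True)
class PostFileResponse:
    url: str                    # URL under which the file was stored

@dataclass(frozen=True)
class GetFileRequest:           # /api/getfile
    uid: str
    url: str

@dataclass(frozen=True)
class GetFileResponse:
    raw_file: bytes

@dataclass(frozen=True)
class GrantPermissionRequest:   # /api/grantPermission
    uid: str                    # user granting access
    url: str
    granted_uid: str            # user receiving access

@dataclass(frozen=True)
class UpdateFileRequest:        # /api/updateonFile
    uid: str
    url: str
    raw_file: bytes

@dataclass(frozen=True)
class StatusResponse:           # shared by grantPermission and updateonFile
    status: str

example = GrantPermissionRequest(uid="u1", url="aB3xY9z", granted_uid="u2")
print(example)
```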


We can skip the auth API endpoint here; authentication is covered in the high-level design.





Database design

File data

{

hashedURL (PK: String)

UID (FK: String)

Created At (date)

File URL

}


Permission

{

PermissionID (PK: String)

hashedURL (FK: String)

UID (FK: String)

}


User Data

{

Uid (PK : String)

User name

}
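
As a rough illustration, the three tables above could be expressed as a relational schema like the one below. It uses an in-memory SQLite database purely for demonstration; the snake_case table and column names, the `created_at` column, the `UNIQUE` constraint, and the sample rows (including the `objstore://` URL) are assumptions, not part of the original design.

```python
import sqlite3

# In-memory database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE user_data (
    uid       TEXT PRIMARY KEY,
    user_name TEXT NOT NULL
);
CREATE TABLE file_data (
    hashed_url TEXT PRIMARY KEY,
    uid        TEXT NOT NULL REFERENCES user_data(uid),
    created_at TEXT NOT NULL,
    file_url   TEXT NOT NULL
);
CREATE TABLE permission (
    permission_id TEXT PRIMARY KEY,
    hashed_url    TEXT NOT NULL REFERENCES file_data(hashed_url),
    uid           TEXT NOT NULL REFERENCES user_data(uid),
    UNIQUE (hashed_url, uid)   -- assumed: one grant per user per file
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO user_data VALUES ('u1', 'alice')")
conn.execute(
    "INSERT INTO file_data VALUES (?, ?, ?, ?)",
    ("aB3xY9z", "u1", "2024-01-01T00:00:00Z", "objstore://files/aB3xY9z"),
)
conn.execute("INSERT INTO permission VALUES ('p1', 'aB3xY9z', 'u1')")

def can_access(uid: str, hashed_url: str) -> bool:
    """Return True if a permission row links this user to this file."""
    row = conn.execute(
        "SELECT 1 FROM permission WHERE uid = ? AND hashed_url = ?",
        (uid, hashed_url),
    ).fetchone()
    return row is not None

print(can_access("u1", "aB3xY9z"))
```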





High-level design



User Service owns the User table in the RDBMS metadata store

Clients go through the Load Balancer, which routes each request to the appropriate service

File Service connects to Object Storage to read and write files




Request flows


  • Post File
    • The client request goes to the file service
    • The file service uploads the file to the object store and saves the metadata to the RDBMS
    • It also grants the uploading UID permission on the hashed URL
    • It responds to the client with the hashed URL
  • User Login/Logout
    • The request goes to the user service, which authenticates the user against the RDBMS
  • Grant permission
    • The request goes to the graph DB, which updates the permission
  • Update file
    • The request goes through the file service, which checks permission in the graph DB
    • If the check passes, the file is updated
    • It responds with a status saying whether the update succeeded or permission was not granted
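
The post, grant, and update flows above can be sketched with in-memory stand-ins. Everything here is an assumption made for illustration: the dictionaries stand in for the object store, the RDBMS metadata, and the graph DB adjacency list; the random placeholder key stands in for the hashed URL; and the rule that only the file owner may grant access is one possible policy, not something the notes specify.

```python
import uuid
from datetime import datetime, timezone

object_store = {}   # hashed_url -> file bytes (stand-in for object storage)
metadata = {}       # hashed_url -> {"uid", "created_at"} (stand-in for RDBMS)
permissions = {}    # uid -> set of hashed_url (stand-in for graph DB adjacency list)

def has_permission(uid: str, hashed_url: str) -> bool:
    return hashed_url in permissions.get(uid, set())

def post_file(uid: str, raw_file: bytes) -> str:
    """Upload the file, save metadata, grant the uploader access, return the key."""
    hashed_url = uuid.uuid4().hex[:12]  # placeholder key for this sketch
    object_store[hashed_url] = raw_file
    metadata[hashed_url] = {
        "uid": uid,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    permissions.setdefault(uid, set()).add(hashed_url)
    return hashed_url

def grant_permission(uid: str, hashed_url: str, granted_uid: str) -> str:
    """Assumed policy: only the owner recorded in metadata may grant access."""
    owner = metadata.get(hashed_url, {}).get("uid")
    if owner != uid:
        return "not granted"
    permissions.setdefault(granted_uid, set()).add(hashed_url)
    return "ok"

def update_file(uid: str, hashed_url: str, raw_file: bytes) -> str:
    """Check permission first, then overwrite the stored object."""
    if not has_permission(uid, hashed_url):
        return "not granted"
    object_store[hashed_url] = raw_file
    return "ok"

url = post_file("alice", b"v1")
print(update_file("bob", url, b"v2"))        # bob has no access yet
print(grant_permission("alice", url, "bob"))
print(update_file("bob", url, b"v2"))
```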





Detailed component design

  • The hashed URL length can be chosen from the expected storage volume
  • Hash with MD5, encode in base 62, and keep the leading characters needed for the expected volume (if over 80PB -> keep 7 characters, giving 62 ^ 7, about 3.5 trillion, unique keys)
  • Graph DB stores permissions as an adjacency list for permission checks
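
A minimal sketch of the MD5 plus base-62 key generation described above. The seed fields (`uid`, `file_name`, `created_at`), the alphabet ordering, and the function names are assumptions; the notes only specify MD5, base 62, and taking characters from the front.

```python
import hashlib

# Assumed alphabet ordering: digits, then upper case, then lower case.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def base62_encode(n: int) -> str:
    """Encode a non-negative integer in base 62."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def hashed_url(uid: str, file_name: str, created_at: str, length: int = 7) -> str:
    """MD5 the seed, base-62 encode the digest, keep the leading characters."""
    seed = f"{uid}|{file_name}|{created_at}".encode("utf-8")
    number = int.from_bytes(hashlib.md5(seed).digest(), "big")
    return base62_encode(number).rjust(length, ALPHABET[0])[:length]

key = hashed_url("u1", "report.pdf", "2024-01-01T00:00:00Z")
print(key, len(key))
print(f"key space for 7 characters: {62**7:,}")
```

Two caveats apply to any truncated hash: distinct files can collide on the same 7 characters, so the insert path needs a uniqueness check and a retry (for example, with a salted seed); and the leading base-62 digit of a 128-bit digest is not uniformly distributed, so slicing from the end of the encoded string is an alternative that spreads keys more evenly.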






Trade offs/Tech choices


  • If the metadata grows to many more entries, we can scale the DB out horizontally with NoSQL; otherwise, when the metadata has strong relations, an RDBMS is the better fit





Failure scenarios/bottlenecks

  • The load balancer is a single point of failure
    • This can be mitigated with a standby (passive) load balancer that takes over on failure
  • As the RDBMS grows, it becomes hard to scale (this can be addressed by sharding on UID or hashed URL)
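
Sharding on UID or hashed URL could look like the sketch below, which maps a key to a shard with a stable hash. The shard count and function name are hypothetical, and SHA-256 is used only as a convenient stable hash.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a UID or hashed URL to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

print(shard_for("u1"), shard_for("aB3xY9z"))
```

Plain modulo sharding remaps most keys when the shard count changes, so a scheme such as consistent hashing is worth considering if shards are expected to be added over time.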






Future improvements

  • Add a key-value cache
  • CDN in front of file storage
  • DB partitioning
  • Analytics on top of a log DB
  • Add a notification service