System requirements
Functional:
User can post a file
User can manage permissions on a file
User can download a file
User can authenticate (log in / log out)
User can update a file
Non-Functional:
High Availability - Users should be able to get and share files with as little downtime as possible
Low Latency - Users should be able to download files quickly
Capacity estimation
User DAU - 100M
File upload per user per day - 1
File size average - 100MB
100 MB * 100M uploads/day * 365 days * 10 years - expected storage required for 10 years of service
= 10 PB/day * 3,650 days, about 36.5 EB
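The arithmetic can be checked with a short script. This is only a sketch using the figures listed above; it assumes decimal units (1 MB = 10^6 bytes), and the constant and function names are made up for illustration.

```python
# Back-of-the-envelope storage estimate from the figures in the notes.
# Assumption: decimal units (1 MB = 10**6 bytes, 1 PB = 10**15, 1 EB = 10**18).
DAU = 100_000_000                 # daily active users
UPLOADS_PER_USER_PER_DAY = 1
AVG_FILE_BYTES = 100 * 10**6      # 100 MB
DAYS_PER_YEAR = 365
YEARS = 10


def storage_bytes(years: int = YEARS) -> int:
    """Total bytes uploaded over the given number of years."""
    per_day = DAU * UPLOADS_PER_USER_PER_DAY * AVG_FILE_BYTES
    return per_day * DAYS_PER_YEAR * years


daily_pb = storage_bytes(1) / DAYS_PER_YEAR / 10**15
total_eb = storage_bytes() / 10**18
print(f"per day: {daily_pb:.0f} PB, over {YEARS} years: {total_eb:.1f} EB")
```

The daily figure (10 PB) is the more useful number for sizing ingest bandwidth; the 10-year total mostly shows that storage has to be a horizontally scalable object store.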
API design
/api/postfile
Req Body:
{
raw file
UID
}
Res Body:
{
hashed URL of the stored file
}
/api/getfile
Req Body:
{
hashed URL of the stored file
UID
}
Res Body:
{
raw file
}
/api/grantPermission
Req Body:
{
UID
URL
granted UID
}
Res Body:
{
status
}
/api/updateonFile
Req Body:
{
UID
URL
Raw File
}
Res Body:
{
status
}
The auth API endpoint is skipped here; authentication is covered in the high-level design
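The request and response bodies above could be written down as typed structures. The sketch below is one possible shape, not a fixed contract: the class names, field names, status strings, and the `validate_grant` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PostFileRequest:
    uid: str          # caller's user ID
    raw_file: bytes   # file contents


@dataclass
class PostFileResponse:
    hashed_url: str   # key the client later uses to fetch the file


@dataclass
class GrantPermissionRequest:
    uid: str          # user granting access
    url: str          # hashed URL of the file
    granted_uid: str  # user receiving access


@dataclass
class StatusResponse:
    status: str


def validate_grant(req: GrantPermissionRequest) -> StatusResponse:
    """Hypothetical shape check run before the request reaches the service."""
    if not (req.uid and req.url and req.granted_uid):
        return StatusResponse(status="bad request")
    return StatusResponse(status="ok")
```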
Database design
File data
{
hashedURL (PK: String)
UID (FK: String)
Created At (date)
File URL
}
Permission
{
PermissionID (PK: String)
hashedURL (FK: String)
UID (FK: String)
}
User Data
{
UID (PK: String)
User name
}
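As a rough illustration, the three tables above can be expressed as a relational schema. The sketch uses Python's built-in `sqlite3` with an in-memory database purely so it runs anywhere; the snake_case table and column names, the `UNIQUE (hashed_url, uid)` constraint, and the `has_permission` helper are assumptions, not part of the notes.

```python
import sqlite3

# Assumed column names; types are TEXT because the notes list the keys as String.
SCHEMA = """
CREATE TABLE user_data (
    uid       TEXT PRIMARY KEY,
    user_name TEXT NOT NULL
);
CREATE TABLE file_data (
    hashed_url TEXT PRIMARY KEY,
    uid        TEXT NOT NULL REFERENCES user_data(uid),
    created_at TEXT NOT NULL,
    file_url   TEXT NOT NULL
);
CREATE TABLE permission (
    permission_id TEXT PRIMARY KEY,
    hashed_url    TEXT NOT NULL REFERENCES file_data(hashed_url),
    uid           TEXT NOT NULL REFERENCES user_data(uid),
    UNIQUE (hashed_url, uid)
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the schema above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def has_permission(conn: sqlite3.Connection, hashed_url: str, uid: str) -> bool:
    """True if the user has a permission row for the file."""
    row = conn.execute(
        "SELECT 1 FROM permission WHERE hashed_url = ? AND uid = ?",
        (hashed_url, uid),
    ).fetchone()
    return row is not None
```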
High-level design
User Service owns the User table, stored with the rest of the metadata in an RDBMS
Client requests pass through a Load Balancer, which routes them to the appropriate service
File Service is connected to Object Storage, where the file contents are stored
Request flows
- Post file
  - Client request goes to the file service
  - File service uploads the file to the object store and saves the metadata to the RDBMS
  - It also grants the uploading UID permission on the hashed URL
  - Responds to the client with the hashed URL
- User login/logout
  - Request goes to the user service, which authenticates the user against the RDBMS
- Grant permission
  - Request goes to the graph DB, which updates the permission
- Update file
  - Request goes through the file service, which checks permission in the graph DB
  - If the check passes, the file is updated
  - Responds with a status: update succeeded, or permission not granted
Detailed component design
- Generate the hashed URL with a length chosen from the expected storage need
  - Hash with MD5, encode the digest in base 62, and keep only the leading characters; how many to keep depends on the number of keys needed (e.g. if storage exceeds 80 PB, keep 7 characters, giving 62^7, about 3.5 trillion unique keys)
- Graph DB stores permissions as an adjacency list for permission checks
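The MD5 plus base 62 scheme above might look like the following. This is a sketch under assumptions: the notes do not say what gets hashed, so the input here (UID and file name joined with a colon) is hypothetical, as are the function names and the alphabet ordering.

```python
import hashlib

# Assumed base-62 alphabet: digits, then upper case, then lower case.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62(n: int) -> str:
    """Encode a non-negative integer in base 62."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def short_key(uid: str, file_name: str, length: int = 7) -> str:
    """MD5 the input, base-62 encode the digest, keep the leading characters."""
    digest = hashlib.md5(f"{uid}:{file_name}".encode("utf-8")).digest()
    encoded = base62(int.from_bytes(digest, "big"))
    # Pad on the left so very small digests still yield `length` characters.
    return encoded.rjust(length, ALPHABET[0])[:length]


print(62 ** 7)  # size of the key space for 7 characters
```

Because the digest is truncated, two different inputs can map to the same key, so a real service would need to check for an existing key before saving and retry with a different input (for example, by adding a salt) on collision.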
Trade offs/Tech choices
- If the metadata grows to many more entries, a NoSQL store lets us scale the DB horizontally; if the metadata needs strong relations (users, files, permissions), an RDBMS is the better fit
Failure scenarios/bottlenecks
- The load balancer is a single point of failure
  - Can be solved by adding a passive (standby) LB that takes over on failure
- As the data in the RDBMS grows, it becomes hard to scale (this can be mitigated by sharding on UID or hashed URL)
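Sharding on UID or hashed URL needs a stable mapping from key to shard. A minimal sketch, assuming simple hash-modulo placement (the notes do not specify a scheme, and `shard_for` is a hypothetical name):

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a UID or hashed URL to a shard index in [0, num_shards)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # Use a stable digest rather than the built-in hash(), which is salted
    # per process for strings and would give different shards on each run.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

With plain modulo placement, changing the number of shards remaps most keys; consistent hashing is a common alternative when shards are added or removed often. Sharding on UID keeps one user's files together, while sharding on hashed URL spreads a heavy user's files across shards.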
Future improvements
- Add a key-value cache
- CDN in front of file storage
- DB partitioning
- Analytics service backed by a log DB
- Add a notification service