Requirements


Functional Requirements:


  • Allow users to upload files to the system.
  • Enable users to download uploaded files.
  • Ensure synchronization of files between local and server storage.



Non-Functional Requirements:


  • Latency: under 10 s per chunk, plus network latency.
  • Scalability: serve 1M requests per hour.
  • Reliability: a successfully uploaded file must never be lost.
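
To make the scalability target concrete, the hourly figure can be converted to a per-second rate. This is a back-of-the-envelope sketch; the peak factor of 3 is a hypothetical assumption for illustration, not a number from the requirements.

```python
REQUESTS_PER_HOUR = 1_000_000
SECONDS_PER_HOUR = 3600

# Average load implied by the scalability requirement.
avg_rps = REQUESTS_PER_HOUR / SECONDS_PER_HOUR

# Hypothetical peak-to-average ratio (an assumption, not a requirement);
# real traffic is rarely uniform across the hour.
PEAK_FACTOR = 3
peak_rps = avg_rps * PEAK_FACTOR

print(f"average: {avg_rps:.0f} requests/s, assumed peak: {peak_rps:.0f} requests/s")
```

The average works out to roughly 278 requests per second, which is the figure the web tier and the rate limiter would need to be sized around.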


API Design

  1. HTTP_PUT upload/<path_to_file>?chunk_offset=0&chunk_size=1024&total_size=131072&token=some_base64_token uploads one chunk of the file.
  2. HTTP_GET file_info/<path_to_file>?token=some_base64_token returns a JSON object with file info, including size, file time, etc.
  3. HTTP_GET download/<path_to_file>?chunk_offset=0&chunk_size=1024&token=... returns the corresponding chunk.
  4. HTTP_GET changes/<client_device_id>?limit=500&token=... returns a list of at most limit new, deleted, and changed files that are newer than the server-side timestamp of that device's last sync.
  5. HTTP_PUT finished/<path_to_file>?token=... signals that the file upload is complete.
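
As an illustration of how a client might drive the upload endpoint, the sketch below splits a file into chunk ranges and builds one request URL per chunk. The helper names `chunk_ranges` and `upload_url` are hypothetical, and the sketch assumes standard `&`-separated query strings.

```python
from urllib.parse import quote, urlencode


def chunk_ranges(total_size, chunk_size):
    """Yield (offset, size) pairs that cover total_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while offset < total_size:
        # The last chunk may be shorter than chunk_size.
        yield offset, min(chunk_size, total_size - offset)
        offset += chunk_size


def upload_url(path, offset, size, total_size, token):
    """Build the relative URL for one upload/ call (hypothetical helper)."""
    query = urlencode({
        "chunk_offset": offset,
        "chunk_size": size,
        "total_size": total_size,
        "token": token,
    })
    # quote() keeps "/" so nested paths stay readable.
    return f"upload/{quote(path)}?{query}"


for offset, size in chunk_ranges(2500, 1024):
    print(upload_url("docs/report.txt", offset, size, 2500, "some_base64_token"))
```

Once every chunk request has returned successfully, the client would call finished/ for the same path.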



High-Level Design

  1. Auth service gives a token for the session. The token embeds / determines the user's or project's root directory.
  2. An API gateway validates the token and its associated key to the project directory. It also performs rate limiting.
  3. Sync is initiated by the client. It first queries for new changes, and then downloads the changed and new files or chunks.
  4. The client is responsible for tracking that every upload call returned successfully, and then calling finished/.
  5. Internally, files are stored as chunks.
  6. Write path 1, data: the web tier performs chunking and RAID 5-style parity, shards the chunks, and sends them to the storage nodes.
  7. Write path 2, metadata: for each chunk, the web tier computes basic info such as file path, size, time, owner, and project, and writes it to multiple metadata storage services.
  8. Write path 3, journaling: a journaling service works as a write queue that can merge events. It logs every successful API call.
  9. Garbage collection: a background service scans the metadata and deletes any file that has stayed incomplete for more than 7 days.
  10. Read path 1, metadata: the web tier queries one of the metadata services and returns the result.
  11. Read path 2, file content: the web tier fetches the contents from the storage nodes using the same sharding rules as the write path.
  12. Read path 3, new changes: the journal of a project / root directory can be queried from the journaling service for events after a timestamp. The web tier fetches the journal, filters out irrelevant events, and merges outdated ones.
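
The data write path can be sketched as follows. This is a minimal illustration, assuming single XOR parity in the style of RAID 5 and a hash-based placement rule; the function names and the choice of SHA-256 for sharding are assumptions, not part of the design above.

```python
import hashlib


def xor_parity(chunks):
    """Return the byte-wise XOR of equally sized chunks."""
    size = len(chunks[0])
    if any(len(chunk) != size for chunk in chunks):
        raise ValueError("all chunks must have the same length")
    parity = bytearray(size)
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(surviving_chunks, parity):
    """Rebuild the single missing chunk from the survivors and the parity."""
    return xor_parity(list(surviving_chunks) + [parity])


def shard_for(path, chunk_index, num_shards):
    """Map a chunk to a storage shard (hypothetical placement rule)."""
    digest = hashlib.sha256(f"{path}:{chunk_index}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


chunks = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(chunks)
# Simulate losing the middle chunk and rebuilding it.
rebuilt = recover_missing([chunks[0], chunks[2]], parity)
print(rebuilt == chunks[1])
```

Because the placement rule is deterministic, the read path can locate a chunk with the same `shard_for` call that the write path used. Single parity tolerates the loss of only one chunk per stripe.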



Detailed Component Design

The journaling service is the center of the system. It logs every successful API call and provides an interface to get the changes recorded after a given timestamp.


To be extra safe, timestamps should be computed on the server, not on the client. The journaling service keeps reading the journals until it sees the specified client_device_id, and then returns everything seen so far. This can be optimized by keeping a separate last_time for each client_device_id.
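
A minimal in-memory sketch of the optimized variant, with one last_time per client_device_id, might look like this. The `Journal` and `Event` classes are hypothetical; the sketch assumes a server-side logical counter as the timestamp, treats a device's own events as irrelevant to it, and merges events by keeping only the latest one per path.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: int          # server-assigned logical timestamp
    path: str
    kind: str        # "new", "changed", or "deleted"
    device_id: str   # device whose API call produced the event


class Journal:
    def __init__(self):
        self._events = []
        self._clock = itertools.count(1)   # timestamps come from the server
        self._last_time = {}               # client_device_id -> last ts served

    def append(self, path, kind, device_id):
        event = Event(next(self._clock), path, kind, device_id)
        self._events.append(event)
        return event

    def changes(self, device_id, limit=500):
        if limit < 1:
            raise ValueError("limit must be at least 1")
        since = self._last_time.get(device_id, 0)
        newest = since
        latest = {}
        for event in self._events:          # linear scan, for illustration only
            if event.ts <= since:
                continue
            newest = event.ts
            if event.device_id != device_id:
                latest[event.path] = event  # later events supersede earlier ones
        merged = sorted(latest.values(), key=lambda e: e.ts)
        page = merged[:limit]
        if len(merged) > limit:
            newest = page[-1].ts            # resume after the last event returned
        self._last_time[device_id] = newest
        return page
```

A real service would index events by project and timestamp instead of scanning a list, and would persist both the journal and the per-device cursors.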


The journals are cached, since multiple queries are likely to share many of their results. Metadata is also cached, since it is relatively small and frequently read.


Each service keeps a soft and a hard timeout when calling underlying services. When a call hits the soft timeout, the service notifies the observability service, which may provision more resources or adjust the rate limiter to keep event queues from overflowing. The retry mechanism should be tuned carefully so that retries do not amplify the overload they are reacting to.
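
One way to express the soft and hard timeouts is sketched below with Python's standard thread pool. The function name `call_with_timeouts` and the `on_soft_timeout` callback (standing in for the notification to the observability service) are assumptions made for illustration.

```python
import concurrent.futures
import time


def call_with_timeouts(fn, soft_s, hard_s, on_soft_timeout, executor):
    """Run fn() on the executor; report a soft timeout, fail on the hard one."""
    future = executor.submit(fn)
    try:
        return future.result(timeout=soft_s)
    except concurrent.futures.TimeoutError:
        # Soft timeout: keep waiting, but let the observer know.
        on_soft_timeout()
    # Wait for the remaining budget; raises TimeoutError at the hard limit.
    return future.result(timeout=hard_s - soft_s)


def slow_call():
    time.sleep(0.2)
    return "ok"


with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    result = call_with_timeouts(
        slow_call, soft_s=0.05, hard_s=1.0,
        on_soft_timeout=lambda: print("soft timeout reported"),
        executor=pool,
    )
    print(result)
```

Note that a thread cannot be cancelled once it is running, so after a hard timeout the underlying call may still finish in the background; a production service would also need cancellation support in the downstream client.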



Latency analysis: metadata is small. File contents are sharded, and every call is bounded by a hard timeout. The journaling service is also sharded, acts as a write queue, and is protected by rate limiting. All of these components therefore have bounded response times.


Scalability analysis: the web tier is load balanced. Metadata, blob storage, and journaling are all sharded and do not depend on each other. The databases are also sharded. Each tier can therefore scale out roughly linearly by adding instances or shards.


Reliability analysis: the observability component provisions new instances and adjusts rate limiters to prevent overflows. Metadata is relatively small and is replicated. The blob store and the journal database are replicated as well.