My Solution for Design a Global Content Distribution Network with Score: 8/10

by utopia4715

System requirements


Functional:

- User uploads content.

- User downloads content. 



Non-Functional:

- Detects and handles duplicated content

- Scalability

- Fast download

- Upload throughput

- Reliability

- Bursty traffic

- Security 

- Eventual Consistency




Capacity estimation

- 1,000 file uploads per second.

- Each file is on average 1 MB.

- About 1 GB of new content every second.

- 86,400 seconds/day * 1 GB/s ≈ 86 TB per day.

- 86 TB/day * 365 days ≈ 31 PB in 1 year.

This is a huge amount of data. A CDN is a cache, so it is OK to evict data in a Least Recently Used (LRU) manner.
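
A quick back-of-envelope check of these numbers (the upload rate and average file size are the assumptions stated above), as a small Python sketch:

    UPLOADS_PER_SEC = 1_000   # assumed upload rate
    AVG_FILE_MB = 1           # assumed average file size

    ingest_gb_per_sec = UPLOADS_PER_SEC * AVG_FILE_MB / 1_000    # ~1 GB/s
    ingest_tb_per_day = ingest_gb_per_sec * 86_400 / 1_000       # ~86 TB/day
    ingest_pb_per_year = ingest_tb_per_day * 365 / 1_000         # ~31.5 PB/year

    print(ingest_gb_per_sec, ingest_tb_per_day, ingest_pb_per_year)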



API design


- upload_content(user_id, content, chunk_id) -> returns content id

- download_content(user_id, content_id, chunk_id)
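
A minimal sketch of what these two calls could look like; the in-memory dictionary stand-in for the Blob Store and the id-generation strategy are assumptions for illustration, not the actual service:

    import uuid

    # In-memory stand-in for the Blob Store; illustration only.
    _chunks: dict[tuple[str, int], bytes] = {}

    def upload_content(user_id: str, content: bytes, chunk_id: int) -> str:
        """Store one chunk and return the content id (a real service would
        assign one id per content object and reuse it for all of its chunks)."""
        content_id = str(uuid.uuid4())
        _chunks[(content_id, chunk_id)] = content
        return content_id

    def download_content(user_id: str, content_id: str, chunk_id: int) -> bytes:
        """Fetch one chunk of previously uploaded content."""
        return _chunks[(content_id, chunk_id)]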




Database design


- High volume of writes and reads.

- The write path and read path are simply uploads and downloads of data.



Blob Store for content. 

Document NoSQL DB (MongoDB) for Metadata.

Redis or Memcached for caching. I lean toward Redis for its ability to persist data to disk, making caching reliable.
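
As a concrete illustration, the metadata record stored in MongoDB for each piece of content might look roughly like this; the field names and values are assumptions, not a fixed schema:

    # Hypothetical metadata document for one uploaded content object.
    content_metadata = {
        "content_id": "9f1c2...",         # id returned by upload_content
        "owner_user_id": "user-42",
        "size_bytes": 1_048_576,
        "content_type": "video/mp4",
        "chunk_count": 16,
        "chunk_hashes": ["sha256:ab12...", "sha256:cd34..."],  # used by duplicate detection
        "geo_region": "eu-west",          # region of the server chosen by the Upload Service
        "blob_store_key": "eu-west/9f1c2.../",
        "created_at": "2024-01-01T00:00:00Z",
    }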



High-level design



When content is uploaded, the Upload Service decides which server should store the data, based on geo location and load.

The Upload Service updates DNS with this mapping.

On download, the client learns from DNS which server to access.

The client downloads the content from that server.
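
A rough sketch of the Upload Service's placement decision and DNS update, assuming a hypothetical list of candidate servers with known region and load; the helper names are made up for illustration:

    from dataclasses import dataclass

    @dataclass
    class EdgeServer:
        hostname: str
        region: str
        load: float          # e.g., fraction of capacity in use

    def choose_server(servers: list[EdgeServer], client_region: str) -> EdgeServer:
        """Prefer servers in the client's region, then pick the least loaded one."""
        local = [s for s in servers if s.region == client_region]
        candidates = local or servers
        return min(candidates, key=lambda s: s.load)

    def publish_mapping(content_id: str, server: EdgeServer) -> None:
        """Stand-in for the DNS update; a real system would write an A/CNAME record."""
        print(f"{content_id}.cdn.example.com -> {server.hostname}")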




Request flows

Upload path: the client connects through the Security Gateway (HTTPS termination, authentication/authorization, rate limiting). The Upload Service chooses a storage server based on geo location and load, stores the content in the Blob Store, writes metadata to the metadata DB, publishes a message to the Message Queue for Duplicate Detection, and updates DNS with the content-to-server mapping.

Download path: the client resolves the right server via DNS and downloads the content from that server, served from the cache (Redis) when possible.






Detailed component design


* Duplicate Detection

The Duplicate Detection service receives a message from the Message Queue indicating that new content has arrived.

It computes a hash of each chunk in the content.

It compares each hash against a Bloom Filter that represents existing content.

If a duplicate is detected, it takes corrective action, e.g., notifying a sysadmin, notifying the user, or deleting the content.
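
A minimal sketch of the chunk-hash-plus-Bloom-filter check, using a simple in-memory Bloom filter; the chunk size and filter parameters are assumed values:

    import hashlib

    class BloomFilter:
        """Tiny Bloom filter: k hash positions derived from SHA-256, m-bit array."""

        def __init__(self, m: int = 1 << 20, k: int = 4):
            self.m, self.k = m, k
            self.bits = bytearray(m // 8)

        def _positions(self, item: bytes):
            for i in range(self.k):
                digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
                yield int.from_bytes(digest[:8], "big") % self.m

        def add(self, item: bytes) -> None:
            for p in self._positions(item):
                self.bits[p // 8] |= 1 << (p % 8)

        def might_contain(self, item: bytes) -> bool:
            return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

    def check_for_duplicates(content: bytes, seen: BloomFilter, chunk_size: int = 1 << 20) -> bool:
        """Return True if every chunk hash was (probably) seen before, i.e. a likely duplicate."""
        chunks = [content[i:i + chunk_size] for i in range(0, len(content), chunk_size)]
        hashes = [hashlib.sha256(c).digest() for c in chunks]
        duplicate = all(seen.might_contain(h) for h in hashes)
        for h in hashes:
            seen.add(h)
        return duplicate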


* Security

Security Gateway, which interacts with clients, does:

- Rate Limiting against attacks like Denial of Service. 

- Authentication and Authorization using protocols such as OIDC.

- HTTPS termination.
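
As one example of the gateway's rate limiting, a per-user token-bucket limiter could look roughly like this (bucket size and refill rate are assumed values):

    import time

    class TokenBucket:
        """Per-client token bucket: allow bursts up to `capacity`,
        refilled at `rate` tokens per second."""

        def __init__(self, capacity: float = 100, rate: float = 10):
            self.capacity, self.rate = capacity, rate
            self.tokens = capacity
            self.last = time.monotonic()

        def allow(self) -> bool:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False   # caller would respond with HTTP 429 here

    buckets: dict[str, TokenBucket] = {}

    def is_request_allowed(user_id: str) -> bool:
        return buckets.setdefault(user_id, TokenBucket()).allow()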


* Load Balancing

As described in the high-level design, the Upload Service decides which server should store each piece of content, based on geo location and load, and updates DNS with this mapping.

On download, the client resolves the server via DNS and downloads the content from it.




Trade offs/Tech choices


Choice of the metadata DB:

  • It needs horizontal scalability, because roughly 1,000 new pieces of content arrive every second. A NoSQL DB would be more appropriate than a relational DB from a scalability and performance standpoint.
  • Eventual consistency is fine (we do not need ACID properties), which is another point in favor of a NoSQL DB.
  • Secondary indices are important for the metadata DB because we will probably want to search by content properties, e.g., geo location, content type, length, timestamp. A document DB supports secondary indices better than an LSM-based DB; see the index sketch after this list.


As a result, a document DB would be the best choice.
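
For example, with MongoDB the secondary indices on the metadata collection could be declared as below; this is a sketch assuming pymongo and the hypothetical field names from the metadata example above:

    from pymongo import MongoClient, ASCENDING, DESCENDING

    client = MongoClient("mongodb://localhost:27017")   # placeholder connection string
    metadata = client["cdn"]["content_metadata"]

    # Secondary indices for the query patterns mentioned above.
    metadata.create_index([("geo_region", ASCENDING), ("content_type", ASCENDING)])
    metadata.create_index([("created_at", DESCENDING)])
    metadata.create_index([("size_bytes", ASCENDING)])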


Failure scenarios/bottlenecks


* Content may be slightly modified, e.g., the timing is shifted or the video format is different.

In this case, the hash and Bloom Filter based approach would not work perfectly.

We can use AI (pattern recognition) to look for similarity; a rough perceptual-hash sketch follows below.

Also, we can allow other users to report it if they see similar content.
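
One simple way to catch near-duplicates is to compare perceptual hashes instead of exact hashes. The sketch below assumes some perceptual-hash function (e.g., a pHash of a video keyframe) already produced 64-bit hashes, and only shows the Hamming-distance comparison; the threshold is an assumed value:

    def hamming_distance(a: int, b: int) -> int:
        """Number of differing bits between two 64-bit perceptual hashes."""
        return bin(a ^ b).count("1")

    def looks_similar(hash_a: int, hash_b: int, threshold: int = 10) -> bool:
        """Hashes within `threshold` bits are treated as near-duplicate content."""
        return hamming_distance(hash_a, hash_b) <= threshold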


* Some server becomes hot

DNS does not handle this well.  

We can add one more level of load balancing, after the DNS-based layer, to take care of uneven load.
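
The extra load-balancer layer could, for instance, apply a least-connections policy across the replicas that a DNS name resolves to; a minimal sketch with assumed data structures:

    import collections

    active_connections: dict[str, int] = collections.defaultdict(int)

    def pick_replica(replicas: list[str]) -> str:
        """Least-connections choice among the replicas behind one DNS name."""
        chosen = min(replicas, key=lambda r: active_connections[r])
        active_connections[chosen] += 1
        return chosen

    def release_replica(replica: str) -> None:
        active_connections[replica] -= 1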


* Monitor resource usage and response time of all components, alert sysadmins, and auto-scale components like the Blob Store, Cache, and Download Service. All components are horizontally scalable.
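
A very rough sketch of the threshold-based auto-scaling decision described here; the metric, thresholds, and replica bounds are assumptions:

    def desired_replicas(current: int, cpu_util: float,
                         scale_up_at: float = 0.75, scale_down_at: float = 0.30,
                         min_replicas: int = 2, max_replicas: int = 100) -> int:
        """Scale a component (e.g., Download Service) out/in based on average CPU utilization."""
        if cpu_util > scale_up_at:
            return min(max_replicas, current + max(1, current // 2))   # grow ~50%
        if cpu_util < scale_down_at:
            return max(min_replicas, current - 1)                      # shrink slowly
        return current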



Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?