System requirements


Functional:


  • Users can post either binary or text content to the service and get a unique URL back.
  • Users can share the content with others by using the unique URL.
  • Users can assign tags and a TTL to their content. Other users can only read it.
  • Users can change their saved content.



Non-Functional:


  • The service should be highly available and highly reliable.
  • The service should scale with load and handle peaks in user traffic.
  • After a user posts content, it should become visible within roughly 100 milliseconds.
  • Only registered users may post content, but everyone can read it.





Capacity estimation


Let's suppose each post is up to 1 MB and the service has 1 million daily active users (DAU).

The service would be read-heavy: assume every user reads posts five times a day and posts new content once a day. That is 1 TB of new content per day, or 365 TB per year. Counting data duplication (one extra copy of each post), the service needs to store 730 TB per year and about 3.65 PB over 5 years.

Reads amount to about 5 TB of traffic per day (1 million users × 5 reads × 1 MB), which is roughly 58 MB/s on average.
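The arithmetic behind these figures can be checked with a short script. This is only a sketch: it uses decimal units (1 MB = 10^6 bytes) and assumes that "data duplication" means one extra copy of every post, which is what turns 365 TB into 730 TB per year.

```python
# Back-of-the-envelope capacity estimate. All inputs are assumptions from the text.
DAU = 1_000_000                 # daily active users
POST_SIZE_BYTES = 1_000_000     # 1 MB per post, decimal units
POSTS_PER_USER_PER_DAY = 1
READS_PER_USER_PER_DAY = 5
COPIES_STORED = 2               # assumed: the original plus one duplicate

TB = 10**12
PB = 10**15
SECONDS_PER_DAY = 86_400

daily_write_bytes = DAU * POSTS_PER_USER_PER_DAY * POST_SIZE_BYTES
yearly_storage_bytes = daily_write_bytes * 365 * COPIES_STORED
five_year_storage_bytes = yearly_storage_bytes * 5

daily_read_bytes = DAU * READS_PER_USER_PER_DAY * POST_SIZE_BYTES
avg_read_bytes_per_sec = daily_read_bytes / SECONDS_PER_DAY

print(f"storage per year:    {yearly_storage_bytes / TB:.0f} TB")
print(f"storage for 5 years: {five_year_storage_bytes / PB:.2f} PB")
print(f"read traffic per day: {daily_read_bytes / TB:.0f} TB")
print(f"average read rate:    {avg_read_bytes_per_sec / 10**6:.1f} MB/s")
```

The average rate hides peaks, so the real bandwidth budget would need headroom on top of it.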



API design



post(apiKey, userId, content, contentLength, tags, TTL) posts the content to the backend and returns the unique key and URL. The caller passes the content and its length, plus optional tags and a TTL. The apiKey identifies the caller, so the service can rate-limit requests and prevent abuse.


getPost(apiKey, URL, tags) requests the content by URL and tags and returns it. The apiKey is used to prevent abuse, as in post.


deletePost(apiKey, userId, URL) deletes the posted content from the service. The apiKey is used to prevent abuse, as in post.
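To make the three calls concrete, here is a minimal in-memory sketch. The class name InMemoryPostApi, the base URL https://example.com/, the TTL unit (seconds), the owner check in delete_post, and the way tags filter a read are all assumptions made for illustration; the real service would sit behind the API Gateway and use the database described later.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredPost:
    user_id: str
    content: bytes
    tags: list[str]
    expires_at: Optional[float]  # Unix time, or None when no TTL was given


class InMemoryPostApi:
    """Hypothetical stand-in for the post / getPost / deletePost calls."""

    BASE_URL = "https://example.com/"  # placeholder domain

    def __init__(self, valid_api_keys: set[str]) -> None:
        self._valid_api_keys = valid_api_keys
        self._posts: dict[str, StoredPost] = {}

    def _check_key(self, api_key: str) -> None:
        if api_key not in self._valid_api_keys:
            raise PermissionError("unknown apiKey")

    def post(self, api_key: str, user_id: str, content: bytes, content_length: int,
             tags: Optional[list[str]] = None,
             ttl_seconds: Optional[float] = None) -> str:
        self._check_key(api_key)
        if len(content) != content_length:
            raise ValueError("contentLength does not match the content")
        url = self.BASE_URL + secrets.token_urlsafe(8)
        expires_at = None
        if ttl_seconds is not None:
            expires_at = time.time() + ttl_seconds
        self._posts[url] = StoredPost(user_id, content, list(tags or []), expires_at)
        return url

    def get_post(self, api_key: str, url: str,
                 tags: Optional[list[str]] = None) -> Optional[bytes]:
        self._check_key(api_key)
        post = self._posts.get(url)
        if post is None:
            return None
        if post.expires_at is not None and time.time() >= post.expires_at:
            del self._posts[url]  # the TTL has passed, so drop the post lazily
            return None
        if tags and not set(tags) <= set(post.tags):
            return None  # assumed meaning of tags: every requested tag must match
        return post.content

    def delete_post(self, api_key: str, user_id: str, url: str) -> bool:
        self._check_key(api_key)
        post = self._posts.get(url)
        if post is None or post.user_id != user_id:
            return False  # assumed rule: only the owner may delete
        del self._posts[url]
        return True
```

A call such as api.post("key", "u1", b"hello", 5, tags=["demo"]) returns a URL that can then be passed to get_post or delete_post.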





Database design



Eventual consistency is acceptable for posting content, so we may use AWS DynamoDB.

We would have three tables: User, Post, and RateLimit.


The table User:

id varchar (100 bytes)

email varchar (200 bytes)

createdAt Date (8 bytes)

blocked Boolean (1 byte)


The table Post:

id varchar (100 bytes)

userId varchar (100 bytes)

tags varchar (1000 bytes)

URL varchar (1000 bytes)

content binary (1 MB)

contentLength integer (8 bytes)

Note that DynamoDB limits a single item to 400 KB, so a 1 MB content value does not fit into one item as listed here. The content would have to be split across several items or kept outside the table, with only a reference stored in Post.

The table RateLimit:

apiKey varchar (1000 bytes)

timestamp date (8 bytes)


The Post table references the User table through userId. The RateLimit table stores the timestamp of each request per apiKey, which is the data the rate-limiting logic is built on.
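One way to turn stored request timestamps into a rate limit is a sliding window: count the requests an apiKey made in the last N seconds and reject the call once a threshold is reached. The sketch below keeps the timestamps in memory instead of in the RateLimit table, and the class name, limit, and window length are assumptions.

```python
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allows at most max_requests per apiKey within any window_seconds span."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # apiKey -> timestamps of accepted requests, oldest first
        self._timestamps: dict[str, deque] = defaultdict(deque)

    def allow(self, api_key: str, now: float) -> bool:
        window = self._timestamps[api_key]
        # Drop timestamps that have fallen out of the window.
        while window and window[0] <= now - self.window_seconds:
            window.popleft()
        if len(window) >= self.max_requests:
            return False
        window.append(now)
        return True
```

Passing the current time in as an argument keeps the limiter easy to test; in a service it would typically be time.time().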





High-level design


The API Gateway provides DDoS protection, terminates TLS, and routes requests to the right service nodes.

The main service is the Post service, which serves requests to publish or read posts. New posts are added to Kafka; the Post service then reads and processes them. The Post service keeps the most requested posts in the cache and writes posts to AWS DynamoDB.

The URL generation service generates unique URLs for new posts ahead of time and stores them in the cache. When the Post service needs a new unique URL, the URL generation service reads one from the cache.

A CDN lets us place the posts closer to customers.

Caches may use LRU or LFU eviction policies.
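
As an illustration of the LRU option, the sketch below evicts the least recently used entry once a fixed capacity is exceeded. It is an in-memory toy built on OrderedDict, not how Redis implements eviction; the class name and capacity are assumptions.

```python
from collections import OrderedDict
from typing import Any, Optional


class LruCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: OrderedDict = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: Any) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used key
```

An LFU cache would instead track how often each key is read and evict the key with the lowest count.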


Request flows



The user sends a request to publish a new post, and the API Gateway sends it to Kafka. The Post service reads this request, asks the URL generation service for a new URL, and sends the completed data to AWS DynamoDB.

When the user wants to read a post, the API Gateway checks whether it is available in the closest CDN. If not, it puts the request on a Kafka topic. The Post service reads this request and checks whether the post is in the cache. If so, the post is returned to the user; otherwise, the Post service finds it in AWS DynamoDB, stores it in the cache, and returns it to the user.
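
The read path is a cache-aside lookup across three layers. The sketch below models the CDN, the cache, and the database as plain dictionaries and leaves Kafka out entirely; the function name and the returned source label are assumptions added for illustration.

```python
from typing import Optional


def read_post(url: str, cdn: dict, cache: dict,
              database: dict) -> tuple[Optional[bytes], str]:
    """Return (content, source), where source names the layer that answered."""
    if url in cdn:
        return cdn[url], "cdn"
    if url in cache:
        return cache[url], "cache"
    content = database.get(url)
    if content is None:
        return None, "not found"
    cache[url] = content  # populate the cache for later readers
    return content, "database"
```

The first read of a post that is not in the CDN is answered by the database, and the second one by the cache.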






Detailed component design

Performance and scalability of the Post service are extremely important for this system. As such, we employ two levels of caching: the CDN and the Redis cache. Requests will naturally have locality of access, so caching will be effective.

The Post service is stateless, so we may run several instances of it. We may also use Kubernetes to scale this service efficiently and make it fault-tolerant, because it is a critical part of the system. Kubernetes monitors the service's availability and health by pinging the configured service endpoint.

The Post service uses Redis to cache the most requested posts, with LRU eviction to drop the least recently used ones. It writes posts to AWS DynamoDB, a scalable and fault-tolerant key-value store managed by AWS.

The Post service reads requests from the Kafka cluster. We would use Kafka partition replication to avoid losing post requests. When a new post request arrives, the Post service asks the URL generation service for a new unique URL.

To search posts by tags faster, we would build an index on tags in AWS DynamoDB.
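
Conceptually, a tag index is an inverted index from each tag to the URLs of the posts that carry it. The sketch below shows the idea in memory; it does not use or imitate any DynamoDB API, and the class and method names are assumptions.

```python
from collections import defaultdict
from typing import Iterable


class TagIndex:
    """Inverted index: tag -> set of post URLs."""

    def __init__(self) -> None:
        self._urls_by_tag: dict[str, set] = defaultdict(set)

    def add(self, url: str, tags: Iterable[str]) -> None:
        for tag in tags:
            self._urls_by_tag[tag].add(url)

    def remove(self, url: str, tags: Iterable[str]) -> None:
        for tag in tags:
            self._urls_by_tag[tag].discard(url)

    def find(self, tags: Iterable[str]) -> set:
        """Return the URLs of posts that carry every one of the given tags."""
        tag_list = list(tags)
        if not tag_list:
            return set()
        result = set(self._urls_by_tag.get(tag_list[0], set()))
        for tag in tag_list[1:]:
            result &= self._urls_by_tag.get(tag, set())
        return result
```

A lookup then costs one set read per tag instead of a scan over all posts.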

The URL generation service manages the unique URLs. It creates free URLs in advance and stores them in the cache, for which we would use Redis. The cache is only filled up to its configured capacity, so we don't need to care about data eviction. If Redis goes down and loses its data, that is acceptable: the URL generation service can generate new URLs.


We will have a CDN storing posts at the locations closest to the clients, which decreases latency when a customer reads post content.
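
The pool of pre-generated URLs can be sketched as a bounded queue that is refilled on demand. Everything named here is an assumption: the class UrlKeyPool, random keys from secrets.token_urlsafe, and the in-memory set that remembers every key ever generated (in practice that record would have to live in durable storage so keys stay unique after a cache loss).

```python
import secrets
from collections import deque


class UrlKeyPool:
    """Keeps up to `capacity` unused URL keys ready to hand out."""

    def __init__(self, capacity: int, key_bytes: int = 8) -> None:
        self.capacity = capacity
        self.key_bytes = key_bytes
        self._free: deque = deque()   # plays the role of the Redis cache
        self._generated: set = set()  # assumed durable record of all keys
        self.refill()

    def refill(self) -> None:
        while len(self._free) < self.capacity:
            key = secrets.token_urlsafe(self.key_bytes)
            if key not in self._generated:  # skip the rare collision
                self._generated.add(key)
                self._free.append(key)

    def take(self) -> str:
        if not self._free:
            self.refill()
        return self._free.popleft()

    def simulate_cache_loss(self) -> None:
        # Losing the pool only wastes unused keys; new ones are generated.
        self._free.clear()
```

Because the pool never grows beyond its capacity, no eviction policy is needed for it.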


Partitioning


The Post service and the URL generation service should be partitioned for improved scalability.

If we choose the URL as the partition key for the Post service, it gives us an even data distribution.

The URL generation service uses a partitioned cache, and the partition key is the URL.

AWS DynamoDB manages its data partitions itself, so no involvement is needed on our side.

The caches may use the URL as the partition key as well.
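
A common way to partition by URL is to hash it and take the result modulo the number of partitions. The sketch below assumes SHA-256 and a fixed partition count; the function name is hypothetical, and a production setup might prefer consistent hashing so that changing the partition count moves fewer keys.

```python
import hashlib


def partition_for(url: str, partition_count: int) -> int:
    """Map a URL to a partition number in [0, partition_count)."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % partition_count
```

The same URL always maps to the same partition, and because the hash output is close to uniform, distinct URLs spread roughly evenly across partitions.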




Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...






Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.






Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?