System requirements


Functional:

List functional requirements for the system...

  • Users can post binary or text content to the service and receive a unique URL for it.
  • Users can share the content with others through its unique URL.
  • Users can assign tags and a TTL (time to live) to their content. Other users can only read it.
  • Users can update their saved content.



Non-Functional:

List non-functional requirements for the system...

  • The service should be highly available and highly reliable.
  • The service should scale with load and handle sharp traffic peaks.
  • Newly posted content should become visible within roughly 100 milliseconds.
  • Only registered users may post content, but anyone can read it.





Capacity estimation

Estimate the scale of the system you are going to design...

Assume each post is at most 1 MB and there are 1 million daily active users (DAU).

The service is read-heavy. Assume every user posts one new item and reads posts five times a day. Writes then add 1 million × 1 MB = 1 TB per day, or 365 TB per year. Counting a second copy of every item (data duplication), storage grows by 730 TB per year, or about 3.65 PB over five years.

Reads need about 5 TB of outbound bandwidth per day (1 million users × 5 reads × 1 MB).
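The estimates above can be checked with a few lines of arithmetic. This sketch assumes decimal units (1 TB = 1,000,000 MB), one 1 MB post per user per day, five reads per user per day, and a replication factor of 2, which is what the 730 TB per year figure implies. The constant names are illustrative only.

```python
# Back-of-the-envelope capacity estimate (decimal units: 1 TB = 1_000_000 MB).
DAU = 1_000_000              # daily active users
POST_SIZE_MB = 1             # assumed maximum post size
POSTS_PER_USER_PER_DAY = 1
READS_PER_USER_PER_DAY = 5
REPLICATION_FACTOR = 2       # assumption implied by the 730 TB/year figure
SECONDS_PER_DAY = 86_400
MB_PER_TB = 1_000_000
TB_PER_PB = 1_000


def daily_ingest_tb() -> float:
    """New data written per day, before replication."""
    return DAU * POSTS_PER_USER_PER_DAY * POST_SIZE_MB / MB_PER_TB


def storage_tb(years: int) -> float:
    """Replicated storage needed after the given number of years."""
    return daily_ingest_tb() * 365 * years * REPLICATION_FACTOR


def daily_egress_tb() -> float:
    """Data served to readers per day."""
    return DAU * READS_PER_USER_PER_DAY * POST_SIZE_MB / MB_PER_TB


def avg_egress_mb_per_s() -> float:
    """Average outbound bandwidth if reads were spread evenly over the day."""
    return daily_egress_tb() * MB_PER_TB / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"ingest per day:   {daily_ingest_tb():.0f} TB")
    print(f"storage, 1 year:  {storage_tb(1):.0f} TB")
    print(f"storage, 5 years: {storage_tb(5) / TB_PER_PB:.2f} PB")
    print(f"egress per day:   {daily_egress_tb():.0f} TB")
    print(f"average egress:   {avg_egress_mb_per_s():.1f} MB/s")
```

With these inputs the sketch gives 730 TB for one year and 3.65 PB for five years, in line with the figures above, and an average egress of about 57.9 MB/s. Real traffic is not spread evenly, so peak bandwidth will be noticeably higher than this average.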



API design

Define what APIs are expected from the system...


post(apiKey, userId, content, contentLength, tags, TTL) uploads the content to the backend and returns its unique key and URL. It takes the content, its length, and optional tags and TTL. The apiKey identifies the caller, so abusive clients can be rate-limited or blocked.


getPost(apiKey, URL, tags) fetches the content by its URL and optional tags and returns it. The apiKey lets the service rate-limit readers as well.


deletePost(apiKey, userId, URL) deletes the posted content from the service. As with the other calls, the apiKey is used to limit abuse.
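To make the contract of post, getPost, and deletePost concrete, here is a minimal in-memory sketch. It is not the production service: the dictionary stands in for the database, and `InMemoryPasteApi`, `ApiError`, the `https://paste.example/` base URL, the TTL-in-seconds convention, and the use of tags as a read filter are all hypothetical. It shows the registered-user rule for writes, TTL expiry, and an ownership check on delete.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

BASE_URL = "https://paste.example/"  # hypothetical domain for share links


class ApiError(Exception):
    """Raised for a bad apiKey, an unregistered user, or a forbidden action."""


@dataclass
class StoredPost:
    user_id: str
    content: bytes
    tags: set[str] = field(default_factory=set)
    expires_at: Optional[float] = None  # None means the post never expires


class InMemoryPasteApi:
    """Hypothetical in-memory stand-in for the post / getPost / deletePost API."""

    def __init__(self, api_keys: set[str], registered_users: set[str],
                 clock: Callable[[], float] = time.time):
        self.api_keys = api_keys
        self.registered_users = registered_users
        self.clock = clock
        self.posts: dict[str, StoredPost] = {}

    def _check_key(self, api_key: str) -> None:
        if api_key not in self.api_keys:
            raise ApiError("invalid apiKey")

    def post(self, api_key: str, user_id: str, content: bytes, content_length: int,
             tags: Optional[list[str]] = None,
             ttl_seconds: Optional[float] = None) -> tuple[str, str]:
        self._check_key(api_key)
        if user_id not in self.registered_users:
            raise ApiError("only registered users may post")
        if len(content) != content_length:
            raise ApiError("contentLength does not match the content")
        key = secrets.token_urlsafe(6)  # 8 URL-safe characters; collisions not handled here
        expires_at = self.clock() + ttl_seconds if ttl_seconds is not None else None
        self.posts[key] = StoredPost(user_id, content, set(tags or ()), expires_at)
        return key, BASE_URL + key

    def _lookup(self, url: str) -> tuple[str, Optional[StoredPost]]:
        key = url.removeprefix(BASE_URL)
        post = self.posts.get(key)
        if post is not None and post.expires_at is not None and self.clock() >= post.expires_at:
            del self.posts[key]  # lazily drop content whose TTL has passed
            post = None
        return key, post

    def get_post(self, api_key: str, url: str,
                 tags: Optional[list[str]] = None) -> Optional[bytes]:
        self._check_key(api_key)
        _, post = self._lookup(url)
        # Assumption: tags act as a filter; every requested tag must be on the post.
        if post is None or not set(tags or ()) <= post.tags:
            return None
        return post.content

    def delete_post(self, api_key: str, user_id: str, url: str) -> bool:
        self._check_key(api_key)
        key, post = self._lookup(url)
        if post is None:
            return False
        if post.user_id != user_id:
            raise ApiError("only the owner may delete a post")
        del self.posts[key]
        return True
```

The injectable `clock` makes TTL behavior easy to test without waiting. This sketch removes expired posts lazily, only when they are looked up.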





Database design

Defining the system data model early on will clarify how data will flow among different components of the system.


Publishing new content only requires eventual consistency, which makes AWS DynamoDB a suitable store.

We would have three tables: User, Post, and RateLimit.


The table User:

id varchar (100 bytes)

email varchar (200 bytes)

createdAt Date (8 bytes)

blocked Boolean (1 byte)


The table Post:

id varchar (100 bytes)

userId varchar (100 bytes)

tags varchar (1000 bytes)

URL varchar (1000 bytes)

content byte array (up to 1 MB)

contentLength integer (8 bytes)

The table RateLimit:

apiKey varchar (1000 bytes)

timestamp date (8 bytes)
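The three tables can be written down as record types. This is a minimal sketch using Python dataclasses; the field names mirror the lists above in snake_case, the content length field is derived from the payload, and the byte limits are checked by a hypothetical `validate` helper rather than by any DynamoDB feature. Treating 1 MB as 1,000,000 bytes is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime

MAX_CONTENT_BYTES = 1_000_000  # 1 MB, decimal units as in the capacity estimate


@dataclass
class User:
    id: str                # up to 100 bytes
    email: str             # up to 200 bytes
    created_at: datetime
    blocked: bool = False


@dataclass
class Post:
    id: str                # up to 100 bytes
    user_id: str           # references User.id
    tags: str              # up to 1000 bytes
    url: str               # up to 1000 bytes
    content: bytes         # up to 1 MB
    content_length: int = field(init=False)

    def __post_init__(self) -> None:
        # Derive the length from the payload so the two can never disagree.
        self.content_length = len(self.content)


@dataclass
class RateLimit:
    api_key: str           # up to 1000 bytes
    timestamp: datetime


def validate(post: Post) -> None:
    """Reject values that exceed the sizes listed in the schema."""
    limits = {"id": 100, "user_id": 100, "tags": 1000, "url": 1000}
    for name, limit in limits.items():
        if len(getattr(post, name).encode("utf-8")) > limit:
            raise ValueError(f"{name} exceeds {limit} bytes")
    if post.content_length > MAX_CONTENT_BYTES:
        raise ValueError("content exceeds 1 MB")
```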


The Post table references the User table through userId. The RateLimit table stores request timestamps per apiKey, which is what the rate limiter uses.
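Storing one timestamp per request makes a sliding-window log rate limiter possible: count the caller's timestamps inside the last window and reject the request once the count reaches the limit. The sketch below keeps the log in memory with one deque per apiKey; the class name and the default of 100 requests per 60 seconds are hypothetical, and in this design the timestamps would live in the RateLimit table instead.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLog:
    """Sliding-window log rate limiter keyed by apiKey."""

    def __init__(self, limit: int = 100, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.log: dict[str, deque] = defaultdict(deque)

    def allow(self, api_key: str) -> bool:
        now = self.clock()
        entries = self.log[api_key]
        # Drop timestamps that have fallen out of the current window.
        while entries and entries[0] <= now - self.window:
            entries.popleft()
        if len(entries) >= self.limit:
            return False  # over the limit; the rejected request is not logged
        entries.append(now)
        return True
```

A sliding log is exact but stores one entry per request; if that becomes too expensive, fixed-window or token-bucket counters trade some precision for constant storage per key.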





High-level design


API Gateway provides DDoS protection, TLS termination, and request routing to the right service nodes.

The main component is the Post service, which handles requests to publish and read posts. New posts are written to Kafka; the Post service consumes them, processes them, and writes them to AWS DynamoDB. The Post service also keeps the most frequently requested posts in a cache.

The URL generation service generates unique URLs for new posts in advance and stores them in a cache. When the Post service needs a new URL, the URL generation service takes one from the cache and returns it.
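The URL generation service described above is essentially a pool of pre-generated keys. Here is a minimal single-process sketch, assuming 7-character base62 keys, a Python set standing in for the durable record of generated keys, and a list standing in for the cache of ready keys; the class name, key length, refill threshold, and batch size are hypothetical. Keys are drawn with `secrets.choice` and deduplicated before they enter the pool, so each key is handed out at most once.

```python
import secrets
import string

ALPHABET = string.digits + string.ascii_letters  # 62 characters
KEY_LENGTH = 7                                     # 62**7 is about 3.5 trillion keys


class KeyPool:
    """Pre-generates unique short keys and hands each one out exactly once."""

    def __init__(self, refill_below: int = 1_000, batch_size: int = 10_000):
        self.refill_below = refill_below
        self.batch_size = batch_size
        self.generated: set[str] = set()  # every key ever generated (stands in for a durable store)
        self.available: list[str] = []    # stands in for the cache of ready keys

    def _random_key(self) -> str:
        return "".join(secrets.choice(ALPHABET) for _ in range(KEY_LENGTH))

    def _refill(self) -> None:
        added = 0
        while added < self.batch_size:
            key = self._random_key()
            if key not in self.generated:  # skip the rare collision
                self.generated.add(key)
                self.available.append(key)
                added += 1

    def next_key(self) -> str:
        if len(self.available) < self.refill_below:
            self._refill()
        return self.available.pop()
```

With several URL generation nodes, each node would need its own disjoint slice of the key space, or a shared store for the generated set, so that two nodes never hand out the same key.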

A CDN serves posts from edge locations close to users.

Caches may use LRU or LFU eviction policies.
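For the LRU option, here is a minimal sketch built on `collections.OrderedDict`: every hit moves the entry to the end, and an insert past capacity evicts from the front, which holds the least recently used entry. Capacity is counted in entries, which is a simplification; a cache of posts up to 1 MB each would more likely be bounded by total bytes.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data: "OrderedDict[Hashable, bytes]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[bytes]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: bytes) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # the first entry is the least recently used
```

An LFU cache would instead track a hit count per key and evict the least frequently used entry, which suits content whose popularity is stable but reacts more slowly to sudden shifts in traffic.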


Request flows

Explain how the request flows from end to end in your high level design.


A user sends a request to publish a new post, and the API Gateway puts it on Kafka. The Post service consumes the request, asks the URL generation service for a new URL, and writes the completed record to AWS DynamoDB.
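The publish flow above can be sketched with in-memory stand-ins: a `queue.Queue` for the Kafka topic, a dict for the DynamoDB table, and a counter-based `UrlGenerator` for the URL generation service. All three, along with `PublishRequest` and `run_post_service`, are hypothetical placeholders rather than Kafka or DynamoDB client APIs, so the sketch only shows the order of the steps.

```python
import itertools
import queue
from dataclasses import dataclass


@dataclass
class PublishRequest:
    user_id: str
    content: bytes
    tags: tuple[str, ...] = ()


class UrlGenerator:
    """Hypothetical stand-in for the URL generation service."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)

    def next_url(self) -> str:
        return f"https://paste.example/{next(self._ids)}"


def run_post_service(topic: "queue.Queue[PublishRequest]", urls: UrlGenerator,
                     table: dict) -> list[str]:
    """Drain the topic: for each request, get a URL and write the record."""
    published = []
    while True:
        try:
            req = topic.get_nowait()  # a real consumer would poll Kafka instead
        except queue.Empty:
            return published
        url = urls.next_url()
        table[url] = {"userId": req.user_id, "content": req.content,
                      "contentLength": len(req.content), "tags": list(req.tags)}
        topic.task_done()
        published.append(url)
```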

When a user reads a post, the API Gateway first checks whether the post is available in the closest CDN; if not, it puts the request on a Kafka topic. The Post service consumes the request and checks the cache. On a hit, it returns the post to the user; on a miss, it reads the post from AWS DynamoDB, stores it in the cache, and returns it.
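The cache and database steps of this read path follow the cache-aside pattern: check the cache, fall back to the database on a miss, then populate the cache. Here is a minimal sketch, assuming plain dicts for the cache and for DynamoDB and leaving out the CDN and Kafka hops; `PostReader` is a hypothetical name, and its `hits` and `misses` counters exist only to make the behavior observable.

```python
from typing import Optional


class PostReader:
    """Cache-aside lookup: cache first, database on a miss."""

    def __init__(self, cache: dict, database: dict):
        self.cache = cache
        self.database = database
        self.hits = 0
        self.misses = 0

    def get_post(self, url: str) -> Optional[bytes]:
        if url in self.cache:
            self.hits += 1
            return self.cache[url]
        self.misses += 1
        content = self.database.get(url)
        if content is not None:
            self.cache[url] = content  # populate the cache for later reads
        return content
```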






Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component?






Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...






Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.






Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?