Requirements


Functional Requirements:


  • Allow users to submit text posts, links, images, videos and GIFs to subreddits.
  • Enable users to upvote or downvote posts and comments.
  • Support threaded discussions with nested replies indented hierarchically.
  • Allow users to create new subreddits based on different topics.



Non-Functional Requirements:


  • High availability - if Reddit goes down, users can't use the platform
  • Low latency - the feed must load in under 100 ms; media is served from the nearest CDN edge node
  • Scalability - must support hundreds of millions of uploads and reads
  • Eventual consistency - it's OK if new content takes a second to become visible
  • Durability - posted videos and photos must be stored securely and never lost


Capacity Estimation

Estimate the scale of the system. Consider daily active users, read/write ratio, storage requirements, bandwidth, and any relevant QPS calculations...




API Design

Define the APIs expected from the system. This is your chance to analyze and define the read and write paths so that you can come up with the high-level design...


POST /upload

DELETE /posts/:post_id

POST /posts/:post_id/upvote

POST /posts/:post_id/downvote

DELETE /posts/:post_id/upvote

DELETE /posts/:post_id/downvote

POST /subreddit

POST /subreddit/follow

GET /feed

GET /notifications

DELETE /subreddit

DELETE /follower
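
The vote endpoints above come in POST/DELETE pairs, so it helps to pin down what each one does to a user's vote. The following is a minimal sketch, assuming each user holds at most one vote per post and that repeating a request should not change the score. The in-memory `votes` store and the function names `put_vote`, `delete_vote` and `score` are hypothetical and not part of the API list.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the vote store. In the design, vote
# writes would go through the queue and end up in the metadata database.
votes = defaultdict(dict)  # post_id -> {user_id: +1 or -1}


def put_vote(post_id: str, user_id: str, value: int) -> None:
    """POST /posts/:post_id/upvote (value=+1) or /downvote (value=-1)."""
    if value not in (1, -1):
        raise ValueError("value must be +1 or -1")
    # Overwrites any earlier vote by this user, so repeating the same
    # request leaves the score unchanged.
    votes[post_id][user_id] = value


def delete_vote(post_id: str, user_id: str, value: int) -> None:
    """DELETE /posts/:post_id/upvote or /downvote.

    Removes the user's vote only if it matches the one being deleted.
    """
    if votes[post_id].get(user_id) == value:
        del votes[post_id][user_id]


def score(post_id: str) -> int:
    """Net score of a post: upvotes minus downvotes."""
    return sum(votes[post_id].values())
```

Under these assumptions, an upvote followed by a downvote from the same user leaves a single downvote, and DELETE on a vote the user never cast is a no-op.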



High-Level Design

Describe the overall system architecture. Identify the main components needed to solve the problem end-to-end. Use the diagramming tool to create a block diagram.


The client first hits the rate limiter and API gateway, which handles auth. The load balancer then routes requests to the 4 different services: the search and get feed server, the upvote and downvote server, the notification server, and the follow and create subreddit server. The upvote/downvote server and the follow and create subreddit server are attached to a Kafka queue so that their requests are processed asynchronously. A worker server behind the queue processes these requests one at a time and writes to a database that stores the metadata. Posting images and videos is handled by the post server, which stores the media in an S3 bucket; the media is then cached in a CDN. The search and get feed server reads from a cache and, on a miss, goes to the database to get the information.
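
The feed read path described above (check the cache, fall back to the database on a miss) can be sketched as follows. This is only an illustration under assumptions: `FeedCache` is a tiny in-process TTL cache standing in for whatever cache the design ends up using, and `load_feed_from_db` is a hypothetical callable that queries the metadata database.

```python
import time


class FeedCache:
    """Tiny TTL cache; a stand-in for the real cache in the design."""

    def __init__(self, ttl_seconds: float = 30.0):
        self.ttl = ttl_seconds
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None or entry[0] < time.monotonic():
            # Missing or expired: drop any stale entry and report a miss.
            self._entries.pop(key, None)
            return None
        return entry[1]

    def set(self, key, value):
        self._entries[key] = (time.monotonic() + self.ttl, value)


def get_feed(user_id, cache, load_feed_from_db):
    """GET /feed: serve from the cache, falling back to the database."""
    key = f"feed:{user_id}"
    feed = cache.get(key)
    if feed is not None:
        return feed  # cache hit
    feed = load_feed_from_db(user_id)  # cache miss: read the database
    cache.set(key, feed)  # populate the cache for later requests
    return feed
```

The TTL is one simple way to meet the eventual consistency requirement: a cached feed may be stale for up to `ttl_seconds`, after which the next request reloads it from the database.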



Database Design

Define the data model. Identify the main entities, their attributes, and relationships. Consider the choice of database type (SQL vs NoSQL) and justify your decision based on access patterns...




Detailed Component Design

Deep dive into 2-3 key components. Explain how they work, how they scale, discuss tradeoffs, capacity, and any relevant algorithms or data structures.


The post server is the main entry point for all content being posted, including posts with images and videos. It puts videos and images into the S3 bucket, and when a user loads a feed, these heavy files are served from the CDN first rather than from the bucket.

The upvote/downvote server and the follow and create subreddit server are connected to a Kafka queue, which handles their requests asynchronously so that the servers do not get backed up. The worker server behind Kafka is in charge of processing these requests one by one.

The search and feed server is in charge of populating the user's feed, fetching from the cache first. The cache stores only the most popular feed data, and on a miss the server fetches from the database.

For the database we will use a SQL, table-like structure that stores the metadata for each post and the information for each user, including the subreddits they follow and created and their upvotes and downvotes. The secondary database is for redundancy, in case we lose information from the first database.
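
To make the worker and the SQL metadata store more concrete, here is a rough sketch of a worker draining queued vote events into a database. Everything in it is an assumption for illustration: `queue.Queue` stands in for the Kafka topic, SQLite stands in for the SQL database, and the table layout, event fields and function names are hypothetical. Because a queue may deliver the same message more than once, the write is keyed on (post_id, user_id) so that replaying an event does not double count a vote.

```python
import json
import queue
import sqlite3

# Hypothetical schema: post metadata plus one vote row per user per post.
SCHEMA = """
CREATE TABLE posts (
    post_id   TEXT PRIMARY KEY,
    subreddit TEXT NOT NULL,
    author_id TEXT NOT NULL,
    media_url TEXT              -- location of the file in object storage
);
CREATE TABLE votes (
    post_id TEXT NOT NULL,
    user_id TEXT NOT NULL,
    value   INTEGER NOT NULL CHECK (value IN (-1, 1)),
    PRIMARY KEY (post_id, user_id)
);
"""


def apply_event(db, event):
    """Apply one queued event to the metadata database."""
    if event["type"] == "vote":
        # INSERT OR REPLACE keeps a single row per (post_id, user_id),
        # so a redelivered event overwrites instead of adding a row.
        db.execute(
            "INSERT OR REPLACE INTO votes (post_id, user_id, value) "
            "VALUES (?, ?, ?)",
            (event["post_id"], event["user_id"], event["value"]),
        )
    elif event["type"] == "unvote":
        db.execute(
            "DELETE FROM votes WHERE post_id = ? AND user_id = ?",
            (event["post_id"], event["user_id"]),
        )
    db.commit()


def drain(events, db):
    """Process queued events one by one; return how many were handled."""
    processed = 0
    while True:
        try:
            raw = events.get_nowait()
        except queue.Empty:
            return processed
        apply_event(db, json.loads(raw))
        processed += 1


def post_score(db, post_id):
    """Net score of a post, computed from the votes table."""
    row = db.execute(
        "SELECT COALESCE(SUM(value), 0) FROM votes WHERE post_id = ?",
        (post_id,),
    ).fetchone()
    return row[0]
```

A real worker would run in a long-lived loop against the Kafka consumer rather than draining an in-process queue, but the per-event logic would look much the same.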