Requirements


Functional Requirements:


  • Allow users to tweet messages up to 140 characters.
  • Enable users to follow other users.
  • Allow users to like tweets from other users.
  • Display tweets from followed users in the home feed.
  • Show the top K most popular tweets in the home feed, ranked by their likes and their authors' follower counts.
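One way to select the top K tweets is to keep a size-K min-heap while scanning the candidates, which costs O(n log K) instead of sorting every tweet. The TypeScript sketch below assumes a hypothetical `Tweet` shape and an illustrative `score` (likes plus a tenth of the author's follower count). The requirement does not fix a formula, so treat the weighting as a placeholder.

```typescript
// Hypothetical tweet shape; field names are assumptions for illustration.
interface Tweet {
  id: number;
  likes: number;
  authorFollowers: number;
}

// Assumed popularity score: the requirement only says "likes and followers",
// so this weighting is an illustrative choice, not a prescribed formula.
function score(t: Tweet): number {
  return t.likes + 0.1 * t.authorFollowers;
}

// Select the K highest-scoring tweets using a size-K min-heap: O(n log K).
function topK(tweets: Tweet[], k: number): Tweet[] {
  const heap: Tweet[] = []; // heap[0] is the lowest-scoring tweet kept so far

  const less = (i: number, j: number) => score(heap[i]) < score(heap[j]);
  const swap = (i: number, j: number) => {
    [heap[i], heap[j]] = [heap[j], heap[i]];
  };
  const siftUp = (i: number) => {
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (!less(i, parent)) break;
      swap(i, parent);
      i = parent;
    }
  };
  const siftDown = (i: number) => {
    for (;;) {
      const l = 2 * i + 1;
      const r = l + 1;
      let smallest = i;
      if (l < heap.length && less(l, smallest)) smallest = l;
      if (r < heap.length && less(r, smallest)) smallest = r;
      if (smallest === i) return;
      swap(i, smallest);
      i = smallest;
    }
  };

  for (const t of tweets) {
    if (k <= 0) break;
    if (heap.length < k) {
      heap.push(t);
      siftUp(heap.length - 1);
    } else if (score(t) > score(heap[0])) {
      heap[0] = t; // evict the current minimum
      siftDown(0);
    }
  }
  // Return highest score first.
  return heap.sort((a, b) => score(b) - score(a));
}
```

In practice this would likely run over a bounded candidate set (for example, recent tweets from followed users) rather than over every tweet in the system.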



Non-Functional Requirements:


  • Scale to billions of users posting, liking, and commenting concurrently
  • New posts become visible to followers within 1 minute of posting
  • Store the complete history of data for billions of users
  • The UI responds instantly (for example, a like immediately increments the like count in the UI, even though the backend update may lag behind)
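The instant-UI requirement is commonly met with an optimistic update: the client increments the like count before the backend confirms, then rolls the change back if the request fails. A minimal sketch, assuming a hypothetical `PostView` state object, a `render` callback, and a `sendLike` function standing in for the backend `like` call:

```typescript
// Hypothetical client-side state for one post.
interface PostView {
  postId: number;
  likeCount: number;
  likedByMe: boolean;
}

// Hypothetical stand-in for the backend like(userId, postUserId, postId) call;
// a real client would issue a network request here.
type LikeRequest = () => Promise<void>;

// Optimistic update: change UI state first, confirm with the backend afterwards,
// and roll back if the request fails. Resolves to whether the like stuck.
async function likeOptimistically(
  view: PostView,
  render: (v: PostView) => void,
  sendLike: LikeRequest,
): Promise<boolean> {
  if (view.likedByMe) return true; // already liked; nothing to do
  view.likeCount += 1;
  view.likedByMe = true;
  render(view); // the user sees the new count immediately

  try {
    await sendLike();
    return true;
  } catch {
    // Backend rejected or timed out: undo the optimistic change.
    view.likeCount -= 1;
    view.likedByMe = false;
    render(view);
    return false;
  }
}
```

A real client would also need to guard against repeated taps while a request is still in flight.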


API Design

Backend:

  • post(message: string, userId: number)
  • like(userId: number, postUserId: number, postId: number)
  • comment(userId: number, postUserId: number, postId: number, content: string)
  • deletePost(userId: number, postId: number)
  • deleteComment(userId: number, postId: number, commentId: number)
  • dislike(userId: number, postId: number)
  • getPosts(userId: number, page: number)
  • getTopK(region: string)
  • follow(followee: number, follower: number)
  • unfollow(followee: number, follower: number)
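To make the contract of `post`, `follow`, `unfollow`, and `getPosts` concrete, here is a single-process, in-memory sketch using the signatures listed above. It is hypothetical throughout: the `Post` fields, `PAGE_SIZE`, the logical clock, counting the 140-character limit in Unicode code points, and assembling the feed at read time are all assumptions, and it ignores sharding, caching, and persistence.

```typescript
// Hypothetical post record; postId assignment and createdAt are assumptions.
interface Post {
  postId: number;
  userId: number;
  message: string;
  createdAt: number;
}

const MAX_LENGTH = 140; // from the functional requirements
const PAGE_SIZE = 20; // assumed page size for getPosts

class InMemoryFeedService {
  private nextPostId = 1;
  private clock = 0; // logical clock so ordering is deterministic in this sketch
  private posts = new Map<number, Post[]>(); // userId -> that user's posts
  private following = new Map<number, Set<number>>(); // follower -> followees

  post(message: string, userId: number): Post {
    // Count code points so an emoji is not counted as two characters.
    if (Array.from(message).length > MAX_LENGTH) {
      throw new Error(`message exceeds ${MAX_LENGTH} characters`);
    }
    const p: Post = { postId: this.nextPostId++, userId, message, createdAt: this.clock++ };
    const list = this.posts.get(userId) ?? [];
    list.push(p);
    this.posts.set(userId, list);
    return p;
  }

  follow(followee: number, follower: number): void {
    const set = this.following.get(follower) ?? new Set<number>();
    set.add(followee);
    this.following.set(follower, set);
  }

  unfollow(followee: number, follower: number): void {
    this.following.get(follower)?.delete(followee);
  }

  // Home feed, newest first, assembled at read time from followed users' posts.
  getPosts(userId: number, page: number): Post[] {
    const feed: Post[] = [];
    const followees = this.following.get(userId) ?? new Set<number>();
    for (const followee of Array.from(followees)) {
      feed.push(...(this.posts.get(followee) ?? []));
    }
    feed.sort((a, b) => b.createdAt - a.createdAt);
    return feed.slice(page * PAGE_SIZE, (page + 1) * PAGE_SIZE);
  }
}
```

Building the feed at read time (fan-out on read) keeps writes cheap but makes reads expensive for users who follow many accounts; the usual alternative, fan-out on write, pushes each new post into followers' precomputed feeds.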


High-Level Design

Main components:

  • Frontend UI
  • Backend REST API server
  • Data cache servers
  • Distributed Relational Database
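A common way to place the cache servers in front of the relational database is the cache-aside pattern: read from the cache, fall back to the database on a miss, and invalidate the cached entry after a write. The sketch below assumes a TTL-based in-process cache and a hypothetical `Store` interface standing in for the database; a real deployment would use a networked cache rather than a `Map`.

```typescript
// Hypothetical read interface for the source-of-truth database.
interface Store<V> {
  get(key: string): V | undefined;
}

// Cache-aside read path with a per-entry TTL (TTL value is an assumption).
class CacheAside<V> {
  private cache = new Map<string, { value: V; expiresAt: number }>();
  private readonly db: Store<V>;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(db: Store<V>, ttlMs: number, now: () => number = Date.now) {
    this.db = db;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.cache.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value; // fresh cache hit
    const value = this.db.get(key); // miss or expired: read from the database
    if (value !== undefined) {
      this.cache.set(key, { value, expiresAt: this.now() + this.ttlMs });
    }
    return value;
  }

  // After writing to the database, drop the cached copy so the next read
  // repopulates it from the source of truth.
  invalidate(key: string): void {
    this.cache.delete(key);
  }
}
```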



Detailed Component Design


## Sharded database

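  • Use the Raft consensus algorithm to keep comment and like data consistent across multiple replicas around the globe, and shard by userId to scale serving capacity. Be aware that Raft favors consistency: a write commits only once a majority of replicas acknowledge it, so replicating across regions adds write latency and can stall writes during a network partition. Beyond that, in CAP terms we lean toward giving up consistency in favor of availability, since users care less about it (a like count that is briefly stale is acceptable).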
  • Posts are less prone to data races: a single user rarely publishes multiple posts at the same moment, let alone from different regions at once. A sharded database with periodic cross-region syncing should therefore be enough for posts.
  • Database schema design:
  1. Two main tables that store posts and users
  2. A comment table, with commentId as the primary key and postId and userId as secondary indexes
  3. Similarly, a like table keyed by (postId, userId)
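The like table can be sketched as follows, assuming each row is keyed by the composite (postId, userId) so that a user can like a given post at most once and retried `like` calls are harmless. The `LikeRow` columns, the denormalized per-post counter, and reading `dislike` as undoing a like are all assumptions for illustration.

```typescript
// Hypothetical row shape; column names are assumptions.
interface LikeRow {
  postId: number;
  userId: number; // the user who liked the post
  createdAt: number;
}

// In-memory stand-in for the like table. The composite key (postId, userId)
// allows at most one like per user per post, so duplicate likes are no-ops.
class LikeTable {
  private rows = new Map<string, LikeRow>();
  private counts = new Map<number, number>(); // postId -> like count (denormalized)

  private static key(postId: number, userId: number): string {
    return `${postId}:${userId}`;
  }

  like(userId: number, postId: number, now = Date.now()): boolean {
    const k = LikeTable.key(postId, userId);
    if (this.rows.has(k)) return false; // duplicate like: no-op
    this.rows.set(k, { postId, userId, createdAt: now });
    this.counts.set(postId, (this.counts.get(postId) ?? 0) + 1);
    return true;
  }

  // Assumed semantics: dislike removes an existing like.
  dislike(userId: number, postId: number): boolean {
    const k = LikeTable.key(postId, userId);
    if (!this.rows.delete(k)) return false; // nothing to remove
    this.counts.set(postId, (this.counts.get(postId) ?? 1) - 1);
    return true;
  }

  likeCount(postId: number): number {
    return this.counts.get(postId) ?? 0;
  }
}
```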

All of the tables above should be sharded for scalability.
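Sharding by userId can be sketched as hashing the id and taking the result modulo the shard count. `NUM_SHARDS`, the FNV-1a hash, and `shardForLike` are hypothetical; `shardForLike` assumes the `postUserId` argument of `like` exists so that a like can be routed to the shard holding the post it refers to.

```typescript
// Hypothetical shard count.
const NUM_SHARDS = 64;

// 32-bit FNV-1a over the decimal userId, so sequential ids spread across shards.
function fnv1a(input: string): number {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by the FNV prime, keep unsigned 32 bits
  }
  return h;
}

// Posts and user rows live on the shard chosen by the owner's userId.
function shardForUser(userId: number, numShards = NUM_SHARDS): number {
  return fnv1a(String(userId)) % numShards;
}

// Assumption: route likes by the post author's id (postUserId), so each like
// lands on the same shard as the post it refers to.
function shardForLike(postUserId: number): number {
  return shardForUser(postUserId);
}
```

Plain modulo hashing remaps most keys whenever `NUM_SHARDS` changes; consistent hashing, or a fixed set of virtual shards mapped onto physical nodes, avoids that mass migration.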