System requirements


Functional:

  • Users should be able to post tweets
  • Users should be able to see tweets from the people they follow
  • Users should be able to follow/unfollow other users
  • Users should be able to add tweets to their favorites



Non-Functional:

  • The tweet feed should load with low latency
  • The system should scale as the number of users grows
  • Tweets should be durable and must not get lost
  • The service should be highly available




Capacity estimation

  • 300M active users
  • 100M new tweets per day
  • Each tweet consumes 100 KB of data
  • Total tweet data = 100M × 100 KB = 10 TB/day; 10 TB × 365 ≈ 3.65 PB, so roughly 4 PB/year
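
The arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes decimal units (1 KB = 1,000 bytes), and the average write rate at the end is derived from the figures above rather than stated in them.

```python
# Back-of-the-envelope storage estimate, assuming decimal units (1 KB = 1,000 bytes).
TWEETS_PER_DAY = 100_000_000     # 100M new tweets per day
BYTES_PER_TWEET = 100 * 1_000    # 100 KB per tweet
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_day = TWEETS_PER_DAY * BYTES_PER_TWEET
tb_per_day = bytes_per_day / 1e12
pb_per_year = tb_per_day * 365 / 1_000

# Derived figure (not in the list above): average write rate, ignoring peaks.
avg_tweets_per_second = TWEETS_PER_DAY / SECONDS_PER_DAY

print(f"{tb_per_day:.0f} TB/day")                      # 10 TB/day
print(f"{pb_per_year:.2f} PB/year")                    # 3.65 PB/year
print(f"{avg_tweets_per_second:.0f} tweets/s average") # 1157 tweets/s average
```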




API design

  1. getNewsFeed(userId)
  2. followUser(from, to)
  3. addToFavorite(userId, tweetId)
  4. postTweet(userId, tweetId, tweetText)
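
To make the call shapes concrete, here is a toy in-memory sketch of the four calls. The class name `InMemoryTwitter` and its storage layout are hypothetical, the method names are the Python snake_case equivalents of the calls listed above, and the feed is computed by a naive scan at read time rather than from the cache described later.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class InMemoryTwitter:
    """Toy single-process stand-in for the service (hypothetical, for illustration)."""
    tweets: dict = field(default_factory=dict)                         # tweet_id -> (user_id, text)
    following: dict = field(default_factory=lambda: defaultdict(set))  # user_id -> followed user ids
    favorites: dict = field(default_factory=lambda: defaultdict(set))  # user_id -> tweet ids

    def post_tweet(self, user_id, tweet_id, tweet_text):   # postTweet
        self.tweets[tweet_id] = (user_id, tweet_text)

    def follow_user(self, from_user, to_user):              # followUser
        self.following[from_user].add(to_user)

    def add_to_favorite(self, user_id, tweet_id):           # addToFavorite
        self.favorites[user_id].add(tweet_id)

    def get_news_feed(self, user_id):                       # getNewsFeed
        # Naive read-time scan over all tweets, most recently posted first.
        followed = self.following[user_id]
        feed = [(tid, text) for tid, (author, text) in self.tweets.items()
                if author in followed]
        return feed[::-1]


svc = InMemoryTwitter()
svc.follow_user("alice", "bob")
svc.post_tweet("bob", 1, "hello")
svc.post_tweet("bob", 2, "world")
svc.post_tweet("carol", 3, "not followed by alice")
svc.add_to_favorite("alice", 2)
feed = svc.get_news_feed("alice")   # [(2, "world"), (1, "hello")]
```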



Database design

  • User table
      ◦ user id
      ◦ email id
      ◦ password hash
      ◦ create date
  • Tweets table
      ◦ tweet id
      ◦ user id
      ◦ tweet text
      ◦ create time
  • Follower table
      ◦ follower user id
      ◦ following user id
  • Following table
      ◦ following user id
      ◦ follower user id
  • Favorites table
      ◦ tweet id
      ◦ user id
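
One possible reading of this schema, sketched with Python's built-in `sqlite3` module as a local stand-in for MySQL. The column types, key choices, and sample rows are assumptions; in particular, the sketch assumes the Follower and Following tables store the same edge with opposite primary-key order, so that lookups in either direction hit a primary key.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id       INTEGER PRIMARY KEY,
    email_id      TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL,
    create_date   TEXT NOT NULL
);
CREATE TABLE tweets (
    tweet_id    INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES users(user_id),
    tweet_text  TEXT NOT NULL,
    create_time TEXT NOT NULL
);
-- Keyed by follower: answers "whom does user X follow?"
CREATE TABLE follower (
    follower_user_id  INTEGER NOT NULL,
    following_user_id INTEGER NOT NULL,
    PRIMARY KEY (follower_user_id, following_user_id)
);
-- Keyed by followed user: answers "who follows user X?"
CREATE TABLE following (
    following_user_id INTEGER NOT NULL,
    follower_user_id  INTEGER NOT NULL,
    PRIMARY KEY (following_user_id, follower_user_id)
);
CREATE TABLE favorites (
    tweet_id INTEGER NOT NULL REFERENCES tweets(tweet_id),
    user_id  INTEGER NOT NULL REFERENCES users(user_id),
    PRIMARY KEY (user_id, tweet_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO users VALUES (1, 'a@example.com', 'hash-a', '2024-01-01')")
conn.execute("INSERT INTO users VALUES (2, 'b@example.com', 'hash-b', '2024-01-01')")
# User 1 follows user 2: the same edge is written to both tables.
conn.execute("INSERT INTO follower VALUES (1, 2)")
conn.execute("INSERT INTO following VALUES (2, 1)")
conn.execute("INSERT INTO tweets VALUES (10, 2, 'hello', '2024-01-02T00:00:00')")

# Tweets from everyone user 1 follows, newest first.
feed = conn.execute(
    """SELECT t.tweet_id, t.tweet_text
       FROM follower f JOIN tweets t ON t.user_id = f.following_user_id
       WHERE f.follower_user_id = ?
       ORDER BY t.create_time DESC""",
    (1,),
).fetchall()

# Everyone who follows user 2.
followers_of_2 = conn.execute(
    "SELECT follower_user_id FROM following WHERE following_user_id = ?", (2,)
).fetchall()
```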




High-level design

  • The four APIs from the API design section
  • Load balancers for web servers and application servers
  • News feed cache for each user
  • Databases for the schema defined above, with replicas
  • Kafka and Flink
  • Change Data Capture (CDC)



Request flows

  • A user creates an account, and the account information is stored in the User table in MySQL.
  • A user posts a tweet. The tweet is saved to the Tweets DB, which can be MySQL since tweet text is bounded in size. The tweet must then reach all of the user's followers. To achieve this, a CDC pipeline captures changes in the Tweets DB and pushes them to a Kafka topic, from which a Flink job consumes them.
  • When a user follows another user, the relationship is stored in the Following DB (MySQL). CDC sends this change to a second Kafka topic, which feeds the same Flink job. Flink then has both the followers of a user and the user's tweets, and uses them to update the news feed of every follower. A user has 100 followers on average, so each tweet updates about 100 news feeds.
  • When a user adds a tweet to favorites, it is stored in the Favorites DB (MySQL).
  • For users with millions of followers, we do not update the news feed caches. Instead, the client device polls to fetch those users' tweets.
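
The fan-out step described above can be sketched in plain Python, with dictionaries standing in for Flink state and the news feed cache. All names here (`on_follow`, `on_tweet`, `FANOUT_LIMIT`, `FEED_SIZE`) are hypothetical, and the cutoff and feed size are assumed values chosen only for illustration.

```python
from collections import defaultdict, deque

FANOUT_LIMIT = 1_000_000   # assumed cutoff for "users with millions of followers"
FEED_SIZE = 100            # assumed cap on cached entries per user

followers = defaultdict(set)                               # author -> follower ids
feed_cache = defaultdict(lambda: deque(maxlen=FEED_SIZE))  # user -> newest-first tweet ids
celebrity_tweets = defaultdict(list)                       # author -> tweet ids served by polling


def on_follow(follower_id, followee_id):
    """Handle one event from the follow stream."""
    followers[followee_id].add(follower_id)


def on_tweet(author_id, tweet_id, fanout_limit=FANOUT_LIMIT):
    """Handle one event from the tweet stream; return the number of feeds updated."""
    audience = followers[author_id]
    if len(audience) >= fanout_limit:
        # Too many followers: skip cache writes and let clients poll instead.
        celebrity_tweets[author_id].append(tweet_id)
        return 0
    for user in audience:
        feed_cache[user].appendleft(tweet_id)
    return len(audience)


on_follow("alice", "bob")
on_follow("carol", "bob")
updated = on_tweet("bob", 1)                  # pushed to both followers' cached feeds
skipped = on_tweet("bob", 2, fanout_limit=2)  # treated as a high-follower account
```

Passing a small `fanout_limit` in the second call is only a way to exercise the polling path without creating a million followers.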



Detailed component design

  • Kafka topics are partitioned by user id so that Flink has all the data for a user in one place, including new tweets and follower information.
  • Each DB is partitioned, and each partition has replicas. Consistent hashing spreads data evenly across partitions, and replication provides fault tolerance and durability.
  • Web servers and application servers scale horizontally as the user base grows.
  • Because the news feed is served from a cache, reads have low latency.
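
As an illustration of the consistent hashing mentioned above, here is a minimal hash ring with virtual nodes. The class name `HashRing`, the node names, the choice of SHA-256, and the figure of 100 virtual nodes per partition are all assumptions made for this sketch; replication is not modeled.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # First 8 bytes of SHA-256 as an integer position on the ring.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each node is placed at `vnodes` positions to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # The first ring position clockwise from the key's hash owns the key.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["db-0", "db-1", "db-2"])
smaller = HashRing(["db-0", "db-1"])   # the same ring after db-2 is removed
keys = [f"user-{i}" for i in range(1000)]
# Only keys that lived on db-2 move; every other key keeps its partition.
moved = [k for k in keys if ring.node_for(k) != smaller.node_for(k)]
```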




Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...






Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.






Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?