System requirements


Functional:

send tweets

check feeds

add likes to tweets



Non-Functional:

availability

scalability

low latency



Capacity estimation

1 billion users

10% are daily active users, so 100 million daily active users

every active user sends 2 tweets per day, so 200 million tweets per day

every user follows 1000 users, so 1000 billion (1 trillion) follow relationships

every active user likes 10 tweets per day, so 1 billion likes per day
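
The arithmetic behind these estimates can be checked with a short script. This is a back-of-envelope sketch: the per-second rates are derived here and are not part of the original estimates, and they assume traffic is spread evenly over the day, which real traffic is not.

```python
# Inputs taken from the estimates above.
TOTAL_USERS = 1_000_000_000
DAU_PERCENT = 10                  # 10% of users are active each day
TWEETS_PER_ACTIVE_USER = 2
LIKES_PER_ACTIVE_USER = 10
FOLLOWS_PER_USER = 1_000
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400

daily_active_users = TOTAL_USERS * DAU_PERCENT // 100        # 100 million
tweets_per_day = daily_active_users * TWEETS_PER_ACTIVE_USER  # 200 million
likes_per_day = daily_active_users * LIKES_PER_ACTIVE_USER    # 1 billion
follow_relationships = TOTAL_USERS * FOLLOWS_PER_USER         # 1,000 billion

# Average write rates (assumption: uniform traffic; peaks will be higher).
tweets_per_second = tweets_per_day / SECONDS_PER_DAY   # about 2,315
likes_per_second = likes_per_day / SECONDS_PER_DAY     # about 11,574

print(f"{tweets_per_second:,.0f} tweets/s, {likes_per_second:,.0f} likes/s")
```

The average rates are only a floor for sizing; peak load during busy hours would need its own multiplier.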



API design

POST v1/user_id/tweets/tweet_id (send a tweet)

GET v1/user_id/feeds (check the user's feed)

POST v1/user_id/tweet_id/is_liked (like a tweet)

POST v1/user_id/following_user_id/is_following (follow another user)
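
To make the four endpoints concrete, here is a minimal in-memory sketch of the operations behind them. The class name, method names, and the shape of the feed entries are assumptions made for illustration; there is no HTTP layer, storage, or authentication here.

```python
from dataclasses import dataclass, field


@dataclass
class TwitterApi:
    """In-memory stand-in for the four endpoints (all names are hypothetical)."""
    tweets: dict = field(default_factory=dict)     # tweet_id -> (author_id, text)
    likes: dict = field(default_factory=dict)      # tweet_id -> set of user_ids
    following: dict = field(default_factory=dict)  # user_id -> set of followed user_ids

    def post_tweet(self, user_id, tweet_id, text):
        # POST v1/user_id/tweets/tweet_id
        self.tweets[tweet_id] = (user_id, text)

    def get_feed(self, user_id):
        # GET v1/user_id/feeds: tweets written by the users this user follows
        followed = self.following.get(user_id, set())
        return [
            {"tweet_id": tweet_id, "author": author, "text": text,
             "likes": len(self.likes.get(tweet_id, set()))}
            for tweet_id, (author, text) in self.tweets.items()
            if author in followed
        ]

    def like_tweet(self, user_id, tweet_id):
        # POST v1/user_id/tweet_id/is_liked
        self.likes.setdefault(tweet_id, set()).add(user_id)

    def follow(self, user_id, following_user_id):
        # POST v1/user_id/following_user_id/is_following
        self.following.setdefault(user_id, set()).add(following_user_id)


api = TwitterApi()
api.follow("alice", "bob")
api.post_tweet("bob", "t1", "hello")
api.like_tweet("alice", "t1")
feed = api.get_feed("alice")
print(feed)
```

Storing likes as a set per tweet makes a repeated like from the same user a no-op, which is one simple way to keep the like endpoint idempotent.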




Database design

Use a graph database to store the relationships between users and their followers

Use a relational database to store user metadata and tweet metadata

Use a cache for popular tweets
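
The three stores can be sketched with standard-library stand-ins. This is an illustration under assumptions: the table and column names are hypothetical, SQLite stands in for the relational database, a dict of sets stands in for the graph database, and a plain dict stands in for the cache.

```python
import sqlite3

# Relational store: user metadata and tweet metadata (hypothetical schema).
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY,
    handle     TEXT NOT NULL UNIQUE,
    created_at TEXT NOT NULL
);
CREATE TABLE tweets (
    tweet_id   INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    body       TEXT NOT NULL,
    like_count INTEGER NOT NULL DEFAULT 0,
    created_at TEXT NOT NULL
);
CREATE INDEX tweets_by_user ON tweets (user_id, created_at);
""")

# Graph store stand-in: follower_id -> set of followed user_ids.
follows = {}

# Cache stand-in for popular tweets: tweet_id -> row.
popular_tweets = {}

# Sample data (made up for the example).
db.execute("INSERT INTO users VALUES (?, ?, ?)", (1, "alice", "2024-01-01"))
db.execute("INSERT INTO users VALUES (?, ?, ?)", (2, "bob", "2024-01-01"))
db.execute(
    "INSERT INTO tweets (tweet_id, user_id, body, created_at) VALUES (?, ?, ?, ?)",
    (10, 2, "hello", "2024-01-02"),
)
follows.setdefault(1, set()).add(2)   # alice follows bob

row = db.execute(
    "SELECT user_id, body, like_count FROM tweets WHERE tweet_id = ?", (10,)
).fetchone()
popular_tweets[10] = row              # keep a popular tweet in the cache
print(row)
```

The index on (user_id, created_at) reflects the most likely query in this design, fetching a user's recent tweets; other access patterns would need their own indexes.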





High-level design

A request first goes through a load balancer before reaching a server. For reads, the server checks the cache first and reads from the database only if the cache does not have the data; static content is served from the CDN. For writes, the server writes to the database.



Request flows

  1. every request goes through the load balancer before reaching a server
  2. reads: the server checks the cache first, then reads from the database; static content is looked up in the CDN
  3. writes: the server writes to the database
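
The read and write paths can be sketched as a cache-aside pattern. This is a minimal illustration with dicts standing in for the cache and the database; the class and method names are hypothetical. Filling the cache on a miss and invalidating it on a write are assumptions, since the description above only says that reads check the cache first and writes go to the database.

```python
class TweetStore:
    """Cache-aside sketch: dicts stand in for the cache and the database."""

    def __init__(self):
        self.database = {}   # tweet_id -> text
        self.cache = {}      # tweet_id -> text
        self.db_reads = 0    # counts how often a read had to hit the database

    def read(self, tweet_id):
        if tweet_id in self.cache:           # cache hit: no database access
            return self.cache[tweet_id]
        self.db_reads += 1                   # cache miss: read from the database
        value = self.database.get(tweet_id)
        if value is not None:
            self.cache[tweet_id] = value     # assumption: fill the cache on a miss
        return value

    def write(self, tweet_id, text):
        self.database[tweet_id] = text       # writes go to the database
        self.cache.pop(tweet_id, None)       # assumption: drop any stale cached copy


store = TweetStore()
store.write("t1", "hello")
first = store.read("t1")    # miss, served from the database
second = store.read("t1")   # hit, served from the cache
print(first, second, store.db_reads)
```

Invalidating on write keeps the sketch simple; updating the cached value in place would be an alternative with different consistency trade-offs.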



Detailed component design

use a different strategy for different users, to avoid hot partitions and still provide low latency:

  1. for users who don't have a lot of followers, use the push model: fan out their new tweets to every follower's timeline
  2. for celebrity users who have a lot of followers, use the pull model: only pull their new tweets when a follower checks their feed
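
The two strategies above can be combined as in the following sketch. Everything here is illustrative: the function names, the in-memory dicts, and especially the follower threshold (set to 3 so the example stays small) are assumptions, and the sketch does not handle an account crossing the threshold after it has already tweeted.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 3   # hypothetical cutoff, tiny on purpose for the example

followers = defaultdict(set)          # author -> users who follow the author
following = defaultdict(set)          # user -> authors the user follows
timelines = defaultdict(list)         # user -> tweets pushed to the user's timeline
celebrity_tweets = defaultdict(list)  # celebrity -> the celebrity's own tweets


def follow(follower, author):
    followers[author].add(follower)
    following[follower].add(author)


def is_celebrity(author):
    return len(followers[author]) >= CELEBRITY_THRESHOLD


def post_tweet(author, timestamp, text):
    tweet = (timestamp, author, text)
    if is_celebrity(author):
        celebrity_tweets[author].append(tweet)   # pull model: store once
    else:
        for follower in followers[author]:       # push model: fan out on write
            timelines[follower].append(tweet)


def get_feed(user):
    feed = list(timelines[user])                 # tweets already pushed
    for author in following[user]:
        if is_celebrity(author):
            feed.extend(celebrity_tweets[author])  # pulled at read time
    return sorted(feed, reverse=True)            # newest first


for fan in ("a", "b", "c"):
    follow(fan, "star")
follow("a", "bob")
post_tweet("bob", 1, "hi from bob")     # bob has 1 follower: pushed
post_tweet("star", 2, "hi from star")   # star has 3 followers: stored for pull
feed_a = get_feed("a")
print(feed_a)
```

The celebrity's tweet is written once instead of once per follower, and the cost moves to the read side, where each follower's feed merges it in on demand.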




Trade offs/Tech choices

Use a different strategy for different users to avoid hot partitions and still provide low latency. For celebrity users who have a lot of followers, use the pull model: it adds latency when followers read their feeds, but it won't slow the whole system down when a celebrity sends a tweet.



Failure scenarios/bottlenecks

  1. when hot events happen, a lot of reads and writes arrive at the same time, which adds latency. Solution: manually scale up before predicted hot events
  2. storage growth: archive old data in cold storage
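
Archiving old data can be sketched as a periodic job that moves records past an age cutoff from hot storage to cold storage. This is only an illustration: the function name, the dict-based stores, and the one-year cutoff in the usage example are assumptions, not part of the design above.

```python
from datetime import datetime, timedelta


def archive_old_tweets(hot, cold, now, max_age):
    """Move tweets older than max_age from hot to cold; return how many moved."""
    cutoff = now - max_age
    old_ids = [
        tweet_id
        for tweet_id, (created_at, _text) in hot.items()
        if created_at < cutoff
    ]
    for tweet_id in old_ids:
        cold[tweet_id] = hot.pop(tweet_id)
    return len(old_ids)


# Sample data (made up): tweet_id -> (created_at, text)
hot_storage = {
    "t1": (datetime(2020, 1, 1), "old tweet"),
    "t2": (datetime(2024, 6, 1), "recent tweet"),
}
cold_storage = {}

moved = archive_old_tweets(
    hot_storage,
    cold_storage,
    now=datetime(2024, 7, 1),
    max_age=timedelta(days=365),   # hypothetical retention window
)
print(moved, sorted(hot_storage), sorted(cold_storage))
```

Passing `now` in explicitly keeps the job deterministic and easy to test.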




Future improvements

  1. hot events: handle the spike in simultaneous reads and writes without relying on manual scale-up, which only works for events that are predicted in advance
  2. storage: archive old data in cold storage to keep storage growth under control