System requirements
Functional:
send tweets
check feeds
add likes to tweets
Non-Functional:
availability
scalability
low latency
Capacity estimation
1 billion users
daily active users (DAU) are 10% of all users, so 100 million
every active user sends 2 tweets per day, so 200 million tweets per day
every user follows 1000 users, so 1000 billion (1 trillion) follow relationships across 1 billion users
every active user likes 10 tweets per day, so 1 billion likes per day
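The arithmetic behind these estimates can be checked with a few lines of Python. This is a back-of-envelope sketch: the per-second rates are derived averages that the estimates above do not state, and peak traffic would be higher. Note that 1 billion users each following 1000 accounts works out to 10^12 (1 trillion) follow relationships.

```python
SECONDS_PER_DAY = 24 * 60 * 60

total_users = 1_000_000_000
daily_active_users = total_users // 10      # 10% of users are active each day
tweets_per_day = daily_active_users * 2     # 2 tweets per active user per day
likes_per_day = daily_active_users * 10     # 10 likes per active user per day
follow_edges = total_users * 1_000          # every user follows 1000 users

# Derived averages (an addition to the estimates, not a measured figure).
avg_tweets_per_second = tweets_per_day / SECONDS_PER_DAY
avg_likes_per_second = likes_per_day / SECONDS_PER_DAY

print(f"DAU:            {daily_active_users:,}")
print(f"tweets per day: {tweets_per_day:,}")
print(f"likes per day:  {likes_per_day:,}")
print(f"follow edges:   {follow_edges:,}")
print(f"avg tweets/s:   {avg_tweets_per_second:,.0f}")
print(f"avg likes/s:    {avg_likes_per_second:,.0f}")
```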
API design
POST v1/user_id/tweets/tweet_id (send a tweet)
GET v1/user_id/feeds (check the feed)
POST v1/user_id/tweet_id/is_liked (like a tweet)
POST v1/user_id/following_user_id/is_following (follow a user)
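One way to picture how a server could dispatch these four endpoints is a small route table. This is only a sketch: the handler names (send_tweet, get_feed, like_tweet, follow_user) and the route() helper are hypothetical, and a real service would use a web framework's router instead.

```python
import re

# Hypothetical route table mirroring the four endpoints listed above.
# Each entry is (HTTP method, path pattern, handler name).
ROUTES = [
    ("POST", r"^v1/(?P<user_id>\w+)/tweets/(?P<tweet_id>\w+)$", "send_tweet"),
    ("GET", r"^v1/(?P<user_id>\w+)/feeds$", "get_feed"),
    ("POST", r"^v1/(?P<user_id>\w+)/(?P<tweet_id>\w+)/is_liked$", "like_tweet"),
    ("POST", r"^v1/(?P<user_id>\w+)/(?P<following_user_id>\w+)/is_following$", "follow_user"),
]


def route(method, path):
    """Return (handler_name, path_params) for a request, or None if nothing matches."""
    normalized = path.strip("/")  # tolerate leading and trailing slashes
    for route_method, pattern, handler in ROUTES:
        if method != route_method:
            continue
        match = re.match(pattern, normalized)
        if match:
            return handler, match.groupdict()
    return None


print(route("POST", "/v1/42/tweets/99"))
print(route("GET", "/v1/42/feeds"))
```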
Database design
Use a graph database to store the follow relationships between users and their followers
Use a relational database to store user metadata and tweet metadata
Use a cache for popular tweets
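A minimal sketch of how this split could look, using an in-memory SQLite database for the relational side and a plain adjacency map as a stand-in for the graph database. The table and column names (handle, body, like_count, created_at) and the follow() helper are illustrative assumptions, not a fixed schema.

```python
import sqlite3
from collections import defaultdict

# Relational side: user metadata and tweet metadata (column names are illustrative).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        handle  TEXT NOT NULL UNIQUE
    );
    CREATE TABLE tweets (
        tweet_id   INTEGER PRIMARY KEY,
        author_id  INTEGER NOT NULL REFERENCES users(user_id),
        body       TEXT NOT NULL,
        like_count INTEGER NOT NULL DEFAULT 0,
        created_at INTEGER NOT NULL
    );
""")

# Graph side: adjacency maps standing in for a graph database.
following = defaultdict(set)  # user_id -> ids of the users they follow
followers = defaultdict(set)  # user_id -> ids of the users following them


def follow(follower_id, followee_id):
    """Record one follow edge in both directions so either lookup is cheap."""
    following[follower_id].add(followee_id)
    followers[followee_id].add(follower_id)


db.execute("INSERT INTO users (user_id, handle) VALUES (?, ?)", (1, "alice"))
db.execute("INSERT INTO users (user_id, handle) VALUES (?, ?)", (2, "bob"))
db.execute(
    "INSERT INTO tweets (tweet_id, author_id, body, created_at) VALUES (?, ?, ?, ?)",
    (100, 1, "hello", 1_700_000_000),
)
follow(2, 1)  # bob follows alice
```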
High-level design
A request first goes through a load balancer before reaching a server. For reads, the server checks the cache first and reads from the database on a cache miss; static content is looked up in the CDN. For writes, the server writes to the database.
Request flows
- a request first goes through a load balancer, then reaches a server
- reads: the server checks the cache first, then reads from the database on a cache miss; static content is looked up in the CDN
- writes: the server writes to the database
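The read and write paths can be sketched as a cache-aside store. The dicts below are stand-ins for the real cache and database, and the TweetStore class is hypothetical. Dropping the cached copy on a write is an assumption added here so that later reads do not return stale data; the flow above only says that writes go to the database.

```python
class TweetStore:
    """Cache-aside reads and database writes, with dicts as stand-ins."""

    def __init__(self):
        self.database = {}  # stand-in for the database
        self.cache = {}     # stand-in for the cache
        self.db_reads = 0   # counts how often a read had to hit the database

    def read(self, tweet_id):
        # Check the cache first.
        if tweet_id in self.cache:
            return self.cache[tweet_id]
        # Cache miss: read from the database and populate the cache.
        self.db_reads += 1
        value = self.database.get(tweet_id)
        if value is not None:
            self.cache[tweet_id] = value
        return value

    def write(self, tweet_id, body):
        # Writes go to the database.
        self.database[tweet_id] = body
        # Assumption: invalidate any cached copy to avoid stale reads.
        self.cache.pop(tweet_id, None)


store = TweetStore()
store.write(1, "hi")
store.read(1)  # miss: served from the database, then cached
store.read(1)  # hit: served from the cache
```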
Detailed component design
use a different strategy for different users to avoid hot partitions while still providing low latency:
- for users who don't have a lot of followers, use a push model: fan out their new tweets to every follower's timeline
- for celebrity users who have a lot of followers, use a pull model: only pull their new tweets when a follower checks their feed
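The two strategies above can be sketched in memory as follows. The FeedService class, the logical clock, and the follower-count threshold that decides who counts as a celebrity are all hypothetical; the cutoff would need tuning in a real system.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000  # hypothetical cutoff, not a recommended value


class FeedService:
    def __init__(self, celebrity_threshold=CELEBRITY_THRESHOLD):
        self.threshold = celebrity_threshold
        self.followers = defaultdict(set)          # author -> their followers
        self.following = defaultdict(set)          # user -> authors they follow
        self.timelines = defaultdict(list)         # user -> tweets pushed to them
        self.tweets_by_author = defaultdict(list)  # author -> their own tweets
        self.clock = 0                             # logical timestamp for ordering

    def follow(self, follower, author):
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= self.threshold

    def send_tweet(self, author, text):
        self.clock += 1
        tweet = (self.clock, author, text)
        self.tweets_by_author[author].append(tweet)
        # Push model: fan out to every follower's timeline, but only for
        # authors below the celebrity threshold.
        if not self.is_celebrity(author):
            for follower in self.followers[author]:
                self.timelines[follower].append(tweet)
        return tweet

    def get_feed(self, user, limit=20):
        # Start from the tweets that were pushed to this user.
        feed = set(self.timelines[user])
        # Pull model: fetch celebrity tweets only at read time. The set removes
        # duplicates if an author became a celebrity after earlier pushes.
        for author in self.following[user]:
            if self.is_celebrity(author):
                feed.update(self.tweets_by_author[author])
        return sorted(feed, reverse=True)[:limit]  # newest first


svc = FeedService(celebrity_threshold=3)
for fan in ("a", "b", "c"):
    svc.follow(fan, "star")
svc.follow("a", "bob")
svc.send_tweet("bob", "hello")      # pushed to a's timeline
svc.send_tweet("star", "big news")  # not pushed; pulled when followers read
```

With this split, a celebrity's tweet costs one write regardless of follower count, and the extra work moves to each follower's feed read.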
Trade offs/Tech choices
Using a different strategy for different users avoids hot partitions while still providing low latency. For celebrity users who have a lot of followers, the pull model adds latency to feed reads, but it won't slow the whole system down when a celebrity sends a tweet.
Failure scenarios/bottlenecks
- when hot events happen, a lot of reads and writes arrive at the same time, which adds latency; solution: manually scale up before predicted hot events
- storage growth: archive old data to cold storage
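The archiving step could look like the sketch below, which moves tweets older than a cutoff from a hot store to a cold store. The archive_old_tweets helper, the dict-based stores, and the 365-day retention default are hypothetical; the right retention window depends on access patterns.

```python
SECONDS_PER_DAY = 86_400


def archive_old_tweets(hot, cold, now, max_age_days=365):
    """Move tweets older than max_age_days from hot to cold; return how many moved."""
    cutoff = now - max_age_days * SECONDS_PER_DAY
    # Collect ids first so the dict is not modified while iterating over it.
    old_ids = [tid for tid, tweet in hot.items() if tweet["created_at"] < cutoff]
    for tid in old_ids:
        cold[tid] = hot.pop(tid)
    return len(old_ids)


now = 1_700_000_000
hot_store = {
    1: {"body": "old tweet", "created_at": now - 400 * SECONDS_PER_DAY},
    2: {"body": "recent tweet", "created_at": now - 5 * SECONDS_PER_DAY},
}
cold_store = {}
moved = archive_old_tweets(hot_store, cold_store, now)
```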
Future improvements
- scale up capacity ahead of predicted hot events, so that spikes in reads and writes do not add latency
- archive old data to cold storage to address the storage issue