Design Twitter - System Design

System requirements

Functional:

send tweets

check feeds

add likes to tweets

Non-Functional:

availability

scalability

low latency

Capacity estimation

1 billion users

daily active users 10%, so 100 million

every daily active user sends 2 tweets per day, so 200 million tweets per day

every user follows 1,000 users, so 1,000 billion (1 trillion) follow relationships

every daily active user likes 10 tweets per day, so 1 billion likes per day
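
The arithmetic behind these estimates can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the constant names are made up for illustration, and the average per-second rates are derived here rather than stated in the notes (peak traffic would be higher than the average).

```python
# Inputs taken from the estimates above; names are illustrative.
TOTAL_USERS = 1_000_000_000
DAU_PERCENT = 10
TWEETS_PER_DAU = 2
FOLLOWS_PER_USER = 1_000
LIKES_PER_DAU = 10
SECONDS_PER_DAY = 86_400

daily_active_users = TOTAL_USERS * DAU_PERCENT // 100   # 100 million
tweets_per_day = daily_active_users * TWEETS_PER_DAU    # 200 million
likes_per_day = daily_active_users * LIKES_PER_DAU      # 1 billion
follow_edges = TOTAL_USERS * FOLLOWS_PER_USER           # 1 trillion

# Derived averages (an addition to the notes, not a measured figure).
avg_tweet_writes_per_sec = tweets_per_day / SECONDS_PER_DAY
avg_like_writes_per_sec = likes_per_day / SECONDS_PER_DAY

print(f"DAU: {daily_active_users:,}")
print(f"tweets/day: {tweets_per_day:,} (~{avg_tweet_writes_per_sec:,.0f}/s on average)")
print(f"likes/day: {likes_per_day:,} (~{avg_like_writes_per_sec:,.0f}/s on average)")
print(f"follow relationships: {follow_edges:,}")
```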

API design

POST v1/user_id/tweets/tweet_id

GET v1/user_id/feeds

POST v1/user_id/tweet_id/is_liked

POST v1/user_id/following_user_id/is_following

Database design

Use a graph database to store the follow relationships between users

Use a relational database to store user metadata and tweet metadata

Use a cache for popular tweets

High-level design

Requests first go through a load balancer before reaching a server. For reads, the server checks the cache first and falls back to the database on a miss; static content is served from a CDN. For writes, the server writes to the database.
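
A minimal sketch of the read and write paths described above, using plain dictionaries as stand-ins for the cache and the database. The class and method names are hypothetical, and invalidating the cache entry on write is an assumption; the notes only say that writes go to the database.

```python
from typing import Optional


class TweetStore:
    """Hypothetical cache-first read path with a database fallback."""

    def __init__(self, database: dict[str, str]) -> None:
        self.database = database          # stand-in for the relational store
        self.cache: dict[str, str] = {}   # stand-in for the cache tier
        self.db_reads = 0                 # counts cache misses that hit the database

    def read(self, tweet_id: str) -> Optional[str]:
        # Serve from the cache when possible.
        if tweet_id in self.cache:
            return self.cache[tweet_id]
        # Cache miss: read from the database and populate the cache.
        self.db_reads += 1
        tweet = self.database.get(tweet_id)
        if tweet is not None:
            self.cache[tweet_id] = tweet
        return tweet

    def write(self, tweet_id: str, text: str) -> None:
        # Writes go to the database.
        self.database[tweet_id] = text
        # Assumption: drop any cached copy so the next read is not stale.
        self.cache.pop(tweet_id, None)


store = TweetStore({"t1": "hello"})
store.read("t1")   # miss, goes to the database
store.read("t1")   # hit, served from the cache
print(store.db_reads)
```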

Request flows

Detailed component design

use different strategies for different users to avoid hot partitions while still providing low latency:

for users who don't have a lot of followers, use a push model: fan out their new tweets to every follower's timeline
for celebrity users who have a lot of followers, use a pull model: only pull their new tweets when a follower checks their feed
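
The hybrid push/pull approach could look roughly like the in-memory sketch below. Everything here is illustrative: the `Timelines` class, its method names, and the `CELEBRITY_THRESHOLD` value (kept tiny so the demo is easy to follow) are assumptions, and a real system would use persistent stores and a much larger threshold.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # hypothetical cutoff; deliberately tiny for the demo


class Timelines:
    def __init__(self):
        self.followers = defaultdict(set)  # author -> users following them
        self.following = defaultdict(set)  # user -> authors they follow
        self.inbox = defaultdict(list)     # user -> tweets pushed to them
        self.outbox = defaultdict(list)    # author -> their own tweets
        self.clock = 0                     # logical timestamp for ordering

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def post(self, author, text):
        self.clock += 1
        tweet = (self.clock, author, text)
        self.outbox[author].append(tweet)
        # Push model: fan out only for authors below the threshold.
        if not self.is_celebrity(author):
            for follower in self.followers[author]:
                self.inbox[follower].append(tweet)

    def feed(self, user, limit=10):
        # Start from tweets that were pushed to this user.
        tweets = set(self.inbox[user])
        # Pull model: merge in celebrity tweets at read time.
        # The set removes duplicates if an author crossed the threshold
        # after some of their tweets had already been pushed.
        for author in self.following[user]:
            if self.is_celebrity(author):
                tweets.update(self.outbox[author])
        return sorted(tweets, reverse=True)[:limit]


t = Timelines()
for fan in ("a", "b", "c"):
    t.follow(fan, "star")
t.follow("a", "bob")
t.post("bob", "hi")        # pushed to a's inbox
t.post("star", "hello")    # not pushed; pulled when followers read
print(t.feed("a"))
```

Writes from a celebrity touch only their own outbox, so the cost of a post does not grow with the follower count; the cost moves to each follower's read.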

Trade offs/Tech choices

using a different strategy per user type avoids hot partitions while keeping latency low. For celebrity users who have a lot of followers, the pull model adds latency to feed reads, but it keeps the whole system from slowing down when a celebrity sends a tweet

Failure scenarios/bottlenecks

when hot events happen, a lot of reads and writes arrive at the same time, which adds latency. Solution: manually scale up before predicted hot events
storage growth: archive old data in cold storage
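
Archiving old data might be sketched as a periodic job that moves records older than a cutoff from hot to cold storage. The function name, the one-year `ARCHIVE_AFTER` window, and the dictionary-based stores are all assumptions for illustration; the notes do not specify a retention policy.

```python
from datetime import datetime, timedelta, timezone

ARCHIVE_AFTER = timedelta(days=365)  # hypothetical retention window


def archive_old_tweets(hot, cold, now):
    """Move tweets older than the cutoff from hot to cold storage.

    hot and cold map tweet_id -> creation time; returns the number moved.
    """
    cutoff = now - ARCHIVE_AFTER
    # Collect ids first so the dict is not mutated while iterating over it.
    old_ids = [tweet_id for tweet_id, created in hot.items() if created < cutoff]
    for tweet_id in old_ids:
        cold[tweet_id] = hot.pop(tweet_id)
    return len(old_ids)


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
hot = {"old": now - timedelta(days=400), "new": now - timedelta(days=1)}
cold = {}
moved = archive_old_tweets(hot, cold, now)
print(moved, sorted(hot), sorted(cold))
```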

Future improvements
