Design Twitter - System Design

System requirements

Functional:

User can post tweets
User can follow other users
User can view tweets from followed users on their home timeline
User can view another user's profile home page
Tweets are shown in reverse chronological order
User can like a tweet
A tweet can contain text and media files such as pictures

Non-Functional:

Posted tweets should show up on followers' timelines in real time
Prioritize high availability over consistency

Capacity estimation

Daily active users: 1M

Tweets sent per day: 5M => write QPS: 5M / 24 / 3600 = 58

Tweets view per day: 500M => read QPS: 500M / 24 / 3600 = 5800

Favorites per day: 50M => write QPS: 580

Total QPS: 58 + 5800 + 580 ≈ 6500

Peak QPS estimation: 2 * 6500 = 13000

This can be handled by around 20 SQL machines

Read traffic is much larger than write traffic (roughly 9:1)
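
The QPS figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch that takes the daily volumes and the peak factor of 2 from the estimates; the variable names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 3600

# Daily volumes taken from the estimates above.
daily_volume = {
    "tweet_writes": 5_000_000,
    "tweet_reads": 500_000_000,
    "favorite_writes": 50_000_000,
}

# Average queries per second for each kind of request.
qps = {name: count / SECONDS_PER_DAY for name, count in daily_volume.items()}

total_qps = sum(qps.values())
peak_qps = 2 * total_qps  # peak factor of 2, as assumed in the text
read_write_ratio = qps["tweet_reads"] / (
    qps["tweet_writes"] + qps["favorite_writes"]
)

for name, value in qps.items():
    print(f"{name}: {value:.0f} QPS")
print(f"total: {total_qps:.0f} QPS, peak: {peak_qps:.0f} QPS")
print(f"read/write ratio: {read_write_ratio:.1f}")
```

The exact total comes out slightly under 6500; the estimate rounds up, which is fine for sizing purposes.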

Storage estimation:

One tweet = 200 bytes of text + 5 MB of media

Assuming 20% of tweets contain media

One day: 5 MB * 1M + 200 bytes * 5M = 5 TB + 1 GB ≈ 5 TB

If we store the data for 50 years: 50 * 365 * 5 TB ≈ 91,000 TB ≈ 91 PB
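
A quick check of the storage arithmetic, assuming decimal units (1 MB = 10^6 bytes) and the figures above: 5M tweets per day, 200 bytes of text each, and a 5 MB media file on 20% of them. The constant names are illustrative.

```python
TEXT_BYTES = 200
MEDIA_BYTES = 5 * 10**6        # 5 MB per media file (decimal units, assumed)
TWEETS_PER_DAY = 5_000_000
MEDIA_PERCENT = 20             # share of tweets that carry media

media_tweets = TWEETS_PER_DAY * MEDIA_PERCENT // 100
text_only_tweets = TWEETS_PER_DAY - media_tweets

# Media tweets store text plus media; the rest store text only.
bytes_per_day = (
    media_tweets * (TEXT_BYTES + MEDIA_BYTES) + text_only_tweets * TEXT_BYTES
)

tb_per_day = bytes_per_day / 10**12
pb_50_years = bytes_per_day * 365 * 50 / 10**15

print(f"per day: {tb_per_day:.3f} TB")
print(f"50 years: {pb_50_years:.1f} PB")
```

Media dominates the total: text contributes about 1 GB per day, so the media store (and its CDN) is what needs capacity planning.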

API design

User can post tweets
POST /v1/tweet
body {auth_token, user_id, content}
User can follow other users
POST /v1/follow
body {auth_token, user_id, follow_user_id}
User can view followed tweets on their home timeline
GET /v1/home/:user_id
User can view another user's profile home page
GET /v1/profile/:user_id
User can like a tweet
POST /v1/favorite
body {auth_token, user_id, tweet_id}

Database design

Tweet table - Store tweet info

TweetId
OwnerId
Date
Text
Media link
Like count

User table - Store user info

UserId
UserName
RegisterDate
Follower count
Country
Gender
Birthday

Like table - Store like info

TweetId
LikedUserId

Follow table - Store follow info

UserId
FollowerId

Timeline table - Store the tweets on each user's timeline

UserId
TweetId
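
The five tables above could be expressed as a relational schema along these lines. This is a sketch only: the column types, primary keys, snake_case names, and the use of SQLite are assumptions made for illustration, and the like table is named tweet_like because LIKE is an SQL keyword.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id        INTEGER PRIMARY KEY,
    user_name      TEXT NOT NULL,
    register_date  TEXT NOT NULL,
    follower_count INTEGER NOT NULL DEFAULT 0,
    country        TEXT,
    gender         TEXT,
    birthday       TEXT
);
CREATE TABLE tweet (
    tweet_id   INTEGER PRIMARY KEY,
    owner_id   INTEGER NOT NULL REFERENCES users(user_id),
    post_date  TEXT NOT NULL,
    text       TEXT,
    media_link TEXT,
    like_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE tweet_like (
    tweet_id      INTEGER NOT NULL,
    liked_user_id INTEGER NOT NULL,
    PRIMARY KEY (tweet_id, liked_user_id)
);
CREATE TABLE follow (
    user_id     INTEGER NOT NULL,
    follower_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, follower_id)
);
CREATE TABLE timeline (
    user_id  INTEGER NOT NULL,
    tweet_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, tweet_id)
);
"""

# An in-memory database is enough to validate that the schema is well formed.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

The composite primary keys on the like and follow tables make duplicate likes or follows impossible at the storage level, which keeps retries of the same request harmless.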

High-level design

Client

End user client

Media file CDN

Store media files for tweets to ensure they are highly available

Load balancer

Ensure requests are evenly distributed across servers

API Gateway / Webapp server

Serve web pages to the end user
Rate limiting
Route API requests to corresponding services

Tweet service

Handle post tweet request

Fanout service

Fan out a newly posted tweet to the followers' timelines

Message queue - Kafka

Pub/sub channel that carries tweet-posted messages from the tweet service to the fanout service

Follow service

Handle user follow request

Favorite service

Handle tweet like request

Home/profile service

Handle timeline request

Timeline Cache

Cache timeline to make sure it's highly available

Request flows

Post tweet

The user posts a tweet from the client, and a POST /v1/tweet request is sent to the tweet service after passing through the load balancer and API gateway
The tweet service writes the new tweet into the tweet table
The tweet service publishes a tweet-posted message to the message queue
The fanout service subscribes to the message queue. When it receives a message, it adds the new tweet to the timeline table and the timeline cache for the author and for each follower
On the client side, the author sees the new tweet on their profile, and followers see it on their home timelines
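
The post flow above is a fanout-on-write design, which can be sketched with in-memory stand-ins. Everything here is hypothetical scaffolding: a deque plays the role of the Kafka topic, dictionaries play the role of the tables and cache, and the function names are made up for illustration.

```python
from collections import defaultdict, deque

follow_table = defaultdict(set)      # user_id -> set of follower ids
tweet_table = {}                     # tweet_id -> tweet record
timeline_table = defaultdict(list)   # user_id -> tweet ids (DB and cache stand-in)
message_queue = deque()              # stand-in for a Kafka topic


def post_tweet(tweet_id, owner_id, text):
    """Tweet service: persist the tweet, then publish a tweet-posted event."""
    tweet_table[tweet_id] = {"owner_id": owner_id, "text": text}
    message_queue.append({"tweet_id": tweet_id, "owner_id": owner_id})


def run_fanout():
    """Fanout service: drain the queue and append each tweet to the
    timeline of the author and of every follower."""
    while message_queue:
        event = message_queue.popleft()
        recipients = {event["owner_id"]} | follow_table[event["owner_id"]]
        for user_id in recipients:
            timeline_table[user_id].append(event["tweet_id"])


follow_table["alice"] = {"bob", "carol"}
post_tweet(1, "alice", "hello")
run_fanout()
print(dict(timeline_table))
```

Because the tweet service returns as soon as the event is queued, the cost of writing to many timelines is paid asynchronously by the fanout service rather than by the posting request.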

View home/profile timeline

When a user lands on the home page, the home/profile service first gets tweet ids from the timeline cache, which stores the X most recent tweets, then fetches the tweet info from the tweet table and returns it so the client can render the home page.
If more than X tweets are requested, the service also queries the timeline table in the DB
Tweets are sorted by tweet id in reverse chronological order before being returned to the user
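
A minimal sketch of this read path, assuming tweet ids increase with time so that sorting by id descending gives reverse chronological order. The dictionaries are hypothetical stand-ins for the timeline cache, timeline table, and tweet table, and CACHE_SIZE plays the role of X.

```python
CACHE_SIZE = 3  # "X" in the text; a small value for illustration

# Hypothetical in-memory stand-ins for the real stores.
timeline_table = {"bob": [101, 102, 103, 104, 105]}
timeline_cache = {"bob": [105, 104, 103]}  # newest first, at most CACHE_SIZE ids
tweet_table = {i: {"tweet_id": i, "text": f"tweet {i}"} for i in range(101, 106)}


def get_home_timeline(user_id, limit):
    """Return up to `limit` tweets for the user, newest first."""
    tweet_ids = list(timeline_cache.get(user_id, []))[:limit]
    if len(tweet_ids) < limit:
        # The cache cannot satisfy the request: fall back to the timeline table.
        all_ids = sorted(timeline_table.get(user_id, []), reverse=True)
        tweet_ids = all_ids[:limit]
    # Hydrate tweet ids into full tweet records from the tweet table.
    return [tweet_table[tweet_id] for tweet_id in tweet_ids]


print([t["tweet_id"] for t in get_home_timeline("bob", 2)])
print([t["tweet_id"] for t in get_home_timeline("bob", 5)])
```

The first call is served entirely from the cache; the second asks for more than the cache holds and goes to the timeline table.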

Follow / Unfollow

When a user follows another user, a follow API request is sent to the follow service
The follow service updates the follower count in the user table, and then writes a new row into the follow table to record the follow
Once the request is completed, it returns OK to the client, and the follow button on the client changes to unfollow.
Unfollow is the reverse operation of follow

Like a tweet

When a user clicks the like button on a tweet, a favorite request is sent to the favorite service
The service updates the like count for the tweet in the tweet table and writes a new row into the like table
Once the request is completed, it returns OK to the client and the user sees the updated like count
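
One way to sketch the like flow is shown below, under the assumption that the tweet table and like table live in the same database so both writes can share a transaction. With separate or sharded stores that is not possible, and the count can drift from the rows. The table and function names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tweet (
    tweet_id   INTEGER PRIMARY KEY,
    like_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE tweet_like (
    tweet_id      INTEGER NOT NULL,
    liked_user_id INTEGER NOT NULL,
    PRIMARY KEY (tweet_id, liked_user_id)
);
INSERT INTO tweet (tweet_id) VALUES (1);
""")


def like_tweet(tweet_id, user_id):
    """Record a like and bump the counter; a repeated like is a no-op.

    Returns True if a new like was recorded, False if it already existed.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute(
            "INSERT OR IGNORE INTO tweet_like (tweet_id, liked_user_id) "
            "VALUES (?, ?)",
            (tweet_id, user_id),
        )
        if cursor.rowcount != 1:  # the row already existed, nothing to count
            return False
        conn.execute(
            "UPDATE tweet SET like_count = like_count + 1 WHERE tweet_id = ?",
            (tweet_id,),
        )
        return True


print(like_tweet(1, 42), like_tweet(1, 42))
```

Making the operation idempotent means a client retry after a timeout cannot inflate the like count.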

Detailed component design

Home/profile service

The service first gets tweet ids from the timeline cache, which stores the X most recent tweets, then fetches the tweet info from the tweet table and returns it so the client can render the home page.
If more than X tweets are requested, the service also queries the timeline table in the DB

Timeline Cache

The timeline cache stores the X most recent tweets for each user, sorted in reverse chronological order by tweet id
For cache eviction, the timeline of the least recently visited user is evicted first (LRU)
Users with a large follower group, such as celebrities, will most likely always have their timelines in the cache. So that other users can also benefit from caching, we can provision a separate cache for celebrities
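
The eviction policy described here can be sketched with an OrderedDict keyed by user. This is an illustrative in-process model, not a statement about how a production cache would be deployed; the class and parameter names are assumptions.

```python
from collections import OrderedDict, deque


class TimelineCache:
    """LRU cache of per-user timelines, each capped at `per_user` tweet ids."""

    def __init__(self, max_users, per_user):
        self.max_users = max_users
        self.per_user = per_user
        # user_id -> deque of tweet ids, newest first; least recently used first.
        self._timelines = OrderedDict()

    def load(self, user_id, tweet_ids):
        """Cache a timeline (e.g. after a DB read), evicting LRU users if full."""
        self._timelines[user_id] = deque(
            tweet_ids[: self.per_user], maxlen=self.per_user
        )
        self._timelines.move_to_end(user_id)
        while len(self._timelines) > self.max_users:
            self._timelines.popitem(last=False)  # drop least recently used user

    def add_tweet(self, user_id, tweet_id):
        """Push a new tweet id onto a timeline that is already cached."""
        timeline = self._timelines.get(user_id)
        if timeline is not None:
            timeline.appendleft(tweet_id)  # maxlen silently drops the oldest id

    def get(self, user_id):
        """Return the cached timeline and mark the user as recently visited."""
        if user_id not in self._timelines:
            return None
        self._timelines.move_to_end(user_id)
        return list(self._timelines[user_id])


cache = TimelineCache(max_users=2, per_user=3)
cache.load("alice", [3, 2, 1])
cache.load("bob", [9])
cache.get("alice")        # alice is now the most recently visited user
cache.load("carol", [7])  # cache is full, so bob is evicted
print(cache.get("bob"), cache.get("alice"))
```

Note that add_tweet does not refresh recency: a fanout write is not a visit, so only reads keep a timeline in the cache.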

Sharding strategy

Similarly, celebrities' tweets receive a much larger number of view requests, causing read hotspots on most of the tables. We can place celebrities on a separate DB shard to increase availability
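
A routing function for this strategy might look like the sketch below. The follower threshold, the shard count, the shard names, and the choice of hashing the owner id for regular users are all assumptions; the text only specifies that celebrities get a separate DB.

```python
import hashlib

NUM_REGULAR_SHARDS = 4                     # assumed number of regular shards
CELEBRITY_FOLLOWER_THRESHOLD = 1_000_000   # assumed cutoff for "celebrity"


def shard_for(owner_id, follower_count):
    """Pick the shard that stores a user's tweets."""
    if follower_count >= CELEBRITY_FOLLOWER_THRESHOLD:
        # Hot accounts go to a dedicated shard that can be scaled separately.
        return "celebrity"
    # A stable hash keeps the mapping identical across processes and restarts
    # (Python's built-in hash() is randomized per process for strings).
    digest = hashlib.sha256(str(owner_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % NUM_REGULAR_SHARDS
    return f"shard-{index}"


print(shard_for(12345, 250))
print(shard_for(67890, 5_000_000))
```

A caveat of this approach is that a user who crosses the threshold has to be migrated to the celebrity shard, so in practice the classification would change rarely.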

Trade offs/Tech choices

TweetId

We can use a Snowflake id as the tweet id. It is a 64-bit id that embeds useful information such as a timestamp and a machine/datacenter identifier. Because the timestamp occupies the high bits, sorting by tweet id yields a timeline in reverse chronological order, and ids can be generated without a central coordinator
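
A Snowflake-style generator can be sketched as follows. The bit layout used here (41 bits of millisecond timestamp, 10 bits of machine id, 12 bits of per-millisecond sequence) is the commonly described one, and the epoch constant is an arbitrary assumption; neither is specified by the text.

```python
import threading
import time

EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch (assumption)
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    """Generates roughly time-ordered 64-bit ids without coordination."""

    def __init__(self, machine_id):
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id out of range")
        self.machine_id = machine_id
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    @staticmethod
    def _now_ms():
        return int(time.time() * 1000)

    def next_id(self):
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted in this millisecond: wait for the next.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
                | (self.machine_id << SEQUENCE_BITS)
                | self._sequence
            )


def timestamp_ms(snowflake_id):
    """Recover the creation time (ms since the Unix epoch) from an id."""
    return (snowflake_id >> (MACHINE_BITS + SEQUENCE_BITS)) + EPOCH_MS


generator = SnowflakeGenerator(machine_id=7)
ids = [generator.next_id() for _ in range(5)]
print(ids == sorted(ids))
```

Ids from a single generator are strictly increasing; ids from different machines are ordered only to millisecond precision, which is normally sufficient for a timeline.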

During the follow flow, if the follower count is updated but the write to the follow table fails, the user may see a mismatch between the follower count and the actual followers. This is acceptable because it is not critical information. We can further introduce a daily worker that scans the follow table and corrects the follower count

Similar to follow, the like count can become inconsistent with the actual likes if part of a like request fails, which is also acceptable. We can introduce a daily worker to fix the like count as well.
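
The daily worker for either count could follow the pattern sketched below: recompute the true counts from the relation rows (follow table or like table) and emit corrections wherever the stored counter disagrees. The function name and the in-memory inputs are hypothetical; a real worker would scan the tables in batches.

```python
from collections import Counter


def reconcile_counts(relation_rows, stored_counts):
    """Return {key: correct_count} for every key whose stored count is wrong.

    relation_rows: iterable of (key, other_id) pairs, e.g. (tweet_id, user_id)
                   from the like table or (user_id, follower_id) from follow.
    stored_counts: dict of key -> denormalized count currently stored.
    """
    actual = Counter(key for key, _ in relation_rows)
    corrections = {}
    for key in set(actual) | set(stored_counts):
        true_count = actual.get(key, 0)
        if stored_counts.get(key, 0) != true_count:
            corrections[key] = true_count
    return corrections


like_rows = [(1, "a"), (1, "b"), (2, "a")]
stored_like_counts = {1: 3, 2: 1, 3: 2}  # tweet 1 and tweet 3 have drifted
print(reconcile_counts(like_rows, stored_like_counts))
```

Treating the relation rows as the source of truth means the counters are only ever a cache of them, so a failed or duplicated increment is repaired on the next run.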

Failure scenarios/bottlenecks

Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?