System requirements
Functional:
- The system can handle tweets posted by users
- The system can handle users sharing tweets
- The system can handle likes from users
- The system will display tweets of other users
- The system can handle follow and unfollow of users
- The homepage will display the top K tweets
Non-Functional:
- The system needs to be highly available.
- The system needs to be scalable.
- The consistency model will be eventual consistency, especially for data such as like counts.
Capacity estimation
I will assume 100M DAU, each spending 1 hour per day in the app. Each user posts 3 tweets and likes 10 tweets per day.
Assume each tweet is 200 characters.
QPS_tweets = 100M * 3 / (3600 * 24) ≈ 3.5k/s
QPS_likes = 100M * 10 / (3600 * 24) ≈ 12k/s
Storage: the tweet data type is text.
Images and videos will be sent to object storage.
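The arithmetic can be checked with a short script. This is only a sketch of the estimate under the assumptions stated above; the 1 byte per character figure is an additional assumption (plain ASCII text, no metadata or replication) that the original estimate does not state.

```python
SECONDS_PER_DAY = 24 * 3600

dau = 100_000_000        # daily active users, from the estimate above
tweets_per_user = 3      # tweets posted per user per day
likes_per_user = 10      # likes per user per day
chars_per_tweet = 200    # characters per tweet
bytes_per_char = 1       # ASSUMPTION: ASCII text, no metadata or replication


def per_second(events_per_day: int) -> float:
    """Convert a daily event count into an average rate per second."""
    return events_per_day / SECONDS_PER_DAY


tweet_qps = per_second(dau * tweets_per_user)
like_qps = per_second(dau * likes_per_user)
text_bytes_per_day = dau * tweets_per_user * chars_per_tweet * bytes_per_char

print(f"tweet QPS    ~ {tweet_qps:,.0f}")
print(f"like QPS     ~ {like_qps:,.0f}")
print(f"text storage ~ {text_bytes_per_day / 1e9:,.0f} GB/day")
```

These are daily averages; peak traffic would be higher, so real capacity planning would multiply them by a peak factor.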
API design
External APIs:
POST postTweet(TweetDataType tweet, int userId)
POST shareTweet(TweetId tid, int userId)
POST countLikes(TweetId tid, int userId)
POST follow(int userId, int userIdToFollow)
Database design
Tweets Table:
tweetId, tweetContent, userId, creatorId
Tweets Like Table:
tweetId, likes
Followers Table:
userId, followers
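One possible way to express these tables is sketched below with an in-memory SQLite database. The snake_case column names, the reading of userId as the posting or sharing user and creatorId as the original author, and the choice to store followers as one row per (user, follower) pair are all assumptions made for illustration, not part of the design above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tweets (
    tweet_id   INTEGER PRIMARY KEY,
    content    TEXT    NOT NULL,
    user_id    INTEGER NOT NULL,  -- ASSUMPTION: user who posted or shared
    creator_id INTEGER NOT NULL   -- ASSUMPTION: original author
);
CREATE TABLE tweet_likes (
    tweet_id INTEGER PRIMARY KEY REFERENCES tweets(tweet_id),
    likes    INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE followers (
    user_id     INTEGER NOT NULL,
    follower_id INTEGER NOT NULL,  -- ASSUMPTION: one row per follower
    PRIMARY KEY (user_id, follower_id)
);
""")

# User 1 posts a tweet; users 2 and 3 follow user 1; the tweet gets one like.
conn.execute("INSERT INTO tweets VALUES (1, 'hello', 1, 1)")
conn.execute("INSERT INTO tweet_likes (tweet_id) VALUES (1)")
conn.execute("INSERT INTO followers VALUES (1, 2)")
conn.execute("INSERT INTO followers VALUES (1, 3)")
conn.execute("UPDATE tweet_likes SET likes = likes + 1 WHERE tweet_id = 1")

follower_ids = [
    row[0]
    for row in conn.execute(
        "SELECT follower_id FROM followers WHERE user_id = 1 ORDER BY follower_id"
    )
]
likes = conn.execute(
    "SELECT likes FROM tweet_likes WHERE tweet_id = 1"
).fetchone()[0]
print(follower_ids, likes)
```

Storing one row per follower pair keeps follow and unfollow to a single-row insert or delete, at the cost of a larger table than a single followers column per user.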
High-level design
Main Services
- TweetProcessing Service:
  - Process the tweet and send it to the proper storage
- TweetSending Service:
  - Push tweets to followers' channels
- LikeAggregation Service:
  - Count likes for a tweet within a timeframe and store the count in a database
- User Manage Service:
  - Manage users and followers
Request flows
- Tweet Post flow:
  - Users -> LoadBalancer: post tweet request
  - LoadBalancer -> TweetProcessing Service: process the tweet and send it to the proper storage
  - TweetProcessing Service -> TweetSending Service: query the user's followers and construct the followers' channels
  - TweetSending Service -> Channels: push the new tweet to the channels
  - Channels -> Followers: push the new tweet
  - HomePageConstruction Service -> Followers: sort the tweets, find the top K, store them in a database, and send them back to followers
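The post flow above can be sketched in a few lines, with dictionaries standing in for the storage, the follower lookup, and the channels. All names here (`Tweet`, `follow`, `post_tweet`, `home_page`) are hypothetical, and ranking the top K by recency is an assumption, since the design does not say how tweets are ranked.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    user_id: int
    content: str
    created_at: int  # ASSUMPTION: logical timestamp used as the ranking key


followers = defaultdict(set)  # user_id -> follower ids (User Manage Service)
channels = defaultdict(list)  # follower_id -> tweets pushed to that follower
tweet_store = {}              # tweet_id -> Tweet (stand-in for real storage)


def follow(user_id: int, user_id_to_follow: int) -> None:
    followers[user_id_to_follow].add(user_id)


def post_tweet(tweet: Tweet) -> None:
    # TweetProcessing Service: persist the tweet.
    tweet_store[tweet.tweet_id] = tweet
    # TweetSending Service: push the tweet to every follower's channel.
    for follower_id in followers[tweet.user_id]:
        channels[follower_id].append(tweet)


def home_page(user_id: int, k: int) -> list:
    # HomePageConstruction Service: pick the K most recent tweets.
    return heapq.nlargest(k, channels[user_id], key=lambda t: t.created_at)


follow(2, 1)
follow(3, 1)
for i, text in enumerate(["first", "second", "third"], start=1):
    post_tweet(Tweet(tweet_id=100 + i, user_id=1, content=text, created_at=i))

print([t.tweet_id for t in home_page(2, k=2)])
```

`heapq.nlargest` avoids sorting the whole channel when K is small relative to the number of pushed tweets.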
Detailed component design
- Use a cache to store the most recent top K tweets
- Hotspot issue:
  - Use push for most users
  - Use pull for celebrities
- Like aggregation:
  - Instead of writing each like directly to the database, aggregate the likes for a post and write the aggregated result to the database
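The like aggregation idea can be illustrated with a small in-memory buffer. This is a minimal sketch, not the actual service: `LikeAggregator` is a hypothetical name, a `Counter` stands in for the likes table, and the buffer flushes after a fixed number of likes rather than on a timer so that the example stays deterministic. A real service would likely also flush at the end of each time window.

```python
from collections import Counter


class LikeAggregator:
    """Buffer likes in memory and write them to the database in batches."""

    def __init__(self, flush_threshold: int = 4):
        self.pending = Counter()   # tweet_id -> likes not yet written
        self.pending_total = 0     # number of buffered likes
        self.flush_threshold = flush_threshold
        self.db = Counter()        # stand-in for the likes table
        self.db_writes = 0         # number of row updates issued

    def like(self, tweet_id: int) -> None:
        self.pending[tweet_id] += 1
        self.pending_total += 1
        if self.pending_total >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # One write per tweet, regardless of how many likes were buffered.
        for tweet_id, count in self.pending.items():
            self.db[tweet_id] += count
            self.db_writes += 1
        self.pending.clear()
        self.pending_total = 0


agg = LikeAggregator(flush_threshold=4)
for tweet_id in (1, 1, 1, 2):
    agg.like(tweet_id)

print(dict(agg.db), agg.db_writes)
```

In this run four likes result in two database writes. The trade-off is that buffered likes are lost if the process crashes before a flush, which fits the eventual consistency requirement for like counts.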
Trade offs/Tech choices
Push vs pull
Push algorithm:
Pros:
- Fewer requests from followers. They do not need to query the backend when there are no updates
Cons:
- Server errors are harder to debug
Pull algorithm:
Pros:
- It is easier to monitor server health from the client side
Cons:
- High QPS even when there are no updates
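A hybrid of the two approaches, push for most users and pull for celebrities, might look like the sketch below. The names (`inbox`, `outbox`, `is_celebrity`, `timeline`) and the follower-count threshold are hypothetical; the threshold is set to 2 only to keep the demo small.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 2  # ASSUMPTION: tiny value for the demo only

followers = defaultdict(set)  # author -> follower ids
following = defaultdict(set)  # user -> authors they follow
inbox = defaultdict(list)     # user -> (timestamp, tweet_id) pushed to them
outbox = defaultdict(list)    # author -> (timestamp, tweet_id) they posted


def follow(user_id: int, author_id: int) -> None:
    followers[author_id].add(user_id)
    following[user_id].add(author_id)


def is_celebrity(user_id: int) -> bool:
    return len(followers[user_id]) >= CELEBRITY_THRESHOLD


def post_tweet(author_id: int, tweet_id: int, ts: int) -> None:
    outbox[author_id].append((ts, tweet_id))
    if not is_celebrity(author_id):
        # Push path: fan out on write to each follower's inbox.
        for follower_id in followers[author_id]:
            inbox[follower_id].append((ts, tweet_id))


def timeline(user_id: int, k: int) -> list:
    # Start from pushed tweets, then pull from followed celebrities.
    # A set removes duplicates if an author became a celebrity after a push.
    items = set(inbox[user_id])
    for author_id in following[user_id]:
        if is_celebrity(author_id):
            items.update(outbox[author_id])
    return [tweet_id for _, tweet_id in sorted(items, reverse=True)[:k]]


follow(1, 10)   # user 10 ends up with two followers: a "celebrity"
follow(2, 10)
follow(1, 20)   # user 20 has one follower: a regular user

post_tweet(20, tweet_id=200, ts=1)  # pushed to user 1's inbox
post_tweet(10, tweet_id=100, ts=2)  # not pushed; pulled at read time

print(timeline(1, k=10), inbox[1])
```

The celebrity's tweet never enters any inbox, so a post by an account with millions of followers costs one write instead of millions, while readers pay a small extra merge at read time.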
Failure scenarios/bottlenecks
Future improvements