System requirements


Functional:

  1. The system can handle tweets posted by users
  2. The system can handle sharing (retweeting) of tweets by users
  3. The system can handle likes from users
  4. The system will display tweets of other users
  5. The system can handle follow and unfollow of users
  6. The homepage will display top K tweets



Non-Functional:

The system needs to be highly available.

The system needs to be scalable.

The consistency rule will be eventual consistency, especially for data like user likes.




Capacity estimation

I will assume 100M DAU, each spending 1 hour per day in the app. Each user will post 3 tweets and like 10 tweets per day.

Assume each tweet is 200 characters.


QPS_tweets = 100M * 3 / (3600 * 24) ≈ 3.5k/s

QPS_like = 100M * 10 / (3600 * 24) ≈ 12k/s


Storage: tweet data type: text


Images and videos will be sent to object storage.
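
As a quick check of the arithmetic, the estimates can be recomputed from the stated assumptions (100M DAU, 3 tweets and 10 likes per user per day, 200 characters per tweet). This is only a rough sketch: it additionally assumes 1 byte per character and traffic spread evenly over the day, neither of which is stated above, so peak load would be higher.

```python
SECONDS_PER_DAY = 24 * 3600

# Assumptions taken from the capacity estimation
dau = 100_000_000
tweets_per_user_per_day = 3
likes_per_user_per_day = 10
chars_per_tweet = 200

# Extra assumption (not stated in the estimate): 1 byte per character
bytes_per_char = 1

# Average request rates, assuming traffic is uniform over the day
qps_tweets = dau * tweets_per_user_per_day / SECONDS_PER_DAY
qps_likes = dau * likes_per_user_per_day / SECONDS_PER_DAY

# Raw tweet text written per day (media goes to object storage, not counted)
text_bytes_per_day = dau * tweets_per_user_per_day * chars_per_tweet * bytes_per_char

print(f"tweet QPS: about {qps_tweets:,.0f}/s")
print(f"like QPS:  about {qps_likes:,.0f}/s")
print(f"tweet text per day: about {text_bytes_per_day / 1e9:,.0f} GB")
```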



API design

External APIs:

POST postTweet(TweetDataType tweet, int userId)

POST shareTweet(TweetId tid, int userId)

POST countLikes(TweetId tid, int userId)

POST follow(int userId, int userIdToFollow)


Database design


Tweets Table:

tweetId, tweetContent, userId, creatorId


Tweet Likes Table:

tweetId, likes


Followers Table:

userId, followers
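
One possible way to write these three tables down concretely is sketched below, using an in-memory SQLite database purely for illustration. The snake_case names, the column types, the reading of userId as the account that posted or shared the tweet and creatorId as the original author, and the choice to store followers as one row per (user, follower) pair rather than as a list are all assumptions, not part of the design above.

```python
import sqlite3

# Hypothetical schema: names and types are illustrative assumptions.
SCHEMA = """
CREATE TABLE tweets (
    tweet_id      INTEGER PRIMARY KEY,
    tweet_content TEXT    NOT NULL,
    user_id       INTEGER NOT NULL,  -- assumed: account that posted or shared it
    creator_id    INTEGER NOT NULL   -- assumed: original author
);
CREATE TABLE tweet_likes (
    tweet_id INTEGER PRIMARY KEY REFERENCES tweets(tweet_id),
    likes    INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE followers (
    user_id     INTEGER NOT NULL,
    follower_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, follower_id)  -- one row per follow relationship
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# User 42 posts a tweet and has two followers.
conn.execute("INSERT INTO tweets VALUES (1, 'hello', 42, 42)")
conn.execute("INSERT INTO tweet_likes VALUES (1, 0)")
conn.executemany("INSERT INTO followers VALUES (?, ?)", [(42, 7), (42, 8)])

# Looking up a user's followers is the query the fan-out step needs.
follower_ids = [
    row[0]
    for row in conn.execute(
        "SELECT follower_id FROM followers WHERE user_id = ? ORDER BY follower_id",
        (42,),
    )
]
print(follower_ids)
```

Storing one row per follow relationship keeps follow and unfollow as single-row writes; a production system would more likely use a sharded or wide-column store than SQLite.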




High-level design


Main Services

  1. TweetProcessing Service:
    • Process tweets and send them to the proper storage
  2. TweetSending Service
    • push tweets to followers' channels
  3. LikeAggregation service
    • Count likes for a tweet within a timeframe and store the count in a database
  4. User Management Service
    • Manage Users and followers





Request flows



  1. Tweet Post flow:
    1. Users->LoadBalancer: Post Tweet request
    2. LoadBalancer->TweetProcessing Service: Process the tweet and send it to the proper storage
    3. TweetProcessing Service->TweetSending Service: Query the user's followers and construct the followers' channels
    4. TweetSending Service->Channels: Push the new tweet to the channels
    5. Channels->Followers: Push the new tweet

HomePageConstruction Service -> followers: sort and find the top K tweets, store them in a database, and send them back to followers
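
The push flow and the top K step can be sketched in a few lines. This is a minimal in-memory illustration, not the actual services: the `Tweet` class, the `score` ranking signal, and the `followers` and `channels` dictionaries are hypothetical stand-ins, and the design above does not say how "top" is defined.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    creator_id: int
    content: str
    score: float  # hypothetical ranking signal (e.g. recency or likes)


followers = defaultdict(set)   # user_id -> ids of users following them
channels = defaultdict(list)   # follower_id -> tweets pushed to that follower


def post_tweet(tweet: Tweet) -> None:
    """Fan-out on write: push the tweet to every follower's channel."""
    for follower_id in followers[tweet.creator_id]:
        channels[follower_id].append(tweet)


def top_k(user_id: int, k: int) -> list[Tweet]:
    """Home page construction: the k highest-scoring tweets in the channel."""
    return heapq.nlargest(k, channels[user_id], key=lambda t: t.score)


# User 1 is followed by users 2 and 3 and posts three tweets.
followers[1] = {2, 3}
post_tweet(Tweet(10, 1, "first", score=0.2))
post_tweet(Tweet(11, 1, "second", score=0.9))
post_tweet(Tweet(12, 1, "third", score=0.5))

print([t.tweet_id for t in top_k(2, k=2)])
```

`heapq.nlargest` avoids sorting the whole channel when K is small relative to the number of candidate tweets.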




Detailed component design

  1. Use Cache to store most recent top K
  2. Hotspot issue
    1. Use push for most users
    2. Use pull for celebrities
  3. Like aggregation:
    1. Instead of writing each like directly to the database, we will aggregate the likes for a post and send the aggregated result to the database
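
The like aggregation idea might look roughly like the sketch below. The `LikeAggregator` class, the count-based `flush_threshold`, and the `write_batch` callback are illustrative assumptions; a timeframe-based version would call `flush()` from a timer instead of (or in addition to) the threshold check.

```python
from collections import Counter


class LikeAggregator:
    """Buffers likes in memory and writes them to storage in batches."""

    def __init__(self, flush_threshold, write_batch):
        self._pending = Counter()          # tweet_id -> likes not yet written
        self._pending_total = 0
        self._flush_threshold = flush_threshold
        self._write_batch = write_batch    # hypothetical storage callback

    def like(self, tweet_id):
        self._pending[tweet_id] += 1
        self._pending_total += 1
        if self._pending_total >= self._flush_threshold:
            self.flush()

    def flush(self):
        """Send one aggregated write per batch instead of one per like."""
        if not self._pending:
            return
        batch = dict(self._pending)
        self._pending.clear()
        self._pending_total = 0
        self._write_batch(batch)


# Stand-in for the Tweet Likes table: tweet_id -> total likes.
db_likes = Counter()
writes = []


def write_batch(batch):
    writes.append(batch)
    db_likes.update(batch)   # adds the batched increments to the stored totals


agg = LikeAggregator(flush_threshold=3, write_batch=write_batch)
for tweet_id in [1, 1, 2, 1]:
    agg.like(tweet_id)
agg.flush()  # flush the remainder, as a timer would

print(dict(db_likes), len(writes))
```

Four likes turn into two database writes here. The cost is that buffered likes are lost if the process crashes before a flush, which fits the eventual consistency requirement for likes.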






Trade offs/Tech choices

push vs pull


Push algorithm:

Pros:

  • Fewer requests from followers. They do not need to query the backend if there are no updates

Cons:

  • Server errors are harder to debug


Pull algorithm

Pros:

  • It is easier to monitor server health from client side

Cons:

  • High QPS even when there are no updates
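
A hybrid of the two approaches, pushing for ordinary users and pulling for celebrities, could be sketched as follows. Everything here is a hypothetical illustration: the `CELEBRITY_THRESHOLD` value (kept tiny for the demo), the `inbox` and `outbox` structures, and the use of a follower count to decide who counts as a celebrity are assumptions.

```python
CELEBRITY_THRESHOLD = 2  # hypothetical cutoff; real systems would use a far larger one

followers = {
    "alice": {"carol"},
    "star": {"carol", "dave", "erin"},
}
inbox = {}    # follower -> entries pushed at write time
outbox = {}   # author -> own entries, pulled at read time for celebrities


def is_celebrity(author):
    return len(followers.get(author, ())) > CELEBRITY_THRESHOLD


def post(author, timestamp, text):
    entry = (timestamp, author, text)
    outbox.setdefault(author, []).append(entry)
    if not is_celebrity(author):
        # Push: fan out to each follower's inbox at write time.
        for follower in followers.get(author, ()):
            inbox.setdefault(follower, []).append(entry)


def timeline(user):
    # Pull: fetch celebrity tweets only when the user reads the timeline.
    pulled = [
        entry
        for author, fans in followers.items()
        if user in fans and is_celebrity(author)
        for entry in outbox.get(author, [])
    ]
    # Merge pushed and pulled entries, newest first.
    return sorted(inbox.get(user, []) + pulled, reverse=True)


post("alice", 1, "hi")
post("star", 2, "big news")
print(timeline("carol"))
```

The celebrity's post costs one write instead of one per follower, while followers of ordinary users still get their timelines without querying each author.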



Failure scenarios/bottlenecks






Future improvements
