System requirements


Functional:

  • compose and share tweets
  • track the updates of other users
  • indicate their appreciation for specific tweets by following them


Non-Functional:

  • assume the maximum size of one tweet is 10 MB, allowing multimedia such as images and video clips
  • assume the maximum number of friends per user is 500
  • a user sees a new notification when one of their friends posts a new tweet; assuming the notification contains only a little text, including user_id and tweet_id, it would be less than 1 KB




Capacity estimation



  • assuming 10**6 users and about 10 GB of tweet data to save (roughly 10 KB per user on average), 100 GB of storage should be enough.
  • a notification is assumed to be less than 1 KB, so one new tweet sent to 500 friends causes at most about 500 KB of data flow; if all 500 friends each post one tweet, that is still only about 250 MB of data flow, which is not a big deal.
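The fan-out arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the constants are the assumptions already stated in this section (1 KB taken as 1,000 bytes), and the function name is made up for illustration.

```python
# Back-of-the-envelope fan-out estimate using the figures assumed above.
NOTIFICATION_BYTES = 1_000   # upper bound: one notification is under 1 KB
MAX_FRIENDS = 500            # maximum number of friends per user


def fanout_bytes(posts: int, friends: int = MAX_FRIENDS,
                 notification_bytes: int = NOTIFICATION_BYTES) -> int:
    """Upper bound on notification traffic caused by `posts` new tweets."""
    return posts * friends * notification_bytes


one_post = fanout_bytes(1)                     # one user posts once
all_friends_post = fanout_bytes(MAX_FRIENDS)   # each of 500 friends posts once

print(f"one tweet:  {one_post / 1_000:.0f} KB")
print(f"500 tweets: {all_friends_post / 1_000_000:.0f} MB")
```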




API design


RESTful API:

  • compose and share tweets
    • POST /v1/tweets
      • create_time
      • location
      • text
      • user_id
    • GET /v1/{user_id}/tweets
      • limit: page size
      • offset: page offset
  • follow a tweet
    • POST /v1/follow/{tweet_id}
      • {header}
        • user_info or user_token
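To make the endpoints above more concrete, here is a small sketch of how a client might build the three requests. The paths and field names come from the list above; everything else is an assumption for illustration: the helper names, the ISO 8601 string for create_time, the default page size, and passing user_token as a Bearer Authorization header.

```python
import json
from dataclasses import dataclass, asdict
from typing import Dict, Tuple
from urllib.parse import urlencode


@dataclass
class CreateTweetRequest:
    """Body of POST /v1/tweets, with the four fields listed above."""
    user_id: int
    text: str
    location: str
    create_time: str  # assumed to be an ISO 8601 timestamp string


def create_tweet_body(req: CreateTweetRequest) -> str:
    """Serialize the request body as JSON."""
    return json.dumps(asdict(req), sort_keys=True)


def list_tweets_url(user_id: int, limit: int = 20, offset: int = 0) -> str:
    """Build the GET /v1/{user_id}/tweets URL with paging parameters."""
    if limit <= 0 or offset < 0:
        raise ValueError("limit must be positive and offset non-negative")
    return f"/v1/{user_id}/tweets?" + urlencode({"limit": limit, "offset": offset})


def follow_tweet_request(tweet_id: int, user_token: str) -> Tuple[str, Dict[str, str]]:
    """Return the path and headers for POST /v1/follow/{tweet_id}."""
    # Assumption: the token travels in a Bearer Authorization header.
    return f"/v1/follow/{tweet_id}", {"Authorization": f"Bearer {user_token}"}


body = create_tweet_body(
    CreateTweetRequest(user_id=1, text="hello", location="NYC",
                       create_time="2024-01-01T10:00:00Z"))
page_url = list_tweets_url(1, limit=10, offset=20)
follow_path, follow_headers = follow_tweet_request(99, "token-abc")
```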





Database design


four db tables needed:

  • tweets
    • id
    • user_id
    • create_time
    • content
    • location
    • status
  • follow: user and tweet list
    • id
    • user_id
    • tweet_id
    • follow_time
    • status
  • user
    • id
    • name
    • desc
    • head_img
    • create_time
    • status
  • friends: user and friend list
    • id
    • user_id
    • friend_id
    • create_time
    • status
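The tables above could be sketched as SQL, here run against an in-memory SQLite database so the schema can be tried out. The column names are the ones listed above; the column types, the default for status, the index, and the name `user` for the table holding user profiles are assumptions, not part of the design itself.

```python
import sqlite3

# Column types, defaults and the index are illustrative assumptions.
SCHEMA = """
CREATE TABLE tweets (
    id          INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL,
    create_time TEXT    NOT NULL,
    content     TEXT    NOT NULL,
    location    TEXT,
    status      INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE follow (
    id          INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL,
    tweet_id    INTEGER NOT NULL,
    follow_time TEXT    NOT NULL,
    status      INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE user (
    id          INTEGER PRIMARY KEY,
    name        TEXT    NOT NULL,
    "desc"      TEXT,   -- quoted because DESC is an SQL keyword
    head_img    TEXT,
    create_time TEXT    NOT NULL,
    status      INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE friends (
    id          INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL,
    friend_id   INTEGER NOT NULL,
    create_time TEXT    NOT NULL,
    status      INTEGER NOT NULL DEFAULT 1
);
-- Assumed index: serves "latest tweets of one user" lookups.
CREATE INDEX idx_tweets_user_time ON tweets (user_id, create_time);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

conn.execute(
    "INSERT INTO tweets (user_id, create_time, content) VALUES (?, ?, ?)",
    (1, "2024-01-01T10:00:00", "hello"),
)
rows = conn.execute(
    "SELECT content FROM tweets WHERE user_id = ? ORDER BY create_time DESC",
    (1,),
).fetchall()
```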




High-level design


  • use a message queue such as Kafka, since we don't need strictly immediate updates: when a user posts a tweet, the service produces a message to the queue, and a consumer picks up the message and sends it to a Redis queue.





Request flows


  1. when a user posts a tweet
    1. insert a record into the database
    2. send a message to the message queue; the consumer pushes the new tweet's info onto the Redis list that records the latest tweets of a user's friends
  2. when a user views tweets
    1. call the GET tweets request to get all data sorted by create_time in descending order
    2. clear the current user's waiting list in Redis
  3. when a user follows a tweet
    1. insert a record into the follow table (the user and tweet list)
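The three flows above can be walked through with in-memory stand-ins. This is a rough sketch, not the real services: plain dicts and lists replace the database, a deque replaces the message queue, and a dict of lists replaces the Redis waiting lists. All function names are hypothetical, and it is assumed that a message's user_id is the author and friend_id is the friend to notify.

```python
from collections import defaultdict, deque
from itertools import count

tweets = {}                   # stand-in for the tweets table
follows = []                  # stand-in for the follow table
friends = defaultdict(set)    # user_id -> ids of that user's friends
message_queue = deque()       # stand-in for the message queue
waiting = defaultdict(list)   # stand-in for the Redis waiting lists
_ids = count(1)


def post_tweet(user_id, content, create_time):
    tweet_id = next(_ids)
    # Step 1.1: insert a record in the database.
    tweets[tweet_id] = {"id": tweet_id, "user_id": user_id,
                        "content": content, "create_time": create_time}
    # Step 1.2: send one message per friend to the queue.
    for friend_id in friends[user_id]:
        message_queue.append((user_id, friend_id, tweet_id))
    return tweet_id


def run_consumer():
    # The consumer moves each message into the recipient's waiting list.
    while message_queue:
        user_id, friend_id, tweet_id = message_queue.popleft()
        waiting[friend_id].append({"user_id": user_id, "tweet_id": tweet_id})


def view_tweets(viewer_id, author_id):
    # Step 2.1: tweets of one user, newest first.
    result = sorted((t for t in tweets.values() if t["user_id"] == author_id),
                    key=lambda t: t["create_time"], reverse=True)
    # Step 2.2: clear the viewer's waiting list.
    waiting.pop(viewer_id, None)
    return result


def follow_tweet(user_id, tweet_id, follow_time):
    # Step 3.1: insert a record in the follow table.
    follows.append({"user_id": user_id, "tweet_id": tweet_id,
                    "follow_time": follow_time})


friends[1] = {2, 3}
t1 = post_tweet(1, "first", "2024-01-01T10:00:00")
t2 = post_tweet(1, "second", "2024-01-01T11:00:00")
run_consumer()
unread_before = len(waiting[2])
timeline = view_tweets(viewer_id=2, author_id=1)
follow_tweet(2, t2, "2024-01-01T12:00:00")
```

After the run, user 2 had two unread entries before viewing and none afterwards, while user 3's waiting list still holds both.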





Detailed component design



Redis storage:


  • waiting list: info about a user's latest unread tweets; the structure is <user_id>: {user_id, tweet_id}


message queue:

  • consumer: pushes each message onto the Redis waiting list, using user_id as the key
  • producer: produces the new-post message containing (user_id, friend_id, tweet_id)
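A minimal sketch of the two components above, with no real Kafka or Redis involved: Python's queue.Queue stands in for the message queue, and a tiny in-memory class stands in for Redis lists, with method names chosen to mirror the Redis RPUSH, LRANGE and DEL commands. The `waiting:<id>` key format, the JSON encoding of entries, and the reading that user_id is the author while friend_id is the recipient whose waiting list is updated are all assumptions.

```python
import json
import queue
from typing import Dict, Iterable, List, NamedTuple


class NewPost(NamedTuple):
    user_id: int     # assumed: the author of the tweet
    friend_id: int   # assumed: the friend to notify
    tweet_id: int


class InMemoryLists:
    """Tiny stand-in for Redis lists (only the cases used here)."""

    def __init__(self) -> None:
        self._data: Dict[str, List[str]] = {}

    def rpush(self, key: str, value: str) -> int:
        self._data.setdefault(key, []).append(value)
        return len(self._data[key])

    def lrange(self, key: str, start: int, stop: int) -> List[str]:
        items = self._data.get(key, [])
        # Like Redis, stop is inclusive and -1 means the last element.
        end = len(items) if stop == -1 else stop + 1
        return items[start:end]

    def delete(self, key: str) -> int:
        return 1 if self._data.pop(key, None) is not None else 0


def produce(mq: queue.Queue, author_id: int,
            friend_ids: Iterable[int], tweet_id: int) -> None:
    """Producer: one (user_id, friend_id, tweet_id) message per friend."""
    for friend_id in friend_ids:
        mq.put(NewPost(author_id, friend_id, tweet_id))


def consume(mq: queue.Queue, store: InMemoryLists) -> None:
    """Consumer: drain the queue into per-user waiting lists."""
    while True:
        try:
            msg = mq.get_nowait()
        except queue.Empty:
            break
        entry = json.dumps({"user_id": msg.user_id, "tweet_id": msg.tweet_id})
        store.rpush(f"waiting:{msg.friend_id}", entry)


mq = queue.Queue()
store = InMemoryLists()
produce(mq, author_id=1, friend_ids=[2, 3], tweet_id=100)
consume(mq, store)
unread = [json.loads(x) for x in store.lrange("waiting:2", 0, -1)]
```

Clearing a waiting list after the user has viewed their tweets would then be a single `store.delete("waiting:2")` call.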



Trade offs/Tech choices



  • message queue: it smooths the data flow and prevents the system from crashing when a large number of requests arrive at the same time.
  • Redis: it offers fast access, which suits temporary data such as the waiting list.





Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.






Future improvements


  • use big data technology and a CDN to store images, videos, and tweets, for better access speed.