System requirements
Functional:
- compose and share tweets
- track the updates of other users
- let users express appreciation for specific tweets by following them
Non-Functional:
- assume the maximum size of one tweet is 10 MB, since tweets may include multimedia such as images and video clips
- assume each user has at most 500 friends
- a user sees a new notification when one of their friends posts a new tweet; suppose the notification contains only a little text, including user_id and tweet_id, so it is under 1 KB
Capacity estimation
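- if we suppose 10^6 users, the tweet text might total around 10 GB (about 10 KB per user), so roughly 100 GB of storage should be enough with headroom. This covers text only: multimedia at up to 10 MB per tweet would need separate storage.
- since a notification is under 1 KB and a user has at most 500 friends, one new tweet generates at most 500 KB of notification traffic. Even if all 500 of a user's friends post at the same time, that is only 500 × 500 KB = 250 MB, which is not a big deal.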
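The arithmetic above can be checked with a few lines of Python. This is a back-of-envelope sketch that uses decimal units (1 KB = 1,000 bytes) to match the 500 KB and 250 MB figures. The ~10 KB of text per user is an assumption derived from the 10 GB total, not a measured value.

```python
# Decimal units, matching the "500 KB" / "250 MB" figures in the text.
KB = 1_000
MB = 1_000 * KB
GB = 1_000 * MB

USERS = 10**6
MAX_FRIENDS = 500
NOTIFICATION_SIZE = 1 * KB   # upper bound from the requirements
TEXT_PER_USER = 10 * KB      # assumption: ~10 GB total spread over 10**6 users


def estimate() -> dict:
    tweet_storage = USERS * TEXT_PER_USER               # text only, no media
    per_tweet_fanout = NOTIFICATION_SIZE * MAX_FRIENDS  # one notification per friend
    burst = per_tweet_fanout * MAX_FRIENDS              # all 500 friends post at once
    return {
        "tweet_storage_gb": tweet_storage / GB,
        "headroom_with_100gb": (100 * GB) / tweet_storage,
        "per_tweet_fanout_kb": per_tweet_fanout / KB,
        "burst_mb": burst / MB,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:g}")
```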
API design
RESTful API:
- compose and share tweets
  - POST /v1/tweets
    - create_time
    - location
    - text
    - user_id
  - GET /v1/{user_id}/tweets
    - limit: page size
    - offset: page offset
- follow a tweet
  - POST /v1/follow/{tweet_id}
    - {header}
      - user_info or user_token
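Here is a minimal in-memory sketch of the two tweet endpoints. It uses Python dataclasses in place of a real web framework and database. The class and method names (`TweetStore`, `create`, `list_for_user`) and the default page size of 20 are hypothetical. The point is to show how `limit` and `offset` slice a newest-first list.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Tweet:
    id: int
    user_id: int
    text: str
    location: str
    create_time: datetime


@dataclass
class TweetStore:
    """Hypothetical in-memory stand-in for the tweets table."""
    tweets: List[Tweet] = field(default_factory=list)

    def create(self, user_id: int, text: str, location: str,
               create_time: datetime) -> Tweet:
        # POST /v1/tweets: body carries create_time, location, text, user_id.
        tweet = Tweet(len(self.tweets) + 1, user_id, text, location, create_time)
        self.tweets.append(tweet)
        return tweet

    def list_for_user(self, user_id: int, limit: int = 20,
                      offset: int = 0) -> List[Tweet]:
        # GET /v1/{user_id}/tweets?limit=...&offset=...
        if limit <= 0 or offset < 0:
            raise ValueError("limit must be positive and offset non-negative")
        mine = [t for t in self.tweets if t.user_id == user_id]
        mine.sort(key=lambda t: t.create_time, reverse=True)  # newest first
        return mine[offset:offset + limit]
```

Offset paging is simple, but it can skip or repeat items when new tweets arrive between page requests. That trade-off seems acceptable at the scale estimated above.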
Database design
four db tables are needed:
- tweets
  - id
  - user_id
  - create_time
  - content
  - location
  - status
- follow: user and tweet list
  - id
  - user_id
  - tweet_id
  - follow_time
  - status
- users
  - id
  - name
  - desc
  - head_img
  - create_time
  - status
- friends: user and friend list
  - id
  - user_id
  - friend_id
  - create_time
  - status
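Below is one possible DDL for these four tables. It runs against an in-memory SQLite database through Python's standard `sqlite3` module, so the example is self-contained. The column types, the `UNIQUE` constraints and the `(user_id, create_time)` index are assumptions added for illustration. A production deployment would likely use a different database engine.

```python
import sqlite3

# Hypothetical DDL; column types, constraints and the index are assumptions.
SCHEMA = """
CREATE TABLE users (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    "desc"      TEXT,              -- quoted: DESC is an SQL keyword
    head_img    TEXT,              -- e.g. an image URL
    create_time TEXT NOT NULL,
    status      INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE tweets (
    id          INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES users(id),
    create_time TEXT NOT NULL,
    content     TEXT,
    location    TEXT,
    status      INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX idx_tweets_user_time ON tweets(user_id, create_time);
CREATE TABLE follow (
    id          INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES users(id),
    tweet_id    INTEGER NOT NULL REFERENCES tweets(id),
    follow_time TEXT NOT NULL,
    status      INTEGER NOT NULL DEFAULT 0,
    UNIQUE (user_id, tweet_id)
);
CREATE TABLE friends (
    id          INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES users(id),
    friend_id   INTEGER NOT NULL REFERENCES users(id),
    create_time TEXT NOT NULL,
    status      INTEGER NOT NULL DEFAULT 0,
    UNIQUE (user_id, friend_id)
);
"""


def create_schema() -> sqlite3.Connection:
    # In-memory database so the sketch runs anywhere.
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```

The index matches GET /v1/{user_id}/tweets, which reads one user's tweets ordered by create_time.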
High-level design
- use a message queue such as Kafka, since updates don't need to be delivered in strict real time: when a user posts a tweet, the service produces a message to the queue, and a consumer reads it and pushes it into a Redis list (the waiting list).
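This minimal sketch shows how the queue decouples posting from delivery. Python's `queue.Queue` stands in for a Kafka topic and a dict of lists stands in for Redis. Neither substitution reflects the real client APIs. The producer only enqueues and returns, while a single consumer thread drains messages at its own pace. `FanoutPipeline` is a hypothetical name.

```python
import queue
import threading
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Message = Tuple[int, int, int]  # (user_id, friend_id, tweet_id)


class FanoutPipeline:
    """queue.Queue stands in for a Kafka topic, a dict of lists for Redis."""

    def __init__(self) -> None:
        self.events: "queue.Queue[Optional[Message]]" = queue.Queue()
        self.waiting_lists: Dict[int, List[dict]] = defaultdict(list)
        self._worker = threading.Thread(target=self._consume, daemon=True)
        self._worker.start()

    def publish(self, user_id: int, tweet_id: int, friend_ids: List[int]) -> None:
        # Request path: enqueue and return immediately, no delivery work here.
        for friend_id in friend_ids:
            self.events.put((user_id, friend_id, tweet_id))

    def _consume(self) -> None:
        # The consumer drains at its own pace, so a burst waits in the queue.
        while True:
            msg = self.events.get()
            if msg is None:  # shutdown sentinel
                break
            user_id, friend_id, tweet_id = msg
            self.waiting_lists[friend_id].append(
                {"user_id": user_id, "tweet_id": tweet_id})

    def close(self) -> None:
        # Stop the consumer after it has processed everything queued so far.
        self.events.put(None)
        self._worker.join()
```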
Request flows
- when a user posts a tweet
  - insert a record into the tweets table
  - send a message to the message queue; the consumer pushes the new tweet info into the Redis waiting list of each of the user's friends, which records their friends' latest tweets
- when a user views tweets
  - call the GET tweets API to fetch the data sorted by create_time in descending order
  - clear the current user's waiting list in Redis
- when a user follows a tweet
  - insert a record into the follow table (the user and tweet list)
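The three flows can be modeled in memory as follows. This is an illustrative sketch with hypothetical names. It makes three assumptions: friendships are mutual, a user's view includes their own tweets plus their friends', and a logical counter replaces real timestamps. Friend notification happens inline here, whereas the design routes it through the message queue.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class TwitterSketch:
    """Hypothetical in-memory model of the post / view / follow flows."""
    tweets: List[dict] = field(default_factory=list)    # tweets table
    follows: List[dict] = field(default_factory=list)   # follow table
    friends: Dict[int, Set[int]] = field(default_factory=lambda: defaultdict(set))
    waiting: Dict[int, List[int]] = field(default_factory=lambda: defaultdict(list))
    clock: int = 0  # logical timestamp standing in for create_time

    def add_friend(self, a: int, b: int) -> None:
        # Assumption: friendship is mutual.
        self.friends[a].add(b)
        self.friends[b].add(a)

    def post_tweet(self, user_id: int, content: str) -> int:
        self.clock += 1
        tweet_id = len(self.tweets) + 1
        # 1. insert the tweet record
        self.tweets.append({"id": tweet_id, "user_id": user_id,
                            "content": content, "create_time": self.clock})
        # 2. notify friends (inline here; via the message queue in the design)
        for friend_id in self.friends[user_id]:
            self.waiting[friend_id].append(tweet_id)
        return tweet_id

    def view_tweets(self, user_id: int) -> List[dict]:
        authors = self.friends[user_id] | {user_id}
        timeline = sorted((t for t in self.tweets if t["user_id"] in authors),
                          key=lambda t: t["create_time"], reverse=True)
        self.waiting[user_id].clear()  # everything pending is now read
        return timeline

    def follow_tweet(self, user_id: int, tweet_id: int) -> None:
        self.clock += 1
        self.follows.append({"user_id": user_id, "tweet_id": tweet_id,
                             "follow_time": self.clock})
```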
Detailed component design
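redis storage:
- waiting list: a user's latest unread tweets, stored as <user_id>: list of {user_id, tweet_id}, where the key is the reader and each entry's user_id is the tweet's author
message queue:
- producer: produces a new-post message (user_id, friend_id, tweet_id) for each of the poster's friends
- consumer: pushes each message into the Redis waiting list keyed by the recipient's ID (the friend_id in the message)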
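This sketch covers the consumer side and the waiting-list operations. `FakeRedis` is a hypothetical stand-in that mirrors three method names from the redis-py client: `rpush`, `lrange` and `delete`, which correspond to the Redis commands RPUSH, LRANGE and DEL. It lets the example run without a server. The `waiting:<user_id>` key format and the JSON encoding of entries are assumptions.

```python
import json
from typing import Dict, List


class FakeRedis:
    """Hypothetical stand-in mirroring redis-py's rpush/lrange/delete names."""

    def __init__(self) -> None:
        self._lists: Dict[str, List[str]] = {}

    def rpush(self, name: str, *values: str) -> int:
        lst = self._lists.setdefault(name, [])
        lst.extend(values)
        return len(lst)

    def lrange(self, name: str, start: int, end: int) -> List[str]:
        lst = self._lists.get(name, [])
        n = len(lst)
        start = max(n + start, 0) if start < 0 else start
        end = n + end if end < 0 else end  # LRANGE's end index is inclusive
        return lst[start:end + 1]

    def delete(self, *names: str) -> int:
        return sum(1 for n in names if self._lists.pop(n, None) is not None)


def waiting_key(user_id: int) -> str:
    # Hypothetical key scheme for a user's waiting list.
    return f"waiting:{user_id}"


def consume(r, message) -> None:
    """Handle one (user_id, friend_id, tweet_id) message from the queue."""
    user_id, friend_id, tweet_id = message
    # The list belongs to the recipient, so it is keyed by friend_id.
    r.rpush(waiting_key(friend_id),
            json.dumps({"user_id": user_id, "tweet_id": tweet_id}))


def unread(r, user_id: int) -> List[dict]:
    return [json.loads(v) for v in r.lrange(waiting_key(user_id), 0, -1)]


def mark_read(r, user_id: int) -> None:
    # Clear the waiting list once the user has viewed the tweets.
    r.delete(waiting_key(user_id))
```

With a real `redis.Redis` client, the same functions should work unchanged. `json.loads` also accepts the bytes that redis-py returns by default.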
Trade offs/Tech choices
- message queue: it smooths out bursts of traffic and keeps the system from crashing when a large number of requests arrive at the same time.
- redis: it offers fast access, which suits temporary data such as the waiting lists.
Failure scenarios/bottlenecks
Future improvements
- use big data storage technologies and a CDN to store images, videos, and tweets for faster access.