System requirements
Functional:
- Users can create tweets, with a low character limit
- Users can share their tweet by a specific link
- Users can follow other users
- Users can see tweets of followed users in a "news feed" like page
- The news feed can also have "popular" tweets mixed in
- Users can like other users' tweets
- Users can see a list of tweets from a particular user on a "user profile" like page
Non-Functional:
- The system should be highly available
- The system should prioritize availability over consistency - if a tweet takes some time to show up in the feeds of the poster's followers, that is acceptable.
- The system should be able to handle a large amount of concurrent traffic
- The system should also prioritize handling global/regional traffic
Capacity estimation
- Assume ~100M DAU - basically "a lot of users"
- We won't focus too much on the capacity part for now, but we can come back to it if it impacts the design
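If capacity does end up affecting the design, a rough back-of-envelope calculation could look like the sketch below. Only the 100M DAU figure comes from the estimate above; tweets per user, feed reads per user, and bytes per tweet are placeholder assumptions chosen purely for illustration.

```python
# Back-of-envelope capacity sketch. Only DAU comes from the estimate above;
# every other number is an assumed placeholder, not a measured value.
DAU = 100_000_000
TWEETS_PER_USER_PER_DAY = 2        # assumption
FEED_READS_PER_USER_PER_DAY = 20   # assumption
BYTES_PER_TWEET = 300              # assumption: short text plus ids/metadata
SECONDS_PER_DAY = 86_400


def per_second(events_per_day: int) -> float:
    """Average rate, ignoring peaks (real traffic is burstier)."""
    return events_per_day / SECONDS_PER_DAY


tweets_per_day = DAU * TWEETS_PER_USER_PER_DAY
write_qps = per_second(tweets_per_day)
read_qps = per_second(DAU * FEED_READS_PER_USER_PER_DAY)
storage_gb_per_day = tweets_per_day * BYTES_PER_TWEET / 1e9

print(f"tweet writes/s ~ {write_qps:,.0f}")
print(f"feed reads/s ~ {read_qps:,.0f}")
print(f"new tweet storage/day ~ {storage_gb_per_day:,.0f} GB")
```

Under these assumed numbers reads outnumber writes by about 10x, which is the kind of ratio that would motivate caching the news feed.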
API design
For now, we will assume a user system is created for us, and we will focus more on the "tweet" side of the system. We can come back to this if we want to dive deeper into it.
User Actions
- Create Tweet
- Follow User
- Share Tweet
  - This may just be a URL for the initial design; we may add an API here if we want to track this action
- Like Tweet
- Get Tweets By User
- Get News Feed
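The user actions above could be sketched as an interface like the one below. This is a hypothetical shape, not a committed API: the method names, the `limit` parameter, and the `https://example.com` base URL are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Protocol
from uuid import UUID


@dataclass(frozen=True)
class Tweet:
    id: UUID
    poster_id: UUID
    text: str
    total_likes: int = 0


class TweetApi(Protocol):
    """Hypothetical interface; one method per user action listed above."""

    def create_tweet(self, user_id: UUID, text: str) -> Tweet: ...
    def follow_user(self, user_id: UUID, following_id: UUID) -> None: ...
    def like_tweet(self, user_id: UUID, tweet_id: UUID) -> None: ...
    def get_tweets_by_user(self, user_id: UUID, limit: int = 20) -> list[Tweet]: ...
    def get_news_feed(self, user_id: UUID, limit: int = 20) -> list[Tweet]: ...


def share_url(tweet_id: UUID, base: str = "https://example.com") -> str:
    """Share Tweet as a plain link; no API call is needed to build it."""
    return f"{base}/tweets/{tweet_id}"
```

Share Tweet is kept as a pure function here because the initial design treats it as just a URL; it would become an API method only if we decide to track shares.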
Database design
For this, I think a relational database (RDB) is a good choice for the core DB, since the data has many intertwined relationships (users, tweets, follows, likes).
We will likely have a caching layer on top of this that we will dive more into in the design section.
User
- id: uuid
- number_followers: integer
- number_following: integer
Tweet
- id: uuid
- poster_id: fk(user)
- text: string
- total_likes: integer
Follows (user -> following)
- user_id: fk(user)
- following_id: fk(user)
Likes
- user_id: fk(user)
- tweet_id: fk(tweet)
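As a minimal sketch, the four tables above could be declared as follows. SQLite is used only so the example runs anywhere; storing UUIDs as text and the composite primary keys on Follows and Likes are assumptions that the schema above does not spell out.

```python
import sqlite3

SCHEMA = """
CREATE TABLE user (
    id TEXT PRIMARY KEY,                 -- uuid stored as text
    number_followers INTEGER NOT NULL DEFAULT 0,
    number_following INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE tweet (
    id TEXT PRIMARY KEY,
    poster_id TEXT NOT NULL REFERENCES user(id),
    text TEXT NOT NULL,
    total_likes INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE follows (
    user_id TEXT NOT NULL REFERENCES user(id),
    following_id TEXT NOT NULL REFERENCES user(id),
    PRIMARY KEY (user_id, following_id)  -- assumption: one row per pair
);
CREATE TABLE likes (
    user_id TEXT NOT NULL REFERENCES user(id),
    tweet_id TEXT NOT NULL REFERENCES tweet(id),
    PRIMARY KEY (user_id, tweet_id)      -- assumption: one like per user per tweet
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the schema and foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

The composite keys make a repeated follow or like fail at the database level, which keeps the aggregated counts from drifting if an action is retried.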
High-level design
User Service
- Follow user
Tweet Service
- Create tweet
- Like tweet
- Get Tweets for a user
News Feed Service
- Caches, per user, the tweets of the users they follow, to serve their news feed
- Handles "popular" tweets
- GET news feed
Message Queue
- Since we need to be highly available, many actions can be handled asynchronously through our message queue instead of in the request path.
- These actions would include:
  - Aggregating total likes for a tweet
  - Aggregating follow counts for a user
  - Handling what counts as "popular"
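To make the idea concrete, here is a small in-process stand-in for the message queue with two aggregation handlers. The `EventBus` class, the event type names, and the event fields are hypothetical; a real deployment would use a dedicated broker with consumers running in their own workers.

```python
import queue
from collections import defaultdict
from typing import Callable

Event = dict  # e.g. {"type": "tweet_liked", "tweet_id": "t1"}
Handler = Callable[[Event], None]


class EventBus:
    """In-process stand-in for the message queue (illustrative only)."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[Event]" = queue.Queue()
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: Event) -> None:
        # Services return right after enqueueing; the work happens later.
        self._queue.put(event)

    def drain(self) -> None:
        # A real consumer would run continuously instead of being drained by hand.
        while not self._queue.empty():
            event = self._queue.get()
            for handler in self._handlers[event["type"]]:
                handler(event)


like_counts: dict[str, int] = defaultdict(int)
follower_counts: dict[str, int] = defaultdict(int)


def on_tweet_liked(event: Event) -> None:
    like_counts[event["tweet_id"]] += 1


def on_user_followed(event: Event) -> None:
    follower_counts[event["following_id"]] += 1


bus = EventBus()
bus.subscribe("tweet_liked", on_tweet_liked)
bus.subscribe("user_followed", on_user_followed)
```

Because `publish` only enqueues, the counts lag behind the underlying rows until the consumers catch up, which is the consistency trade-off accepted in the non-functional requirements.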
Things we'll have, but will focus less on for the purpose of this interview:
- Load Balancer
- Data Center Syncing
Request flows
Create
- Creates the tweet in the DB
- Fires an event to our Message Queue to signal that the tweet was created
- This event triggers two actions: Update Counts and Update News Feeds
  - Update News Feeds writes the new tweet into the cached news feed of each of the poster's followers (one cache entry per user)
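The Update News Feeds step for a created tweet can be sketched as a fan-out on write. The in-memory dictionaries stand in for the Follows table and the cache layer, and `FEED_LIMIT` is an assumed cap on how many tweet ids each cached feed keeps.

```python
from collections import defaultdict, deque

FEED_LIMIT = 100  # assumption: keep only the newest N tweet ids per feed

# Follower ids keyed by the followed user's id (stand-in for the Follows table).
followers_of: dict[str, set[str]] = defaultdict(set)
# Per-user feed cache, newest tweet id first.
feed_cache: dict[str, deque] = defaultdict(lambda: deque(maxlen=FEED_LIMIT))


def on_tweet_created(poster_id: str, tweet_id: str) -> None:
    """Update News Feeds: push the new tweet id into each follower's cached feed."""
    for follower_id in followers_of[poster_id]:
        # With maxlen set, appendleft drops the oldest id once the feed is full.
        feed_cache[follower_id].appendleft(tweet_id)


def get_news_feed(user_id: str, limit: int = 20) -> list[str]:
    return list(feed_cache[user_id])[:limit]
```

Reads become a single cache lookup, at the cost of one write per follower for every new tweet.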
Like
- Likes the tweet in the DB
- Fires an event to our Message Queue to signal that the tweet was liked
- This event triggers two actions: Update Counts and Update News Feeds
  - Here, Update News Feeds refreshes the cache of "popular" tweets
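A minimal sketch of the two actions triggered by a like is shown below. Ranking by raw like count and the `POPULAR_SIZE` cutoff are assumptions; the design above does not define what counts as "popular".

```python
import heapq
from collections import Counter

POPULAR_SIZE = 3  # assumption: how many tweets the "popular" cache holds

like_counts: Counter = Counter()  # tweet_id -> aggregated like count
popular_cache: list[str] = []     # tweet ids, most liked first


def refresh_popular() -> None:
    """Rebuild the popular cache from the current like counts."""
    top = heapq.nlargest(POPULAR_SIZE, like_counts.items(), key=lambda kv: kv[1])
    popular_cache[:] = [tweet_id for tweet_id, _ in top]


def on_tweet_liked(tweet_id: str) -> None:
    like_counts[tweet_id] += 1  # Update Counts
    refresh_popular()           # Update News Feeds (popular cache)
```

Recomputing the ranking on every like keeps the sketch short; at scale the refresh would more likely run on a timer or over a sliding time window.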
Follow
- Records the follow in the DB
- Fires an event to our Message Queue to signal that the user was followed
- This event triggers one action: Update Counts
Databases are synced across regions, and the feeds are updated asynchronously.
Detailed component design
Trade offs/Tech choices
Tradeoff
- Since most of our updates are async, it will take some time for a new tweet/like to propagate its effects throughout the entire system, but that is okay, since we prefer high availability.
Failure scenarios/bottlenecks
What if the news feed cache fails
- For this, we can fall back to querying the DB for tweets from the users this user follows over the last X amount of time while we repopulate the cache.
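The fallback path might look like the sketch below, which treats a failed cache as a miss. The in-memory structures stand in for the DB and the cache, and both the `created_at` field (not in the schema above) and the 24-hour window for "X" are assumptions.

```python
import time
from typing import Optional

FALLBACK_WINDOW_SECONDS = 24 * 3600  # assumption: the "last X amount of time"

# Stand-ins for the DB tables; created_at is an assumed extra column on Tweet.
following: dict[str, set[str]] = {}
tweets: list[dict] = []
feed_cache: dict[str, list[str]] = {}


def feed_from_db(user_id: str, now: float) -> list[str]:
    """Recent tweets from followed users, newest first."""
    cutoff = now - FALLBACK_WINDOW_SECONDS
    followed = following.get(user_id, set())
    recent = [
        t for t in tweets
        if t["poster_id"] in followed and t["created_at"] >= cutoff
    ]
    recent.sort(key=lambda t: t["created_at"], reverse=True)
    return [t["id"] for t in recent]


def get_news_feed(user_id: str, now: Optional[float] = None) -> list[str]:
    now = time.time() if now is None else now
    cached = feed_cache.get(user_id)
    if cached is not None:
        return cached
    feed = feed_from_db(user_id, now)  # slower path while the cache is cold
    feed_cache[user_id] = feed         # repopulate for the next read
    return feed
```

Bounding the query by time keeps the fallback cheap enough for the DB to absorb while many users' caches are being rebuilt at once.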
What if the message queue fails
- We may want more persistent storage for our events if we need to guarantee that none are lost. An "outbox" pattern could work here: we store the event in the DB in the same transaction as the change that caused it, and mark it "processed" once it has been added to the queue. If the queue is down, unprocessed events wait in the DB and are published when it recovers.
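A minimal sketch of the outbox idea, using SQLite so it runs standalone. The `outbox` table layout, the JSON payload, and the `publish` callback (standing in for the queue client) are assumptions made for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tweet (id TEXT PRIMARY KEY, poster_id TEXT, text TEXT);
CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    payload TEXT NOT NULL,
    processed INTEGER NOT NULL DEFAULT 0
);
""")


def create_tweet(tweet_id: str, poster_id: str, text: str) -> None:
    event = {"type": "tweet_created", "tweet_id": tweet_id, "poster_id": poster_id}
    with conn:  # one transaction: both rows commit, or neither does
        conn.execute("INSERT INTO tweet VALUES (?, ?, ?)", (tweet_id, poster_id, text))
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def relay_outbox(publish) -> int:
    """Publish pending events in order; mark each processed only after publish succeeds."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE processed = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))  # if this raises, the row stays pending
        with conn:
            conn.execute("UPDATE outbox SET processed = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

If the relay crashes after publishing but before marking the row, the event is published again on the next run, so this gives at-least-once delivery and consumers should be idempotent.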
Future improvements
We should handle large spikes in usage, for example if a popular creator adds a tweet that suddenly gets many thousands of likes.
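One possible way to absorb a like spike, offered only as an option and not something the design above commits to, is to coalesce likes in memory and write a single count update per tweet per batch. The `LikeBuffer` class, its threshold, and the `write_counts` callback are all hypothetical.

```python
from collections import Counter


class LikeBuffer:
    """Coalesce likes in memory and write one count update per tweet per flush."""

    def __init__(self, flush_threshold: int = 1000) -> None:  # threshold is an assumption
        self.flush_threshold = flush_threshold
        self.pending: Counter = Counter()
        self.buffered = 0

    def add_like(self, tweet_id: str, write_counts) -> None:
        self.pending[tweet_id] += 1
        self.buffered += 1
        if self.buffered >= self.flush_threshold:
            self.flush(write_counts)

    def flush(self, write_counts) -> None:
        if self.pending:
            # One call carrying {tweet_id: increment}, instead of one write per like.
            write_counts(dict(self.pending))
            self.pending.clear()
            self.buffered = 0
```

The trade-off is that buffered likes are lost if the process dies before a flush, unless the original like events are still durable in the queue or DB and can be replayed.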