System requirements
Functional:
List functional requirements for the system (Ask the chat bot for hints if stuck.)...
User can follow other members, and be followed by people.
User can compose a tweet which should be shared to all the people following the user.
User can see a timeline of tweets from people they follow.
User can favorite a tweet.
Non-Functional:
List non-functional requirements for the system...
Read latency should be fast. We need to load tweets from a users following in under 150 ms.
A user should be able to see their posted tweet immediately. Sharing to followers can be eventually consistent.
We need to consider hot users who are followed by millions of people. When they post a tweet our system should still be available.
Capacity estimation
Estimate the scale of the system you are going to design...
1 billion users
each tweet ~100 chars with 100 bytes metadata
each user posts 10x per day
2 TB per day = ~800 TB per year
Each user follows ~100 people.
100k users are "hot" users and followed by millions of people
API design
Define what APIs are expected from the system...
getFollowers(userId)
getFollowing(userId)
getNewsFeed(userId)
postTweet(userId, message)
favoriteTweet(userId, tweetId)
followUser(userId)
maybe a getFavorites(userId) if a users should see all their favorite tweets in 1 place
Database design
Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...
# User DB
User table schema:
user_id: PK UUID
name: string
password: hash
email: string
other metadata
User_follows schema:
user_id: PK FK indexed
follows_id: FK sorted by ID to find a row in log(n) time
This table stores which users a particular user is following. Since we have a billion users, we want to partition by user_id so all of 1 user's followers are in the same DB partition. Then we can use a range query to quickly load all the follows.
User_followed schema:
user_id: PK FK indexed
followed_by: FK sorted to delete a row in log(n) time
We will also need to show the user who is following them. Getting this data from the user_follows table will require cross-partition joins which are expensive. Therefore we will have a derived data set to help answer the question of who is following a particular user. Again we will partition by user_id so we can load all of the followed_by users in a range query.
User updates, follows, and followings are likely not going to be bottlenecked by the DB. Our write throughput is improved by partitioning on user_Id, and the load is spread out through consistent hashing. It would be simpler to have consistency and acid properties when making user profile updates, so MYSQL will be the DB technology used for the users DB.
# Tweets DB
tweets schema:
tweet_id: UUID PK
user_ID: FK
message: TEXT
maybe favorite count as a denormalized INT field
user_favorites:
user_id: FK indexed
tweet_id: FK
For our tweet DB choices, we can choose single leader or a leaderless solution. A leaderless solution will run into a causal consistency problem, where a user favorites a tweet on a node for 1 leader, but the tweet exists on another node. A solution to this is CRDT to resolve the conflicts. The other option is a single leader, but at 10 B tweets per day the DB may become a bottleneck. Therefore a leaderless solution such as Cassandra will be a good choice here. Again, we should partition on user_id so that all of a users tweets are located in the same partition.
High-level design
You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design. If you are unfamiliar with the tool, you can simply describe your design to the chat bot and ask it to generate a starter diagram for you to modify...
To stay at our 150ms tweet load requirement, we will need to perform extra work during our writes to make reads faster. The reason we need this is because fetching a users follows and then aggregating all of their posts will likely be much more than 150ms. Therefore we will introduce a caching layer.
For each user, we will construct a timeline of their feed and cache it for easy retrieval. If we store the first 100 tweets for a user and each tweet is 200 bytes, then that is 2TB of total storage needed, which is definitely doable. To populate these caches, we will introduce a few components to propagate the new posts to the right caches:
- A Kafka queue partitioned by userID will process changes emitted by CDC from the user-following table and the tweets table. We are choosing Kafka here because of its durability, if it goes down we can replay messages.
- A Flink consumer, also partitioned by userId, will listen to these events from the Kafka queues. The Flink consumer will be populated with user-follows, so when a tweet comes in, it will ask Zookeeper for the appropriate cache.
- Additionally, this Flink consumer can update our derived user-followed-by table because it listens to the changes in user-follows.
Request flows
Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...
Write path:
- User A makes a tweet
- The tweet is stored in the table
- Kafka picks up the CDC event
- Flink consumes the Kafka event
- Flink updates its internal state with the new tweet for that user
- Flink updates external caches of all users from getFollowing(userId)
Read path:
- User A loads the page which calls getNewsFeed()
- A load balance directs the request to an app server who loads the Feeds from cache, or from the DB if there is a cache miss.
- On the profile pages, User A sees follower and following counts by making a request to the corresponding tables in the users DB
Detailed component design
Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...
Trade offs/Tech choices
Explain any trade offs you have made and why you made certain tech choices...
Flink is used for stream processing because it can hold state and can periodically checkpoint to S3. The alternative to using Flink would be a service that made DB requests, then updated a cache.
Kafka is used for its durable properties, where a message can be replayed if Kafka goes down or is not consumed successfully.
Failure scenarios/bottlenecks
Try to discuss as many failure scenarios/bottlenecks as possible.
For hot users with millions of followers, posting a tweet would cause millions of caches to be updated. An optimization here would be to preload another cache with posts from popular users. Then, a NewsFeed service would aggregate the two caches together.
Flink is doing a lot of aggregation and is a single point of failure for updating the caches. If it goes down, the caches could become stale. We may want to add some redundancy by running multiple Flink nodes in an active/passive configuration, if that is possible.
If caches are down or we are seeing high cache penetration, this would increase the load on our DB potentially bringing that down as well. We should probably put a circuit breaker in front of our DBs and add redundancy to the cache layers.
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?
Some future improvements on the requirements:
- Adding comments to the post would likely require another DB
- Adding Post privacy would require us to store a sort of permissioning between users. Flink would need this data as it propagates the posts to the caches.