System requirements
Functional:
users should be able to publish tweets
users should be able to follow other users
users should get a personalized feed of tweets published by people they follow
users should be able to like and comment on tweets
Non-Functional:
system should be scalable to many users
system should have high availability for better user experience
users should be able to quickly see their personalized feed
we want both reads and writes to be fast, but reads especially since likely there will be more reads than writes
Capacity estimation
1 billion users
each user publishes 10 tweets a day
each tweet is 100 characters long on average so 100 bytes per tweet
each tweet has on average 10 comments of 100 characters each
plus some amount of other metadata
so 10TB of data per day over say 400 days = 4PB per year
each user follows 100 users so feed will be 100 users x 1000 bytes per tweet (plus comments) x 10 tweets per day x 400 days per year = 400MB
so we can hold user feed in memory (partitioning based on user id)
API design
post_tweet(user_id, tweet_id, tweet_contents)
post_comment(user_id, tweet_id, comment_contents)
like_tweet(user_id, tweet_id)
get_feed(user_id)
follow_user(user_id, user_to_follow)
Database design
users table:
user_id, password_hash, other user metadata
user_following table:
user_id, user_following_id
user_followers table:
user_id, user_follower_id
tweets table:
tweet_id, user_id, tweet_contents, timestamp, num_likes, other metadata
comments table:
tweet_id, user_id, comment, timestamp, other metadata
users, user_following and user_followers table indexed on user_id column
tweets table indexed on user_id and timestamp
comments table indexed on tweet_id
High-level design
we have a load balancer as the first entry point of the system for both writers and readers
the load balancer determines which server to route the request to
we have a cluster of servers for reliability and fault tolerance
the servers on the writer side are responsible for writing to the various databases
if a user follows another user the server will write that relationship to the database which will then be propagated to a kafka queue by change data capture. the kafka queue is connected to flink which has an in memory mapping of user id to people they are following. this is to enable fast feed generation and fast generation of a list of who a user is following. flink will use the in memory mapping of the follower relationships to know which news feed cache to update with the corresponding tweet or comment.
if a user publishes a tweet the server will write that tweet to the database which will then be propagated to a kafka queue by change data capture. kafka queue is connected to same flink.
if a user comments on a tweet the server will write that comment to the database which will then be propagated to a kafka queue by change data capture. kafka queue is connected to same flink.
flink, kafka queue for user following updates, kafka queue for tweets, and kafka queue for comments are all partitioned based on user id
if a user likes a tweet, the server will route that to a kafka queue which then propagates to spark streaming and then in batches the number of likes gets written to the column in the respective tweet's row in the tweets database. this is so that we're able to batch the updates to the row in the tweets database and prevent lock contention for writes.
if a tweet is published by a user that is very popular then flink can preemptively push the tweet to the redis caching layer for popular posts, instead of updating a ton of users news feed caches since that would be too slow. since we know the tweet will be popular and have a lot of readers so we can improve read performance by populating the cache ahead of time.
a reader client is able to quickly generate their feed of tweets by looking at the news feed cache that corresponds to their user id and then also in the redis caching layer for any popular posts.
if a user wants to see the people that follow them, they can query the user_followers database. this database is populated by change data capture from the user following database that way the action of following doesn't need to write to two databases to be successful (slow writes), and instead the user_followers table can be derived data.
Request flows
if a user follows another user the server will write that relationship to the database which will then be propagated to a kafka queue by change data capture. the kafka queue is connected to flink which has an in memory mapping of user id to followers and their tweets and comments. this is to enable fast feed generation and fast generation of a list of who a user is following.
if a user publishes a tweet the server will write that tweet to the database which will then be propagated to a kafka queue by change data capture. kafka queue is connected to same flink.
if a user comments on a tweet the server will write that comment to the database which will then be propagated to a kafka queue by change data capture. kafka queue is connected to same flink.
flink, kafka queue for user following updates, kafka queue for tweets, and kafka queue for comments are all partitioned based on user id
if a user likes a tweet, the server will route that to a kafka queue which then propagates to spark streaming and then in batches the number of likes gets written to the column in the respective tweet's row in the tweets database. this is so that we're able to batch the updates to the row in the tweets database and prevent lock contention for writes.
a reader client is able to quickly generate their feed of tweets by looking at the news feed cache that corresponds to their user id and then also in the redis caching layer for any popular posts.
if a user wants to see the people that follow them, they can query the user_followers database. this database is populated by change data capture from the user following database that way the action of following doesn't need to write to two databases to be successful (slow writes), and instead the user_followers table can be derived data.
Detailed component design
the user_following and user_followers databases will have a lot of writes and we don't need to worry about write conflicts, every write is valid. so we can use Cassandra which utilizes leaderless replication and an LSM tree for fast writes
for the tweets database:
we want to quickly write to the database since all tweets will go there first. we can utilize Cassandra for fast writes. the leaderless replication and LSM tree gives us fast writes and we can partition on user id and sort by timestamp. this also means that if we have to query the database, getting all tweets by a given user will be fast.
all reads and writes go to one partition for cassandra so we'll partition on user id. we're okay with eventual consistency and technically dont have great data integrity but we don't really anticipate write conflicts since tweets should have a unique id and they arent being modified often
for the comments database:
multileader replication probably wont work here because we need causal dependency for comments so we can just utilize single leader replication in mysql.
Trade offs/Tech choices
for the tweets database:
we want to quickly write to the database since all tweets will go there first. we can utilize Cassandra for fast writes. the leaderless replication and LSM tree gives us fast writes and we can partition on user id and sort by timestamp. this also means that if we have to query the database, getting all tweets by a given user will be fast.
all reads and writes go to one partition for cassandra so we'll partition on user id. we're okay with eventual consistency and technically dont have great data integrity but we don't really anticipate write conflicts since tweets should have a unique id and they arent being modified often
Failure scenarios/bottlenecks
kafka is log based message broker so if we have fault tolerance in that all the data will be written to disk and can be replayed
flink has exactly once message processing guarantees and uses checkpointing for fault tolerance
databases are replicated
redis caching layer for news feed and popular posts is in memory so not fault tolerant, but for a cache we don't need it to be
system is largely bottlenecked by the speed of writing to the tweets database
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?