System requirements


Functional:

users should be able to publish tweets

users should be able to follow other users

users should get a personalized feed of tweets published by people they follow

users should be able to like and comment on tweets


Non-Functional:

system should be scalable to many users

system should have high availability for better user experience

users should be able to quickly see their personalized feed

we want both reads and writes to be fast, but reads especially since likely there will be more reads than writes




Capacity estimation

1 billion users

each user publishes 10 tweets a day

each tweet is 100 characters long on average so 100 bytes per tweet

each tweet has on average 10 comments of 100 characters each

plus some amount of other metadata

so 10TB of data per day over say 400 days = 4PB per year

each user follows 100 users so feed will be 100 users x 1000 bytes per tweet (plus comments) x 10 tweets per day x 400 days per year = 400MB

so we can hold user feed in memory (partitioning based on user id)


API design

post_tweet(user_id, tweet_id, tweet_contents)

post_comment(user_id, tweet_id, comment_contents)

like_tweet(user_id, tweet_id)

get_feed(user_id)

follow_user(user_id, user_to_follow)



Database design

users table:

user_id, password_hash, other user metadata


user_following table:

user_id, user_following_id


user_followers table:

user_id, user_follower_id


tweets table:

tweet_id, user_id, tweet_contents, timestamp, num_likes, other metadata


comments table:

tweet_id, user_id, comment, timestamp, other metadata


users, user_following and user_followers table indexed on user_id column

tweets table indexed on user_id and timestamp

comments table indexed on tweet_id



High-level design

we have a load balancer as the first entry point of the system for both writers and readers

the load balancer determines which server to route the request to

we have a cluster of servers for reliability and fault tolerance

the servers on the writer side are responsible for writing to the various databases

if a user follows another user the server will write that relationship to the database which will then be propagated to a kafka queue by change data capture. the kafka queue is connected to flink which has an in memory mapping of user id to people they are following. this is to enable fast feed generation and fast generation of a list of who a user is following. flink will use the in memory mapping of the follower relationships to know which news feed cache to update with the corresponding tweet or comment.


if a user publishes a tweet the server will write that tweet to the database which will then be propagated to a kafka queue by change data capture. kafka queue is connected to same flink.

if a user comments on a tweet the server will write that comment to the database which will then be propagated to a kafka queue by change data capture. kafka queue is connected to same flink.

flink, kafka queue for user following updates, kafka queue for tweets, and kafka queue for comments are all partitioned based on user id


if a user likes a tweet, the server will route that to a kafka queue which then propagates to spark streaming and then in batches the number of likes gets written to the column in the respective tweet's row in the tweets database. this is so that we're able to batch the updates to the row in the tweets database and prevent lock contention for writes.


if a tweet is published by a user that is very popular then flink can preemptively push the tweet to the redis caching layer for popular posts, instead of updating a ton of users news feed caches since that would be too slow. since we know the tweet will be popular and have a lot of readers so we can improve read performance by populating the cache ahead of time.


a reader client is able to quickly generate their feed of tweets by looking at the news feed cache that corresponds to their user id and then also in the redis caching layer for any popular posts.



if a user wants to see the people that follow them, they can query the user_followers database. this database is populated by change data capture from the user following database that way the action of following doesn't need to write to two databases to be successful (slow writes), and instead the user_followers table can be derived data.







Request flows

if a user follows another user the server will write that relationship to the database which will then be propagated to a kafka queue by change data capture. the kafka queue is connected to flink which has an in memory mapping of user id to followers and their tweets and comments. this is to enable fast feed generation and fast generation of a list of who a user is following.


if a user publishes a tweet the server will write that tweet to the database which will then be propagated to a kafka queue by change data capture. kafka queue is connected to same flink.

if a user comments on a tweet the server will write that comment to the database which will then be propagated to a kafka queue by change data capture. kafka queue is connected to same flink.

flink, kafka queue for user following updates, kafka queue for tweets, and kafka queue for comments are all partitioned based on user id


if a user likes a tweet, the server will route that to a kafka queue which then propagates to spark streaming and then in batches the number of likes gets written to the column in the respective tweet's row in the tweets database. this is so that we're able to batch the updates to the row in the tweets database and prevent lock contention for writes.



a reader client is able to quickly generate their feed of tweets by looking at the news feed cache that corresponds to their user id and then also in the redis caching layer for any popular posts.


if a user wants to see the people that follow them, they can query the user_followers database. this database is populated by change data capture from the user following database that way the action of following doesn't need to write to two databases to be successful (slow writes), and instead the user_followers table can be derived data.



Detailed component design

the user_following and user_followers databases will have a lot of writes and we don't need to worry about write conflicts, every write is valid. so we can use Cassandra which utilizes leaderless replication and an LSM tree for fast writes


for the tweets database:

we want to quickly write to the database since all tweets will go there first. we can utilize Cassandra for fast writes. the leaderless replication and LSM tree gives us fast writes and we can partition on user id and sort by timestamp. this also means that if we have to query the database, getting all tweets by a given user will be fast.

all reads and writes go to one partition for cassandra so we'll partition on user id. we're okay with eventual consistency and technically dont have great data integrity but we don't really anticipate write conflicts since tweets should have a unique id and they arent being modified often


for the comments database:

multileader replication probably wont work here because we need causal dependency for comments so we can just utilize single leader replication in mysql.





Trade offs/Tech choices

for the tweets database:

we want to quickly write to the database since all tweets will go there first. we can utilize Cassandra for fast writes. the leaderless replication and LSM tree gives us fast writes and we can partition on user id and sort by timestamp. this also means that if we have to query the database, getting all tweets by a given user will be fast.

all reads and writes go to one partition for cassandra so we'll partition on user id. we're okay with eventual consistency and technically dont have great data integrity but we don't really anticipate write conflicts since tweets should have a unique id and they arent being modified often


Failure scenarios/bottlenecks

kafka is log based message broker so if we have fault tolerance in that all the data will be written to disk and can be replayed

flink has exactly once message processing guarantees and uses checkpointing for fault tolerance

databases are replicated

redis caching layer for news feed and popular posts is in memory so not fault tolerant, but for a cache we don't need it to be

system is largely bottlenecked by the speed of writing to the tweets database


Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?