Requirements
Functional Requirements:
- Allow users to tweet messages up to 140 characters.
- Enable users to follow other users.
- Allow users to like tweets from other users.
- Display tweets from followed users in the home feed.
- Show top K popular tweets in the home feed based on likes and followers.
Non-Functional Requirements:
- Scalable to a large number of tweets
- Home feed should be updated with top K popular tweets every 10 minutes
- User should be able to see their own posts immediately (read own write consistency)
- System should be fault tolerant and highly available across regions
- Followed users should show tweets < 10s (eventual consistency)
Capacity Estimation
Estimate the scale of the system. Consider daily active users, read/write ratio, storage requirements, bandwidth, and any relevant QPS calculations...
API Design
- CreateTweet (message) -> Success/failure
- Follow(UserID)
- Like (TweetID)
- ShowFeed() -> Feed
High-Level Design
For Creating a tweet. User will reach API gateway. It will route request to Feedservice. The feedservice will create the tweet and store it into our database.
For Following a user, the API gateway will route to the User service which will then write the new data into the DB. This could reuse the same Feed service, but I will split these two services, because I intend to add more features to the feed service later and we can then scale up the feed and user service separately
For Liking a tweet, API gateway will route to the User/Like service. The service will then update the count in the database.
Now for creating a feed. The feed fetch will route the data to the API gateway. The simplest way is that each user has a FeedID in the DB and the feed service simply reads it from the DB.
Actually, I've changed the design so that all the APIs route to the tweet/user/like service. This service will read from the DB and fetch the Feed for each user. What builds this feed is the "Feed service". The feed service is a background job or application that reads from a tweet event queue. It processes each event and builds a Feed from the events. It also gets events such as likes from users for specific tweets. Using this information it builds an updated FeedID for various users every 10 minutes.
I just realized this is slightly different than our previous requirement of top K tweets. This top K tweets is actually easier. For our feed service it basically just aggregates the likes of various tweets across a time period. Lets say that time period is 10 minutes. After 10 minutes, it flushes the top K tweets into the DB. It will also take the top K + for any tweet events for all followers of that tweetID in the past 10 minutes and aggregate it into a Feed for a user.
Database Design
Entities
- Users
- UserID
- Followers : UserID []
- Following: UserID []
- Tweets: TweetID []
- Feed : FeedID
- Tweets
- UserID
- Message
- Creation Date
- TweetID
- Feed
- Tweets: TweetID[]
- FeedID
- Likes
- TweetID
- LikeID
- userID
- Top K
- Tweets: TweetID []
- timestamp
Detailed Component Design
Okay now lets dive into how our system will scale. Whenever we have a large number of tweets. The first thing that gets affected its the number of writes to our database. If we need to split this load, we can shard the database on TweetID. This should improve our write scalability for new tweets.
Whenever we create a new tweet, we also need to fan out this tweet to other users. The way our current system does this is very periodic, it has to go through the tweet events queue and then the feed service. I think it would make sense to split this functionality. For top K we aggregate across likes over a time period and then generate a top K for this time period. We save this into the database. We could also save this in a cache, depending on whether its important to keep this top K result over time versus if we just wanted to save it for the most recent time period.
Back to fanning out a single tweet. For a single tweet, we store it into the tweets event queue. The tweet service will read this tweet event queue and fan out the tweet into the Feed for every user following the poster of the Tweet. It will then write it into the database.
How does a user always see their writes. Whenever we write a tweet into the database, for that user, we should also immediately write that tweet into the Feed of that user. That way the user can see its own post immediately. It will still take time for the tweet service to process the tweet event queue however.
For highly available and fault tolerance, we need to be able to rebuild the event queue if failure occurs. One way is to have some setting in the event queue to write the events to disk and rebuild the queue on failure. We can scale up the tweet service as necessary as theres no conflict between each tweet service. We do need to be careful on scaling the feed service as it needs to aggregate values and counter must be shared. If we scale the feed service to multiple nodes than there must be aggregation of the counter values at the end or the nodes needs to write to a common counter. Both have fault issues. If a single node goes down all of its processed data is lost. For the common counter, the counter is a single source of failure. I think I prefer the counter as then we can save the counter occassionally.
Our last requirement is for the tweets of followers to show up quickly. When we tweet, it should fan out to the database as fast as possible. If our tweet service and database is fast than a user should be able to see the updated feed easily. We shard our database on Users ID as well and denormalize the Feed so that we can easily fetch a feed for a user. But then this results in a user having to query multiple shards for the tweets. One way we go speed this up is to cache our tweets. Since we're more likely to fetch popular and new tweets, adding a cache would be a good speedup. We add some functionality on the client side poll for new tweets every 1s so that the feed is consistently refreshed.