System requirements
Functional:
User can tweet (send) up to 140 character message.
User can follow other users.
User can like other users' tweets.
User's home feed will show tweets from the users they are following.
The home feed will show top K popular tweets, based on the number of likes they receive, and the number of the followers the author has.
Non-Functional:
Scalability. It will have to serve a very large population, e.g., 500M DAU.
Response time. User has to see tweets quickly. When user opens home feed, the first 10 tweets should show up within 500ms.
Availability.
Consistency requirement can be a little bit relaxed. Eventual consistency would suffice. Let's say a user tweets. If a user on the other side of the earth sees it after a minute or so, that would be acceptable.
[Generally speaking, you would like to keep the requirements scope small. You only have 35 - 50 min in an interview. If you have a lot of requirements, you'd risk running out of time. Of course there can be so many more requests in this solution - images & videos, advertisements, security, content moderation, anti abuse protection, to name a few. But we will start with a small set of requirements. Easier to expand later than shrink.]
Capacity estimation
500M DAU
Each user, on average, tweets twice a day. 1B tweets / day.
Each user, on average, views 100 tweets a day.
At peak, the system should scale up to 20% of DAU - 100M users - interacting with the system.
Let's discuss storage capacity first.
1B tweets a day. Each tweet is 140 chars. An example data model for tweets would be:
Tweet:
- tweet_ID: primary key
- created_by: user ID
- posted_time
- content
- medialink: link to picture or video content
- number_of_likes
- hashtags: list of hashtag strings used in the tweet.
- users_mentioned: list of users mentioned in the tweet
It seems reasonable to assume each message takes up to 512 bytes.
0.5KB * 1 billion = 512GB per day.
In two years, it would be 365 TB. With some buffer for growth, assume 512TB.
I will consider concurrency in later sections: millions of users viewing the same message concurrently, tweeting messages with the same hash tag, etc.
API design
tweet(user_ID, content)
User represented by user_ID tweets this content.
The content will be parsed by the server side for users and hashtags mentioned.
follow(follower, followed)
User represented by follower user ID will follow user represented by followed user ID.
like(user_ID, tweet_ID)
User represented by the user_ID likes the tweet represented by tweet_ID.
home_feed(user_ID, page_token, number)
Returns JSON document containing tweets that should be shown on the user's home feed.
page_token represents where in the list of top tweets it should start from.
number represents the number of tweets this API requests.
For example, when a user first opens their home feed, it would ask for ~10 tweets from the beginning of the home feed. After the initial 5 or 10 tweets are shown to the user, the client can request more, potentially in the background, to prepare for more feeds in case user scrolls the home feed.
WebSocket
The above API might be implemented on top of WebSocket connection. WebSocket provides a bi-directional communication functionality between the client (e.g. a browser) and the server, suitable for applications where users expect a real time interaction with the server.
Error Handling
Error handling would follow HTTP error code. For example, 4XX for client side error (e.g. liking a tweet ID that does not exist), 5XX for server side error (e.g. bug causing some of the APIs to fail). It would be important for client to handle too many requests error (429) correctly. For example, if home_feed() API returns 429, it would indicate the servers are too busy to service this potentially expensive call. Therefore, it would be advisable for the client to want for some time (say seconds) before calling this again.
Database design
[Sernior-level deep dive topic]
Database Choice
Observation on the data:
- Each row is relatively small.
- But there are many rows.
- It needs relational query, e.g., list tweets by a user.
- Eventual consistency would suffice.
There are two considerations: B-tree based database such as RDB or MongoDB, and log structure merge (LSM) based database such as Cassandra.
LSM based DB would thrive on writing these data, as it can scale horizontally and suitable for many small writes.
B-tree based DB would thrive on relational queries.
Based on these, a document NoSQL DB, e.g., MongoDB, looks like a good compromise. It is more horizontally scalable than RDBs, it can be configured with eventual consistency instead of strong consistency (increasing scalability), and it supports relational queries better than Cassandra.
Data Models
These are the data models.
Tweet:
- tweet_ID: primary key
- created_by: user ID, index
- posted_time: index
- content
- medialink: link to picture or video content
- number_of_likes
User:
- user_ID: primary key
- email: index
- name
- nickname
- DOB
Hashtag:
- hashtag_ID: primary key
- hashtag: the hashtag string
- tweets: tweet_IDs who have this hashtag.
We decided to store the list of tweets in Hashtag document to optimize the main functionality we want in hash tag - a quick mapping from hash tag to tweets (e.g. to display tags with this hash tag).
In RDB, we would have a foreign key from Tweets to Hashtag, and search through the Tweets table for a particular hashtag. In MongoDB, we can do this faster because of its schema flexibility.
The disadvantage of this approach is that the data is denormalized. For example, if tweets are deleted, we'd have to remove it from two places: Tweets document, and from Hashtag.tweets.
Counters
User.number_of_likes would have a scalability challenge. When a famous person tweets, it'd be possible millions of users would press the Like button. This would create the Tweet document a hot spot. To avoid this, we should create a sharded counter service for users with many followers, e.g., > 1,000 followers. For such users, the sharded counter service would do the counting, using multiple (let's say 100) subcounters, stored in a high performance key-value store like Redis Cache. Once in a while (say every 5 minutes or so), the service would write the number_of_likes in Tweet document.
Follows Relationship
Follows:
- follower: user_ID of the user who is following
- followed: user_ID of the user being followed
- timestamp
Follows document borrows the normalization concept from RDB. By having this document separated from User document, we can query the follows relationship in both ways efficiently: (1) given a user, list all the users they are following, and (2) given a user, list all the users who are following them.
High-level design
It is beneficial to clearly separate a write path from read path in this system.
Write path consists of:
- User tweets
- Async services work on the tweets, e.g., storing them and reflecting them in Home Feed
Write path consists of:
- User accessing Home Feed
Request flows
CDN would be important to cache static contents that are written once and read many times. For example, images and videos posted by a user.
Client request is served by either CDN (for static media content) or API Gateway.
API Gateway provides rate limiting (e.g. against Denial of Service attack), and forwards client requests to an appropriate service.
Tweet Service's job is to receive new tweets from the client. It saves them in Tweet DB. It also creates a message in Message Queue, which initiates some async actions.
Tweet Analyzer pulls a message from the Message Queue. It analyzes the tweet and takes appropriate actions, e.g., extract hashtags and user mentions and update appropriate database. We would need another, similar service for content filtering.
Home Feed service constructs a list of tweets that are recommended for the user, and returns the list in JSON format.
Detailed component design
Home Feed Generation
[Senior-level deep dive topic!]
Home Feed functionality essentially recommends K most interesting tweets for a given user. The ranking should follow scores defined like this:
Score of Tweet = weight_like * number_of_likes + weight_followers * number of followers on the tweet's author + weight_retweet * number_of_retweets
Generation on the fly vs pre-generation
We need to decide whether to generate Home Feed on the fly (when a user requests Home Feed), or we generate it before that.
Pro of pre-generation is that Home Feed is ready when a user asks for it. But it's an expensive thing to do.
Pro of generating it on the fly is to conserve computational and storage resources.
In a system like this, response time is critical. If user observes that it's slow (e.g., it takes a couple of seconds for Home Feed to load), there's a risk the users would jump to our competitors.
As such, we would pre-generate Home Feed.
To store 20 most interesting tweets for each Daily Active User, we would need:
500M DAU * 20 * 512 byte = 5.12TB.
Modern cache system can store substantial amount of data. If it can store 128GB per server, ~40 cache servers would be able to store them.
Home Feed Generator
Home Feed Generator service pulls new Tweets from the Message Queue. It calculates the score for the tweet. It stores top K tweets list for each user. Redis Sorted Set, based on a skip list, provides an efficient data structure for storing the list.
Sorted Set approach is better than using a heap for this functionality. We would have to pop K items from the heap every time we want to get the list. With Sorted Set, we do not have to destroy the list.
Trade offs/Tech choices
Failure scenarios/bottlenecks
[Mid-level deep dive topic]
The aforementioned Home Feed Generator approach works well - until somebody with millions of followers (e.g. a celebrity) tweets.
Such a person has enormous followings, so their tweet can have a high score on every user's Home Feed. In this case, Home Feed Generator would update all 500M DAU's top-K list, overwhelming the cache system.
As such, Home Feed Generator should not add tweets from such a person (e.g. > 1,000 followers) to users' Home Feed. Instead, Home Feed Service should pull from a list dedicated to tweets by popular people. It can do so once in a while (e.g. 5 minutes) to reduce the load.
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?