System requirements
Functional:
List functional requirements for the system (Ask interviewer if stuck)...
user authentication and authorization
posting tweets
getting a timeline
following a user
mobile and web clients
Non-Functional:
List non-functional requirements for the system...
330 million monthly active users
186 million daily active users
500 million tweets per day
highly available and reliable
eventually consistent
performant/scalable
secure
Capacity estimation
Estimate the scale of the system you are going to design...
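10:1 read-to-write ratio
500 million tweets per day
about 5,800 tweets written per second on average (500 million / 86,400 seconds)
about 58,000 tweets read per second on average
peak traffic, assuming 5x the average, is 29,000 tweets written per second and 290,000 tweets read per second
tweet size is about 100 bytes
50 GB of storage space per day (500 million tweets x 100 bytes)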
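The estimates above can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the peak factor of 5 is an assumption inferred from the ratio 29,000 / 5,800, and the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

tweets_per_day = 500_000_000
read_write_ratio = 10    # 10 reads for every write
peak_factor = 5          # assumption: inferred from 29,000 / 5,800
tweet_size_bytes = 100

# Average and peak throughput
avg_writes_per_sec = tweets_per_day / SECONDS_PER_DAY
avg_reads_per_sec = avg_writes_per_sec * read_write_ratio
peak_writes_per_sec = avg_writes_per_sec * peak_factor
peak_reads_per_sec = avg_reads_per_sec * peak_factor

# Daily storage for tweet bodies only (no indexes, media, or replication)
storage_gb_per_day = tweets_per_day * tweet_size_bytes / 1e9

print(f"avg writes/s:  {avg_writes_per_sec:,.0f}")
print(f"avg reads/s:   {avg_reads_per_sec:,.0f}")
print(f"peak writes/s: {peak_writes_per_sec:,.0f}")
print(f"peak reads/s:  {peak_reads_per_sec:,.0f}")
print(f"storage/day:   {storage_gb_per_day:,.0f} GB")
```

The exact averages come out slightly below the rounded figures in the list, which is fine for an estimate at this level.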
API design
Define what APIs are expected from the system...
POST /api/v1/tweet/
{
title: string
message: string
}
GET /api/v1/timeline/
POST /api/v1/users/{user_id}/followings
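As a rough illustration of the POST /api/v1/tweet/ contract, the request body could be validated as below before it reaches the tweet service. This is a sketch under assumptions: `CreateTweetRequest`, `parse_create_tweet`, and the 280-character limit are hypothetical and are not part of the API listed above.

```python
from dataclasses import dataclass

MAX_MESSAGE_LENGTH = 280  # assumption: the design does not state a limit


@dataclass(frozen=True)
class CreateTweetRequest:
    title: str
    message: str


def parse_create_tweet(body: dict) -> CreateTweetRequest:
    """Validate the decoded JSON body of POST /api/v1/tweet/."""
    for name in ("title", "message"):
        value = body.get(name)
        if not isinstance(value, str) or not value:
            raise ValueError(f"'{name}' must be a non-empty string")
    if len(body["message"]) > MAX_MESSAGE_LENGTH:
        raise ValueError(f"'message' exceeds {MAX_MESSAGE_LENGTH} characters")
    return CreateTweetRequest(title=body["title"], message=body["message"])
```

The author's user id is deliberately absent from the body; it would come from the authenticated request handled at the API gateway.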
Database design
Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...
user
id: string
username: string
email: string
tweet
id: string
userId: string
title: string
message: string
date: string
follower
followerId: string
followeeId: string
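The three entities above could be written down as plain record types, for example as Python dataclasses. This is a minimal sketch: field names are converted to snake_case, and treating `date` as an ISO-8601 UTC timestamp string is an assumption the schema does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: str
    username: str
    email: str


@dataclass(frozen=True)
class Tweet:
    id: str
    user_id: str  # userId in the schema above
    title: str
    message: str
    date: str     # assumption: ISO-8601 UTC timestamp, e.g. "2024-01-01T12:00:00Z"


@dataclass(frozen=True)
class Follower:
    follower_id: str  # the user who follows
    followee_id: str  # the user being followed
```

Making the records frozen keeps them hashable, so a (follower, followee) pair can be stored in a set without duplicates.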
For users and tweets, a SQL database would represent the relationships between the two entities well, but a NoSQL solution would scale much better for the amount of traffic we expect. Horizontal scaling is a must for this type of system, so we can use a key/value store for users and tweets.
For followers, a graph database might be a good fit. Graph databases excel at storing relationships between entities, such as a follower/followee relationship, and scale very well.
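To make the follower/followee relationship concrete, here is a small in-memory stand-in for the graph store. It is only a sketch: `FollowGraph` is a hypothetical name, and a real graph database would expose its own query language rather than these methods.

```python
from collections import defaultdict


class FollowGraph:
    """Directed graph: an edge follower -> followee means 'follower follows followee'."""

    def __init__(self):
        self._following = defaultdict(set)  # followerId -> set of followeeIds
        self._followers = defaultdict(set)  # followeeId -> set of followerIds

    def follow(self, follower_id, followee_id):
        if follower_id == followee_id:
            raise ValueError("a user cannot follow themselves")
        self._following[follower_id].add(followee_id)
        self._followers[followee_id].add(follower_id)

    def unfollow(self, follower_id, followee_id):
        # discard() is a no-op if the edge does not exist
        self._following[follower_id].discard(followee_id)
        self._followers[followee_id].discard(follower_id)

    def followers_of(self, user_id):
        return set(self._followers.get(user_id, ()))

    def following_of(self, user_id):
        return set(self._following.get(user_id, ()))
```

Keeping both directions indexed makes the two hot queries cheap: "who follows this author" when delivering a tweet, and "who does this user follow" when rebuilding a timeline.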
High-level design
You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design...
For redundancy, availability, and scalability, we should put a load balancer in front of our services, both to spread load and to allow requests to still be made if a particular service instance is down.
We can use an API gateway for rate limiting and also for authentication and authorization of requests.
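The design does not say which rate-limiting algorithm the gateway would use. One common choice is a token bucket, sketched below as an assumption; the `TokenBucket` class and its parameters are hypothetical, and the optional `now` argument exists only to make the behavior easy to test.

```python
import time
from typing import Optional


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `refill_per_sec`."""

    def __init__(self, capacity: float, refill_per_sec: float, now: Optional[float] = None):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity  # start with a full bucket
        self.last = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Add the tokens earned since the last call, capped at capacity
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In a gateway, one bucket would typically be kept per user or per API key, and a rejected request would be answered with HTTP 429.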
Because of our scalability requirements, we should also consider splitting request handling into separate services: a follow service, a tweet service, and a timeline service.
We want a key/value tweet database, a key/value user database, and a graph database for followers.
When a tweet is made, we want to use a message queue to fan that tweet out to all of the user's followers, with a tweet worker service updating their timelines.
We can put a cache in front of the tweet database and the user database, and keep a cache of each user's timeline.
Request flows
Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...
For following a user, we can use the follow service to update our graph database whenever a user follows or unfollows another user.
When a new tweet is made, we save the tweet in our tweet database and use our tweet service to deliver it to the author's followers. We retrieve the list of followerIds from the graph database and then fetch each user from our user database, checking the cache in front of the user database first. Then we add an event to our message queue, and a tweet worker service handles the request to update each follower's timeline.
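The write path described above is a fan-out on write. A minimal single-process sketch follows, using in-memory stand-ins for the tweet database, the graph database, and the timeline cache, and Python's `queue.Queue` in place of a real message queue. All names here are hypothetical, and the user lookup step is omitted for brevity.

```python
import queue
from collections import defaultdict, deque

TIMELINE_SIZE = 100  # assumption: keep only the most recent entries per user

tweets = {}                                   # tweet database: tweet_id -> tweet
followers = {"alice": {"bob", "carol"}}       # graph database: author -> followerIds
timelines = defaultdict(lambda: deque(maxlen=TIMELINE_SIZE))  # timeline cache
fanout_queue = queue.Queue()                  # message queue


def post_tweet(tweet_id, user_id, message):
    """Tweet service: persist the tweet, then enqueue a fan-out event."""
    tweets[tweet_id] = {"id": tweet_id, "userId": user_id, "message": message}
    fanout_queue.put({"tweet_id": tweet_id, "user_id": user_id})


def run_tweet_worker():
    """Tweet worker: drain the queue and push tweet ids onto follower timelines."""
    while not fanout_queue.empty():
        event = fanout_queue.get()
        for follower_id in followers.get(event["user_id"], ()):
            # Newest first; deque(maxlen=...) drops the oldest entry when full
            timelines[follower_id].appendleft(event["tweet_id"])
        fanout_queue.task_done()
```

Storing only tweet ids in each timeline keeps the fan-out cheap; the tweet bodies can be fetched from the tweet cache or database at read time.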
When a user requests their timeline, we first look in the user's timeline cache. If the timeline cache is empty, we have to generate the timeline by looking up who they follow and retrieving the recent tweets those users made.
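The read path is a cache-aside lookup with a fallback rebuild. The sketch below uses plain dicts as stand-ins for the timeline cache, the graph database, and the tweet database. The function name and parameters are hypothetical, and it assumes `date` is an ISO-8601 string so that string ordering matches time ordering.

```python
def get_timeline(user_id, timeline_cache, following, tweets_by_user, limit=20):
    """Return the newest `limit` tweets for user_id, rebuilding on a cache miss."""
    cached = timeline_cache.get(user_id)
    if cached:  # a missing or empty entry counts as a miss
        return cached[:limit]

    # Cache miss: gather tweets from everyone this user follows
    candidates = []
    for followee_id in following.get(user_id, ()):
        candidates.extend(tweets_by_user.get(followee_id, ()))

    # Newest first, then store the result for the next request
    candidates.sort(key=lambda tweet: tweet["date"], reverse=True)
    timeline = candidates[:limit]
    timeline_cache[user_id] = timeline
    return timeline
```

The rebuild touches every followee, so it is the expensive path; in practice it should only run for users whose cached timeline has expired or was never built.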
Detailed component design
Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...
For following a user, we can use the follow service to update our graph database whenever a user follows or unfollows another user.
When a new tweet is made, we save the tweet in our tweet database and use our tweet service to deliver it to the author's followers. We retrieve the list of followerIds from the graph database and then fetch each user from our user database, checking the cache in front of the user database first. Then we add an event to our message queue, and a tweet worker service handles the request to update each follower's timeline.
When a user requests their timeline, we first look in the user's timeline cache. If the timeline cache is empty, we have to generate the timeline by looking up who they follow and retrieving the recent tweets those users made.
Trade offs/Tech choices
Explain any trade offs you have made and why you made certain tech choices...
We are trading off full ACID compliance by using NoSQL key/value databases instead of SQL. The data is less strongly consistent than it would be with SQL, but the benefit is the ease of scaling out nodes for our NoSQL databases.
Failure scenarios/bottlenecks
Try to discuss as many failure scenarios/bottlenecks as possible.
We probably don't want to precompute timeline entries for tweets from users who have very many followers, such as celebrities, because that fan-out could become quite expensive for our system. Celebrity tweets should probably be fetched on demand instead.
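One way to combine the two approaches is to merge, at read time, the precomputed timeline with tweets pulled on demand from the celebrities a user follows. The sketch below assumes every input list is already sorted newest first and that `date` is an ISO-8601 string. The threshold value and function names are hypothetical.

```python
import heapq
from itertools import islice

CELEBRITY_THRESHOLD = 1_000_000  # assumption: tune from real follower distributions


def is_celebrity(follower_count):
    """Authors at or above the threshold skip fan-out on write."""
    return follower_count >= CELEBRITY_THRESHOLD


def read_timeline(precomputed, celebrity_feeds, limit=20):
    """Merge newest-first lists of tweets and return the newest `limit`.

    precomputed: tweets already fanned out to this user's timeline
    celebrity_feeds: one newest-first list per followed celebrity
    """
    merged = heapq.merge(
        precomputed,
        *celebrity_feeds,
        key=lambda tweet: tweet["date"],
        reverse=True,  # inputs and output are ordered newest first
    )
    return list(islice(merged, limit))
```

Because `heapq.merge` is lazy, only the first `limit` items are ever pulled from the inputs, so the read cost grows with the number of followed celebrities rather than with their total tweet count.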
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?