System requirements


Functional:

user authentication and authorization

posting tweets

viewing a home timeline

following and unfollowing users

supporting mobile and web clients


Non-Functional:

330 million monthly active users

186 million daily active users

500 million tweets per day

highly available and reliable

eventually consistent

performant and scalable

secure


Capacity estimation

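10:1 read-to-write ratio

500 million tweets per day

about 5,800 tweets written per second on average (500 million / 86,400 seconds)

about 58,000 tweets read per second on average

assuming peak traffic is about 5x the average, that is 29,000 tweets written per second and 290,000 tweets read per second at peak

average tweet size is about 100 bytes

about 50 GB of new tweet storage per day (500 million × 100 bytes)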
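The arithmetic behind these estimates can be checked with a few lines of Python. The 5x peak factor is not stated explicitly. It is inferred from the 29,000 / 5,800 ratio above. 1 GB is taken as 10^9 bytes.

```python
# Back-of-the-envelope capacity math for the estimates above.
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400
TWEETS_PER_DAY = 500_000_000
READ_WRITE_RATIO = 10               # 10 reads for every write
PEAK_FACTOR = 5                     # assumption: inferred from 29,000 / 5,800
TWEET_SIZE_BYTES = 100              # average tweet size

writes_per_sec = TWEETS_PER_DAY / SECONDS_PER_DAY
reads_per_sec = writes_per_sec * READ_WRITE_RATIO
peak_writes_per_sec = writes_per_sec * PEAK_FACTOR
peak_reads_per_sec = reads_per_sec * PEAK_FACTOR
storage_gb_per_day = TWEETS_PER_DAY * TWEET_SIZE_BYTES / 1e9  # 1 GB = 10**9 bytes

for label, value in [
    ("writes/s", writes_per_sec),
    ("reads/s", reads_per_sec),
    ("peak writes/s", peak_writes_per_sec),
    ("peak reads/s", peak_reads_per_sec),
    ("storage GB/day", storage_gb_per_day),
]:
    print(f"{label:>15}: {value:,.0f}")
```

The exact average works out to about 5,787 writes per second, which rounds to the 5,800 used above.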


API design

POST /api/v1/tweet/

{
  title: string,
  message: string
}


GET /api/v1/timeline/


POST /api/v1/users/{user_id}/followings


Database design

user

id: string

username: string

email: string


tweet

id: string

userId: string

title: string

message: string

date: string


follower

followerId: string

followeeId: string
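The three entities can be written as Python dataclasses to make the field types explicit. Field names mirror the model above, so they use camelCase instead of Python's usual snake_case. Storing `date` as an ISO-8601 UTC string is an assumption. It is chosen here because such strings sort in chronological order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: str
    username: str
    email: str


@dataclass(frozen=True)
class Tweet:
    id: str
    userId: str       # id of the author (User.id)
    title: str
    message: str
    date: str         # assumption: ISO-8601 UTC timestamp, e.g. "2024-01-01T12:00:00Z"


@dataclass(frozen=True)
class Follower:
    followerId: str   # the user doing the following
    followeeId: str   # the user being followed
```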


For users and tweets, a SQL database would be good at representing the relationship between the two entities, but a NoSQL solution would scale much better for the amount of traffic we expect. Horizontal scaling is a must for this type of system, so we can use a key/value store for users and tweets.
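Horizontal scaling of a key/value store depends on spreading keys across nodes. One common technique, which the design above does not name, is consistent hashing. Each node owns many points on a hash ring, and a key belongs to the first node point clockwise from the key's hash. The sketch below uses hypothetical node names such as `kv-1`. With this scheme, adding a node only moves the keys that now land on the new node's points.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys such as user or tweet ids to storage nodes (illustrative sketch)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes          # virtual points per node, smooths the key distribution
        self._hashes = []             # sorted hashes of all virtual points
        self._owners = []             # _owners[i] is the node that owns _hashes[i]
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        # md5 is used only for its even spread here, not for security.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            pos = bisect.bisect(self._hashes, h)
            self._hashes.insert(pos, h)
            self._owners.insert(pos, node)

    def remove_node(self, node):
        kept = [(h, n) for h, n in zip(self._hashes, self._owners) if n != node]
        self._hashes = [h for h, _ in kept]
        self._owners = [n for _, n in kept]

    def node_for(self, key):
        if not self._hashes:
            raise LookupError("no storage nodes available")
        # First virtual point clockwise from the key's hash, wrapping around to index 0.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._owners[idx]
```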


For followers, a graph database might be a good fit. Graph databases excel at storing and traversing relationships between entities, such as follower/followee relationships, and can scale to large graphs.


High-level design

For redundancy, availability, and scalability, we should put a load balancer in front of our services to spread load across instances and to keep serving requests if a particular instance is down.
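The core behavior described here can be sketched as round-robin selection that skips instances marked unhealthy. Round-robin and the `mark_down`/`mark_up` health hooks are illustrative assumptions. Real load balancers also offer other strategies, such as least-connections. The instance names below are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Spreads requests across service instances, skipping ones marked down."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._healthy = set(self._backends)
        self._cycle = itertools.cycle(self._backends)

    def mark_down(self, backend):
        # Called when a health check fails, so traffic is routed around the instance.
        self._healthy.discard(backend)

    def mark_up(self, backend):
        self._healthy.add(backend)

    def pick(self):
        # Try each backend at most once per call before giving up.
        for _ in range(len(self._backends)):
            backend = next(self._cycle)
            if backend in self._healthy:
                return backend
        raise RuntimeError("no healthy backends")
```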


We can use an API gateway for rate limiting and for authenticating and authorizing requests.
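Rate limiting at the gateway is often done with a token bucket per user. Each user's bucket refills at a fixed rate, and a request is allowed only if a token is available. The token-bucket algorithm, the per-user keying, and the `capacity`/`refill_per_sec` values are assumptions for illustration. The clock is injectable so the behavior can be tested deterministically.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` requests, refilling at `refill_per_sec`."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._clock = clock
        self._tokens = float(capacity)   # start full
        self._last = clock()

    def allow(self, cost=1.0):
        now = self._clock()
        # Add tokens for the time elapsed since the last call, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.refill_per_sec)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class PerUserRateLimiter:
    """One bucket per user id, created lazily on the user's first request."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self._capacity = capacity
        self._refill_per_sec = refill_per_sec
        self._clock = clock
        self._buckets = {}

    def allow(self, user_id):
        bucket = self._buckets.get(user_id)
        if bucket is None:
            bucket = TokenBucket(self._capacity, self._refill_per_sec, self._clock)
            self._buckets[user_id] = bucket
        return bucket.allow()
```

With many gateway instances, the buckets would likely need to live in a shared store rather than in process memory.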


Because of our scale, we should also split the system into separate services: a follow service, a tweet service, and a timeline service.


We want a key/value tweet database, a key/value user database, and a graph database for followers.


When a tweet is posted, we put an event on a message queue. A tweet worker service consumes these events, sends the tweet out to all of the author's followers, and updates their timelines.


We can put a cache in front of the tweet database and another in front of the user database, plus a cache for each user's timeline.
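The database caches can follow the cache-aside pattern. Reads go to the cache first, and on a miss they fall through to the database and populate the cache. The sketch below uses an in-process LRU cache built on `collections.OrderedDict` as a stand-in for a distributed cache. A plain dict serves as a hypothetical stand-in for the user key/value store.

```python
from collections import OrderedDict


class LRUCache:
    """Keeps at most `capacity` entries, evicting the least recently used one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the least recently used entry


def get_user(user_id, cache, user_db):
    """Cache-aside read: try the cache, then the database, then fill the cache."""
    user = cache.get(user_id)
    if user is None:
        user = user_db.get(user_id)          # hypothetical key/value store lookup
        if user is not None:
            cache.put(user_id, user)
    return user
```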


Request flows

For following a user, we can use the follow service to update our graph database whenever a user follows or unfollows another user.
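The follow service's writes can be sketched against an in-memory adjacency structure that stands in for the graph database. Keeping both directions (a user's followers, and the users a user follows) makes the fan-out query and the timeline-rebuild query cheap. The class name and the rule that users cannot follow themselves are assumptions.

```python
from collections import defaultdict


class FollowGraph:
    """Hypothetical in-memory stand-in for the follower graph database."""

    def __init__(self):
        self._followers = defaultdict(set)   # followeeId -> {followerId}
        self._following = defaultdict(set)   # followerId -> {followeeId}

    def follow(self, follower_id, followee_id):
        # Assumption: self-follows are rejected. Sets make repeated follows idempotent.
        if follower_id == followee_id:
            raise ValueError("users cannot follow themselves")
        self._followers[followee_id].add(follower_id)
        self._following[follower_id].add(followee_id)

    def unfollow(self, follower_id, followee_id):
        self._followers[followee_id].discard(follower_id)
        self._following[follower_id].discard(followee_id)

    def followers_of(self, user_id):
        """Used by the tweet service for fan-out."""
        return set(self._followers.get(user_id, ()))

    def following_of(self, user_id):
        """Used by the timeline service to rebuild a timeline."""
        return set(self._following.get(user_id, ()))
```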


When a new tweet is posted, the tweet service saves it in the tweet database and then delivers it to the author's followers. It retrieves the list of followerIds from the graph database and fetches those users from the user database, checking the user cache first. It then adds an event to the message queue, and a tweet worker service consumes the event and updates each follower's timeline.
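This write path, often called fan-out on write, can be sketched with Python's `queue.Queue` standing in for the message queue and a thread standing in for the tweet worker service. `TIMELINE_LENGTH`, the event shape, and the in-memory `tweet_db` and `timeline_cache` are hypothetical. The caller is assumed to have already fetched `follower_ids` from the graph database.

```python
import queue
from collections import defaultdict, deque

TIMELINE_LENGTH = 800   # hypothetical cap on cached timeline entries per user

tweet_db = {}                                   # stand-in for the tweet key/value store
timeline_cache = defaultdict(lambda: deque(maxlen=TIMELINE_LENGTH))
fanout_queue = queue.Queue()                    # stand-in for the message queue


def post_tweet(tweet_id, author_id, message, follower_ids):
    """Tweet service: persist the tweet, then enqueue one fan-out event."""
    tweet_db[tweet_id] = {"userId": author_id, "message": message}
    fanout_queue.put({"tweetId": tweet_id, "followerIds": list(follower_ids)})


def tweet_worker():
    """Tweet worker: push each tweet id onto the front of every follower's timeline."""
    while True:
        event = fanout_queue.get()
        if event is None:                       # shutdown signal used by this sketch only
            fanout_queue.task_done()
            break
        for follower_id in event["followerIds"]:
            timeline_cache[follower_id].appendleft(event["tweetId"])
        fanout_queue.task_done()
```

To try it, start `tweet_worker` on a `threading.Thread` and call `post_tweet` a few times. Putting `None` on the queue then lets the worker drain the pending events and exit.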


When a user requests their timeline, we first look in their timeline cache. If the cache is empty, we generate the timeline by looking up who they follow and retrieving those users' recent tweets.
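The read path can be sketched as a cache lookup that falls back to a k-way merge of the followed users' recent tweets. `heapq.merge` with `reverse=True` merges inputs that are each already sorted newest-first. The `(timestamp, tweet_id)` tuples, the default `limit` of 20, and the function parameters are assumptions.

```python
import heapq
from itertools import islice


def get_timeline(user_id, timeline_cache, following_of, recent_tweets_by_user, limit=20):
    """Return up to `limit` tweet ids, newest first.

    timeline_cache: dict of user id -> list of tweet ids (hypothetical cache)
    following_of: callable returning the ids a user follows (e.g. from the graph DB)
    recent_tweets_by_user: dict of user id -> [(timestamp, tweet_id)], newest first
    """
    cached = timeline_cache.get(user_id)
    if cached:
        return cached[:limit]
    # Cache miss: merge the followed users' recent tweets by timestamp.
    streams = [recent_tweets_by_user.get(f, []) for f in following_of(user_id)]
    merged = heapq.merge(*streams, key=lambda t: t[0], reverse=True)
    timeline = [tweet_id for _, tweet_id in islice(merged, limit)]
    timeline_cache[user_id] = timeline          # repopulate the cache for the next read
    return timeline
```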


Detailed component design


Trade-offs/Tech choices

We trade off full ACID compliance by using NoSQL key/value databases instead of SQL. The data is only eventually consistent rather than strongly consistent, but in exchange we can scale our NoSQL databases by adding nodes.


Failure scenarios/bottlenecks

We probably don't want to precompute (fan out) tweets from users with many followers, such as celebrities, because writing one tweet into millions of timelines could become very expensive for our system. Instead, celebrity tweets should probably be fetched on demand when a follower reads their timeline.
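One way to realize this is a hybrid approach. Ordinary users' tweets are fanned out on write, authors above a follower threshold are skipped, and their tweets are merged in at read time. `CELEBRITY_THRESHOLD` and the function names are hypothetical. Timeline entries are assumed to be `(timestamp, tweet_id)` tuples sorted newest-first.

```python
import heapq
from itertools import islice

CELEBRITY_THRESHOLD = 1_000_000   # hypothetical follower-count cutoff


def should_fan_out(follower_count):
    """Write path: only precompute timelines for authors below the threshold."""
    return follower_count < CELEBRITY_THRESHOLD


def read_timeline(precomputed, celebrity_ids, recent_tweets_by_user, limit=20):
    """Read path: merge the precomputed timeline with celebrity tweets fetched on demand.

    precomputed: [(timestamp, tweet_id)] from the timeline cache, newest first
    celebrity_ids: followed authors that were skipped during fan-out
    recent_tweets_by_user: user id -> [(timestamp, tweet_id)], newest first
    """
    streams = [precomputed] + [recent_tweets_by_user.get(c, []) for c in celebrity_ids]
    merged = heapq.merge(*streams, key=lambda t: t[0], reverse=True)
    return [tweet_id for _, tweet_id in islice(merged, limit)]
```

The cost moves from write time to read time, and only for followers of celebrity accounts. That is usually far cheaper than millions of timeline writes per celebrity tweet.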


Future improvements
