Design Twitter - System Design

System requirements

Functional:

List functional requirements for the system (Ask the chat bot for hints if stuck.)...

Compose and share tweets
Follow other users
Look at tweets
Favorite tweets

Non-Functional:

List non-functional requirements for the system...

Highly available
Scale horizontally to meet growing user needs
Minimal latency when viewing tweets

Capacity estimation

Estimate the scale of the system you are going to design...

Total Users: 500M

DAU: 100M

Avg. Daily Tweet: 5

500M Daily Tweets per day

1.4 TBs storage per day

Total Reads per Day = 100M * (300 followed users * 1200 tweets/second)

36 * e4 * 100 * e6

36 * 10^12 = 36 Trillion Tweets per day

High read traffic and high storage requirements

API design

Define what APIs are expected from the system...

Use RESTful API to create a stateless system that can scale horizontally

Authenticate with JWT token and get userInfo

GET

/tweets

getTweets(tweetId)

POST

/tweets

postTweet(tweetId)

favoriteTweet(tweetId)

/users

followUser(userId)

DELETE

/tweets

deleteTweet(tweetId)

unfavoriteTweet(tweetId)

/users

unfollowUser(userId)

Database design

Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...

Entities

Tweets, Users, Follows, Favorites

Tweets

tweetId, createdAt, text, userId

Users

userId, createdAt

Follows

followId, userId, followedUserId

Favorites

favoriteId, userId, tweetId

Tweets will sharded based on a combination of timestamps and primary key id. Relationships like favorites and follows are their own tables and follows normalization rules.

The database will need consistent hashing for sharding so it does not become imbalanced. We are using SQL because we need a relationship model for following and favorites.

High-level design

You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design. If you are unfamiliar with the tool, you can simply describe your design to the chat bot and ask it to generate a starter diagram for you to modify...

There is a client that reaches out to an API Gateway that directs it to a load balanced read and write tweet api service. the write api service writes directly to a leader database. The read api service will read from a write back cache and if there's a cache miss, it will reach out to a read replica.

Request flows

Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...

Read Tweet

Client reaches out to Tweet API service
Load Balancer directs to the Read API service
Then it looks for cache
when there's a cache miss, it reads from read replica.

Write Tweet

Client posts to Tweet API Sercice
Load Balancer directs to Write API Service
Writes to Leader database

Follows

User follows users and posts data to Write API
Data is written to leader database

Favorites

User favorites tweets and posts to Write API
Data written to leader database

Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...

The API Gateway is a managed service and allows us to direct user traffic to the appropriate service.

The write API service will be able to horizontally scale due to the stateless system. If there is a lot of load on the system, then we can just add more machines. Same can be said to the Read API.

Main bottleneck is the database. The database will shard tweets based on tweetId and timestamps which will help if we need to create a timeline for the user. The sharded database will help with the terabyte storage requirements from our capacity estimates. For the Read APIs, they will reach out to the read replicas that are asynchronously replicated. There is a chance to help with the load on the read replicas.

Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...

There might be some consistency issues with read but that is okay since consistency is not critical to reading tweets. However, it might be critical to users that favorite a tweet or even follow a user. It will be a less ideal user experience to read from a replica that does not have the persisted data yet so it could be beneficial to read from the leader database when it comes to the user reading their own favorite and follows that are recent. The cache is to help with hot tweets that are fetched a lot and to ease database load.

Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.

Some failure scenarios that can happen are popular users can add a lot of load to the data part of the design. Additionally, with a write back cache, a cache miss can be very expensive for the first time read. The leader database might introduce a lot of latency depending on geographical location. It might be beneficial to consider if multi-leader replication would help with latency.

Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?

Consider multi-leader replication