System requirements
Functional:
List functional requirements for the system (Ask the chat bot for hints if stuck.)...
- Compose and share tweets
- Follow other users
- Look at tweets
- Favorite tweets
Non-Functional:
List non-functional requirements for the system...
- Highly available
- Scale horizontally to meet growing user needs
- Minimal latency when viewing tweets
Capacity estimation
Estimate the scale of the system you are going to design...
Total Users: 500M
DAU: 100M
Avg. Daily Tweet: 5
500M Daily Tweets per day
1.4 TBs storage per day
Total Reads per Day = 100M * (300 followed users * 1200 tweets/second)
36 * e4 * 100 * e6
36 * 10^12 = 36 Trillion Tweets per day
High read traffic and high storage requirements
API design
Define what APIs are expected from the system...
Use RESTful API to create a stateless system that can scale horizontally
Authenticate with JWT token and get userInfo
GET
/tweets
getTweets(tweetId)
POST
/tweets
postTweet(tweetId)
favoriteTweet(tweetId)
/users
followUser(userId)
DELETE
/tweets
deleteTweet(tweetId)
unfavoriteTweet(tweetId)
/users
unfollowUser(userId)
Database design
Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...
Entities
Tweets, Users, Follows, Favorites
Tweets
tweetId, createdAt, text, userId
Users
userId, createdAt
Follows
followId, userId, followedUserId
Favorites
favoriteId, userId, tweetId
Tweets will sharded based on a combination of timestamps and primary key id. Relationships like favorites and follows are their own tables and follows normalization rules.
The database will need consistent hashing for sharding so it does not become imbalanced. We are using SQL because we need a relationship model for following and favorites.
High-level design
You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design. If you are unfamiliar with the tool, you can simply describe your design to the chat bot and ask it to generate a starter diagram for you to modify...
There is a client that reaches out to an API Gateway that directs it to a load balanced read and write tweet api service. the write api service writes directly to a leader database. The read api service will read from a write back cache and if there's a cache miss, it will reach out to a read replica.
Request flows
Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...
Read Tweet
- Client reaches out to Tweet API service
- Load Balancer directs to the Read API service
- Then it looks for cache
- when there's a cache miss, it reads from read replica.
Write Tweet
- Client posts to Tweet API Sercice
- Load Balancer directs to Write API Service
- Writes to Leader database
Follows
- User follows users and posts data to Write API
- Data is written to leader database
Favorites
- User favorites tweets and posts to Write API
- Data written to leader database
Detailed component design
Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...
The API Gateway is a managed service and allows us to direct user traffic to the appropriate service.
The write API service will be able to horizontally scale due to the stateless system. If there is a lot of load on the system, then we can just add more machines. Same can be said to the Read API.
Main bottleneck is the database. The database will shard tweets based on tweetId and timestamps which will help if we need to create a timeline for the user. The sharded database will help with the terabyte storage requirements from our capacity estimates. For the Read APIs, they will reach out to the read replicas that are asynchronously replicated. There is a chance to help with the load on the read replicas.
Trade offs/Tech choices
Explain any trade offs you have made and why you made certain tech choices...
There might be some consistency issues with read but that is okay since consistency is not critical to reading tweets. However, it might be critical to users that favorite a tweet or even follow a user. It will be a less ideal user experience to read from a replica that does not have the persisted data yet so it could be beneficial to read from the leader database when it comes to the user reading their own favorite and follows that are recent. The cache is to help with hot tweets that are fetched a lot and to ease database load.
Failure scenarios/bottlenecks
Try to discuss as many failure scenarios/bottlenecks as possible.
Some failure scenarios that can happen are popular users can add a lot of load to the data part of the design. Additionally, with a write back cache, a cache miss can be very expensive for the first time read. The leader database might introduce a lot of latency depending on geographical location. It might be beneficial to consider if multi-leader replication would help with latency.
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?
Consider multi-leader replication