System requirements
Functional:
- Post a tweet
- View tweets
- Follow other users
- Add tweets to favorites
Non-Functional:
- Scalable: the system should handle heavy traffic, especially during traffic peaks (for instance, when a famous user posts a tweet)
- Low latency: requests should be answered in under 1 s
Capacity estimation
- 500M daily active users
- Users post between 1 and 10 tweets each day, about 5 on average (~2.5B tweets posted daily)
- Users view on average 20 tweets per day (10B tweets viewed daily)
- ~12.5B requests each day (2.5B writes + 10B reads)
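The arithmetic behind these figures can be checked with a short script. This is only a back-of-the-envelope sketch: the average of 5 tweets posted per user is an assumption chosen to match the ~2.5B total (the notes only give a range of 1 to 10), and the average requests-per-second figure is derived here, not stated in the notes.

```python
# Back-of-the-envelope check of the capacity estimates.
DAILY_ACTIVE_USERS = 500_000_000
TWEETS_POSTED_PER_USER = 5   # assumed average; the notes only give a 1-10 range
TWEETS_VIEWED_PER_USER = 20
SECONDS_PER_DAY = 24 * 60 * 60

writes_per_day = DAILY_ACTIVE_USERS * TWEETS_POSTED_PER_USER
reads_per_day = DAILY_ACTIVE_USERS * TWEETS_VIEWED_PER_USER
requests_per_day = writes_per_day + reads_per_day

# Average load only; peak load will be higher.
average_rps = requests_per_day / SECONDS_PER_DAY

print(f"writes/day:         {writes_per_day:,}")
print(f"reads/day:          {reads_per_day:,}")
print(f"requests/day:       {requests_per_day:,}")
print(f"average requests/s: {average_rps:,.0f}")
```

The workload is read-heavy (4 reads for every write), which is one reason to precompute feeds instead of assembling them at read time.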
API design
GET /tweet/{id}
POST /tweet
POST /tweet/{id}/favorite
GET /feed
GET /users/{id}
GET /users/{id}/tweets
POST /users/{id}/follow
POST /users/{id}/unfollow
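As an illustration of how these endpoints could be dispatched, here is a minimal routing sketch. The handler names (`get_tweet`, `follow_user`, and so on) and the `resolve` helper are hypothetical; a real service would more likely rely on a web framework's router.

```python
import re

# (HTTP method, path template, hypothetical handler name)
ROUTES = [
    ("GET", "/tweet/{id}", "get_tweet"),
    ("POST", "/tweet", "post_tweet"),
    ("POST", "/tweet/{id}/favorite", "favorite_tweet"),
    ("GET", "/feed", "get_feed"),
    ("GET", "/users/{id}", "get_user"),
    ("GET", "/users/{id}/tweets", "get_user_tweets"),
    ("POST", "/users/{id}/follow", "follow_user"),
    ("POST", "/users/{id}/unfollow", "unfollow_user"),
]


def compile_route(template):
    """Turn '/users/{id}/tweets' into a regex with a named group per parameter."""
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile(f"^{pattern}$")


def resolve(method, path):
    """Return (handler_name, path_params), or None if no route matches."""
    for route_method, template, handler in ROUTES:
        if method != route_method:
            continue
        match = compile_route(template).match(path)
        if match:
            return handler, match.groupdict()
    return None


print(resolve("GET", "/users/42/tweets"))
print(resolve("POST", "/tweet/7/favorite"))
```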
Database design
Tweet
- [PK] ID INT
- [FK] UserID INT
- TextContent VARCHAR
- CreatedAt DATETIME
- UpdatedAt DATETIME
Users
- [PK] ID INT
- Login VARCHAR
- Email VARCHAR
- Password VARCHAR (store a salted hash, never the plaintext)
Feed
- [FK] UserID INT
- [FK] TweetID INT
Followers
- [FK] UserID INT
- [FK] FollowerUserID INT
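To make the schema concrete, the tables above could be created as follows. This sketch uses an in-memory SQLite database purely for illustration; the NOT NULL constraints and the composite primary keys on Feed and Followers are assumptions, and a production system at this scale would use a distributed store.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Users (
    ID INTEGER PRIMARY KEY,
    Login VARCHAR NOT NULL,
    Email VARCHAR NOT NULL,
    Password VARCHAR NOT NULL
);
CREATE TABLE Tweet (
    ID INTEGER PRIMARY KEY,
    UserID INT NOT NULL REFERENCES Users(ID),
    TextContent VARCHAR NOT NULL,
    CreatedAt DATETIME NOT NULL,
    UpdatedAt DATETIME
);
-- One row per (reader, tweet) pair in that reader's precomputed feed.
CREATE TABLE Feed (
    UserID INT NOT NULL REFERENCES Users(ID),
    TweetID INT NOT NULL REFERENCES Tweet(ID),
    PRIMARY KEY (UserID, TweetID)
);
-- FollowerUserID follows UserID.
CREATE TABLE Followers (
    UserID INT NOT NULL REFERENCES Users(ID),
    FollowerUserID INT NOT NULL REFERENCES Users(ID),
    PRIMARY KEY (UserID, FollowerUserID)
);
"""


def create_database():
    """Create an in-memory database with the four tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = create_database()
conn.execute("INSERT INTO Users VALUES (1, 'alice', 'alice@example.com', 'hash-a')")
conn.execute("INSERT INTO Users VALUES (2, 'bob', 'bob@example.com', 'hash-b')")
conn.execute("INSERT INTO Followers VALUES (1, 2)")  # bob follows alice
conn.execute("INSERT INTO Tweet VALUES (10, 1, 'hello', '2024-01-01 00:00:00', NULL)")
# Fan the new tweet out to the feed of every follower of user 1.
conn.execute(
    "INSERT INTO Feed SELECT FollowerUserID, 10 FROM Followers WHERE UserID = 1"
)
bob_feed = conn.execute(
    "SELECT t.TextContent FROM Feed f JOIN Tweet t ON t.ID = f.TweetID "
    "WHERE f.UserID = 2"
).fetchall()
print(bob_feed)
```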
We should use a partitioning strategy that avoids hot keys: partitioning on the user ID alone lets famous users make partitions unbalanced.
Adding a salt to the user ID spreads one user's data across several partitions, which helps prevent this.
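One possible way to salt the partition key is sketched below. The notes do not say how the salt is chosen, so everything here is an assumption: the salt is derived from the tweet ID, `NUM_PARTITIONS` and `SALT_BUCKETS` are arbitrary values, and all function names are hypothetical.

```python
import hashlib

NUM_PARTITIONS = 16  # assumed number of partitions
SALT_BUCKETS = 4     # assumed number of salted sub-keys per user


def partition_for(key):
    """Map a string key to a partition using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def salted_key(user_id, tweet_id):
    """Append a salt derived from the tweet ID to the user ID."""
    salt = tweet_id % SALT_BUCKETS
    return f"{user_id}#{salt}"


def write_partition(user_id, tweet_id):
    """Partition that stores this particular tweet."""
    return partition_for(salted_key(user_id, tweet_id))


def read_partitions(user_id):
    """All partitions that may hold tweets of this user."""
    return {partition_for(f"{user_id}#{salt}") for salt in range(SALT_BUCKETS)}


print(sorted(read_partitions(42)))
```

The cost of salting is on the read side: fetching all tweets of one user now means querying up to `SALT_BUCKETS` partitions and merging the results.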
High-level design
Request flows
Detailed component design
Trade-offs/Tech choices
- We can accept eventual consistency (for example, a feed may lag slightly behind a newly posted tweet) to improve scalability
- Followers can be tracked in a graph database, which makes it easier to explore relationships between users who are not directly connected
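To illustrate the kind of query a graph model makes natural, the sketch below computes how many "follows" hops separate two users with a breadth-first search. The in-memory adjacency dictionary stands in for a graph database, and the user names are made up for the example.

```python
from collections import deque


def degrees_of_separation(follows, source, target):
    """Return the fewest 'follows' hops from source to target, or None."""
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        user, distance = queue.popleft()
        for followed in follows.get(user, ()):
            if followed == target:
                return distance + 1
            if followed not in visited:
                visited.add(followed)
                queue.append((followed, distance + 1))
    return None  # target is not reachable from source


# Hypothetical data: alice follows bob, bob follows carol.
follows = {"alice": {"bob"}, "bob": {"carol"}, "carol": set()}
print(degrees_of_separation(follows, "alice", "carol"))  # 2
print(degrees_of_separation(follows, "carol", "alice"))  # None
```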
Failure scenarios/bottlenecks
- A famous user (one with many followers) posting a tweet can be a bottleneck because numerous feeds must be updated. In our design, feeds are updated asynchronously to handle this case
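The asynchronous feed update can be sketched with a queue and a background worker. This is a single-process illustration under simplifying assumptions: `queue.Queue` stands in for a message broker, the dictionaries stand in for the Followers and Feed tables, and all names and data are hypothetical.

```python
import queue
import threading
from collections import defaultdict

followers = {"celebrity": ["fan1", "fan2", "fan3"]}  # hypothetical data
feeds = defaultdict(list)
fanout_queue = queue.Queue()


def post_tweet(author, tweet_id):
    """Enqueue the fan-out job and return without waiting for feed updates."""
    fanout_queue.put((author, tweet_id))


def fanout_worker():
    """Consume jobs and append each tweet to every follower's feed."""
    while True:
        job = fanout_queue.get()
        if job is None:  # sentinel: stop the worker
            fanout_queue.task_done()
            break
        author, tweet_id = job
        for follower in followers.get(author, []):
            feeds[follower].append(tweet_id)
        fanout_queue.task_done()


worker = threading.Thread(target=fanout_worker, daemon=True)
worker.start()

post_tweet("celebrity", 101)  # returns immediately
fanout_queue.join()           # demo only: wait until the job is processed
fanout_queue.put(None)
worker.join()

print(dict(feeds))
```

The POST /tweet request only pays for the enqueue, so its latency does not grow with the author's follower count; the price is that followers see the tweet slightly later.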