My Solution for Design Twitter with Score: 9/10
by zephyr_cosmos528
System requirements
Functional:
- read others' posts; the user's home page shows the top k posts from people they follow and the most popular posts
- create posts with <= 140 characters, focus on text only
- follow others
- favorite others' posts
Non-Functional:
- low latency
- highly available
- consistency - eventual consistency should be enough; it is acceptable for a post to become visible to user A sooner than to user B
Capacity estimation
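500M DAU. Each user creates one post per day, reads 10 pages of the home page per day, and likes 5 posts per day. Assume each user follows 100 users.
Write: 500M / 10^5 (a day is roughly 10^5 seconds) = 5k TPS for post creation; likes add another 5 * 5k = 25k TPS
Read: 10 * 5k = 50k TPS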
Storage:
150 bytes per post * 500M posts/day = 75GB/day, about 27TB/year
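The arithmetic above can be checked with a few lines of Python. This is only a sketch of the estimate: the variable names are made up for illustration, and the day is rounded to 10^5 seconds as in the write TPS line.

```python
# Back-of-the-envelope check of the capacity numbers.
SECONDS_PER_DAY_APPROX = 10**5  # rounded up from 86,400, as in the estimate

dau = 500_000_000
posts_per_user_per_day = 1
page_reads_per_user_per_day = 10
bytes_per_post = 150

write_tps = dau * posts_per_user_per_day // SECONDS_PER_DAY_APPROX
read_tps = dau * page_reads_per_user_per_day // SECONDS_PER_DAY_APPROX
storage_gb_per_day = dau * posts_per_user_per_day * bytes_per_post / 10**9
storage_tb_per_year = storage_gb_per_day * 365 / 1000

print(write_tps)            # 5000
print(read_tps)             # 50000
print(storage_gb_per_day)   # 75.0
print(storage_tb_per_year)  # 27.375
```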
API design
- create_post(user_id, content)
- read_post(user_id, offset, limit)
- follow(follower, followee)
- unfollow(follower, followee)
- like(user_id, post_id)
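Two of these calls need input checks: create_post has the 140 character limit, and read_post pages with offset and limit. A minimal sketch of those checks is below. The helper names validate_post and paginate are hypothetical, and trimming whitespace before measuring the length is my own assumption, not part of the requirements.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")
MAX_POST_LENGTH = 140  # from the functional requirements


def validate_post(content: str) -> str:
    """Return the cleaned post text or raise ValueError (hypothetical helper)."""
    text = content.strip()  # assumption: surrounding whitespace does not count
    if not text:
        raise ValueError("post is empty")
    if len(text) > MAX_POST_LENGTH:
        raise ValueError("post is longer than 140 characters")
    return text


def paginate(items: Sequence[T], offset: int, limit: int) -> List[T]:
    """Return one page of items, as read_post(user_id, offset, limit) would."""
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit must be > 0")
    return list(items[offset:offset + limit])
```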
Database design
Query pattern:
- get the most recent posts from the people a user follows
- get all users followed by another user
- get the most popular posts
User table
- user_id PK
- name
Post
- post_id (id+timestamp) PK
- user_id
- content
- created_at
- last_updated_at
- num_of_like
Follow
- follower_id
- followee_id
- created_at
Favorite
- user_id
- post_id
- created_at
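To make the query patterns concrete, here is a small in-memory sketch over the Post and Follow records. It is not a real database layer: plain Python lists stand in for the tables, only the fields needed by the queries are kept, and the function names are made up for illustration.

```python
from typing import List, NamedTuple


class Post(NamedTuple):
    post_id: str
    user_id: str
    content: str
    created_at: int  # epoch seconds
    num_of_like: int


class Follow(NamedTuple):
    follower_id: str
    followee_id: str


def followees_of(follows: List[Follow], user_id: str) -> List[str]:
    """Query pattern: get all users followed by user_id."""
    return [f.followee_id for f in follows if f.follower_id == user_id]


def recent_posts_from_followees(
    posts: List[Post], follows: List[Follow], user_id: str, k: int
) -> List[Post]:
    """Query pattern: get the k most recent posts from people user_id follows."""
    followees = set(followees_of(follows, user_id))
    candidates = [p for p in posts if p.user_id in followees]
    return sorted(candidates, key=lambda p: p.created_at, reverse=True)[:k]


def most_popular(posts: List[Post], k: int) -> List[Post]:
    """Query pattern: get the k most popular posts by like count."""
    return sorted(posts, key=lambda p: p.num_of_like, reverse=True)[:k]


follows = [Follow("alice", "bob"), Follow("alice", "carol"), Follow("bob", "carol")]
posts = [
    Post("p1", "bob", "hello", 100, 3),
    Post("p2", "carol", "hi", 200, 7),
    Post("p3", "alice", "hey", 300, 1),
]
```

Scanning and sorting on every read like this is what the Redis lists in the later sections are meant to avoid.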
High-level design
All requests go through a load balancer to the app servers. To scale, we can separate read requests from writes: the read servers handle read-post requests, while the write servers handle post creation, favorites, user sign-up, etc.
The data is saved to Redis and the database.
A count service scans the database every 5 minutes to calculate the number of likes for each post.
Request flows
- Create post - the request is routed to a write app server, then the post is saved to the database and Redis. In Redis, the post is fanned out to the home timeline list of each of the author's followers.
- Get home page - the request is routed to a read app server; the server reads the post list from Redis with key = current user id and loads the home page.
- Favorite post - the request goes to a write app server, which adds a record to the Favorite table.
- Follow others - the request goes to a write app server, which adds a record to the Follow table and also updates Redis.
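The create post and get home page flows can be sketched as fan-out on write. This is a toy version under stated assumptions: plain dicts stand in for both the database and Redis, and create_post takes a post_id argument for simplicity even though the API above would generate it on the server.

```python
from collections import defaultdict
from typing import Dict, List

# Plain dicts stand in for the database and Redis (assumption for illustration).
followers: Dict[str, List[str]] = defaultdict(list)      # followee -> follower ids
post_store: Dict[str, str] = {}                          # post_id -> content
user_timeline: Dict[str, List[str]] = defaultdict(list)  # author -> own post ids
home_timeline: Dict[str, List[str]] = defaultdict(list)  # reader -> post ids


def follow(follower: str, followee: str) -> None:
    if follower not in followers[followee]:
        followers[followee].append(follower)


def create_post(user_id: str, post_id: str, content: str) -> None:
    post_store[post_id] = content
    user_timeline[user_id].insert(0, post_id)  # newest first
    # Fan out on write: push the new post id onto every follower's home timeline.
    for follower_id in followers[user_id]:
        home_timeline[follower_id].insert(0, post_id)


def get_home_page(user_id: str, offset: int = 0, limit: int = 10) -> List[str]:
    # One list read keyed by the current user id, then resolve ids to posts.
    post_ids = home_timeline[user_id][offset:offset + limit]
    return [post_store[pid] for pid in post_ids]


follow("alice", "bob")
follow("carol", "bob")
create_post("bob", "p1", "first")
create_post("bob", "p2", "second")
print(get_home_page("alice"))  # ['second', 'first']
```

The write cost grows with the number of followers, while the read stays a single key lookup, which fits the read-heavy estimate.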
Detailed component design
Redis
Get home page is the feature that creates the most traffic. To speed it up, we use Redis to cache the data.
- <user_id, List<post_id>> user timeline, contains all the posts the user creates.
- <user_id, List<post_id>> home timeline, helps to load all the posts on the home page quickly. The list size can be limited to something like 100, with default TTL = 5 days. Once the list is full, we evict the oldest item (the last one in the list).
- <post_id, post>
- <user_id, List<followee_id>> allows us to find all the people a user follows. We limit the size to 200 and evict the last item if full.
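The size limit on the home timeline can be sketched with a bounded deque. This only illustrates the eviction rule: the class name is hypothetical, the TTL is not modeled, and with a real Redis list the same effect would typically come from an LPUSH followed by an LTRIM.

```python
from collections import deque
from typing import Deque, Dict, List

HOME_TIMELINE_MAX = 100  # size limit suggested above


class HomeTimelineCache:
    """In-memory stand-in for the per-user home timeline lists (hypothetical)."""

    def __init__(self, max_len: int = HOME_TIMELINE_MAX) -> None:
        self.max_len = max_len
        self._lists: Dict[str, Deque[str]] = {}

    def push(self, user_id: str, post_id: str) -> None:
        timeline = self._lists.setdefault(user_id, deque(maxlen=self.max_len))
        # Newest goes to the left; a full deque drops the rightmost (oldest) item.
        timeline.appendleft(post_id)

    def read(self, user_id: str, limit: int = 10) -> List[str]:
        return list(self._lists.get(user_id, ()))[:limit]


cache = HomeTimelineCache(max_len=3)
for post_id in ["p1", "p2", "p3", "p4"]:
    cache.push("alice", post_id)
print(cache.read("alice"))  # ['p4', 'p3', 'p2']
```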
Count service
The num_of_like column in the Post table might become a bottleneck if we update it in real time, because every like would have to lock the post's row. To avoid that, we dedicate a separate service to calculate the counts asynchronously.
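One pass of the count service could look like the sketch below. It assumes the Favorite rows can be read as (user_id, post_id) pairs and that posts are plain dicts; both are stand-ins for the real tables, and the scheduler that would run this every 5 minutes is not shown.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def recount_likes(favorites: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count likes per post from (user_id, post_id) Favorite rows."""
    # A set drops duplicate rows so one user is counted once per post.
    unique_rows = set(favorites)
    return dict(Counter(post_id for _user_id, post_id in unique_rows))


def apply_counts(posts: Dict[str, dict], counts: Dict[str, int]) -> None:
    """Write the batch result into num_of_like; posts with no likes get 0."""
    for post_id, post in posts.items():
        post["num_of_like"] = counts.get(post_id, 0)


favorites = [("alice", "p1"), ("bob", "p1"), ("alice", "p2"), ("alice", "p1")]
posts = {"p1": {"num_of_like": 0}, "p2": {"num_of_like": 0}, "p3": {"num_of_like": 9}}
counts = recount_likes(favorites)
apply_counts(posts, counts)
print(counts["p1"], counts["p2"])  # 2 1
```

The like request itself only inserts a Favorite row, so no post row is locked on the hot path. The trade-off is that the displayed count can be a few minutes stale, which matches the eventual consistency requirement.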
Trade offs/Tech choices
- Choose NoSQL over SQL given the huge size of data and the low latency requirements.
Failure scenarios/bottlenecks
For hot users, fan-out might be inefficient due to the huge number of followers. We can cache a list of celebrities in Redis instead of fanning out their posts. When building the home timeline for user A, we check whether they follow any celebrities; if so, we fetch the latest posts from each celebrity's user timeline and merge them with the home timeline list.
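The read-time merge for celebrities can be sketched like this. It assumes every timeline is stored as (created_at, post_id) pairs sorted newest first, which the design above does not specify, and the function and parameter names are made up for illustration.

```python
import heapq
from itertools import islice
from typing import Dict, List, Set, Tuple

TimelineEntry = Tuple[int, str]  # (created_at, post_id), lists sorted newest first


def build_home_page(
    user_id: str,
    home_timeline: Dict[str, List[TimelineEntry]],
    user_timeline: Dict[str, List[TimelineEntry]],
    following: Dict[str, Set[str]],
    celebrities: Set[str],
    limit: int = 10,
) -> List[str]:
    # Start with the precomputed (fanned out) home timeline.
    sources = [home_timeline.get(user_id, [])]
    # Pull in the user timeline of every celebrity this user follows.
    for followee in following.get(user_id, set()) & celebrities:
        sources.append(user_timeline.get(followee, []))
    # Merge the already sorted lists by timestamp, newest first.
    merged = heapq.merge(*sources, key=lambda entry: entry[0], reverse=True)
    return [post_id for _created_at, post_id in islice(merged, limit)]


home_timeline = {"alice": [(50, "p5"), (20, "p2")]}
user_timeline = {"celeb": [(60, "c6"), (30, "c3")], "bob": [(50, "p5"), (20, "p2")]}
following = {"alice": {"bob", "celeb"}}
celebrities = {"celeb"}
print(build_home_page("alice", home_timeline, user_timeline, following, celebrities, limit=3))
# ['c6', 'p5', 'c3']
```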
Future improvements
If we want to notify users when new posts are available from the users they follow, we can connect each user to the read app server via websocket to allow bi-directional communication. Then we add a message queue for each user in the flow. Once a new post is created, it is published to the queues of all the author's followers, and the read server then pushes a notification to each user's device that new posts are available.
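A rough sketch of the per-user queue idea is below. It is only a toy: in-memory deques stand in for the message queues, and the websocket push is replaced by a plain callback named send, which is a hypothetical placeholder.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List

# One in-memory queue per user (stand-in for a real message queue).
queues: Dict[str, Deque[str]] = defaultdict(deque)


def publish_new_post(author_id: str, post_id: str, followers: Dict[str, List[str]]) -> None:
    """Write path: enqueue the new post id for every follower of the author."""
    for follower_id in followers.get(author_id, []):
        queues[follower_id].append(post_id)


def drain_and_notify(user_id: str, send: Callable[[str, str], None]) -> int:
    """Read server: push each queued post id to the user's open connection."""
    sent = 0
    queue = queues[user_id]
    while queue:
        send(user_id, queue.popleft())
        sent += 1
    return sent


followers = {"bob": ["alice", "carol"]}
publish_new_post("bob", "p1", followers)

delivered = []
count = drain_and_notify("alice", lambda user, post: delivered.append((user, post)))
print(count, delivered)  # 1 [('alice', 'p1')]
```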