Requirements
Functional Requirements:
- Allow users to post tweets of upto 150 characters
- Allow users to follow other users
- Allow users to like tweets
- Allow users to view their home feed and timeline
- Home feed shows top tweets based on likes.
Scale estimates:
- 1B users
- 100M DAU
- Each user 5 tweets daily ~ 500M tweets per day = ~6-7 tweets/sec
- Peak traffic - 5x -> 30-35 tweets/sec
- Each users reads home feed 100 times daily -> 10B timeline reads daily -> ~120K reads/sec
- Peak read traffic - 5x -> 600k reads/sec
Non-Functional Requirements:
- Low Latency for home feed loading (<100 ms ideally)
- High Availability
- Highly scalable - to support the massive scale
- Eventual consistency for timeline/home feed
- High Read throughput - support the ~10B reads per daily especially during high peak traffic
API Design
Data Models
Users - Store user profile info
Id
username
Tweet - stores tweet content
Id
UserID
Content
createdAt
User -> Tweet = 1:many relationship
TweetLikes - stores tweets likes
UserId
TweetId
User <-> TweetLikes -> Many:Many relationship
Follow - stores user follow info
followerId
followeeId
followedAt
User -> Follow (1:Many relationship) - One user can follow many users or one user can have many followers
APIs
Create tweet
POST /users/{userId}/tweets
req - {
content
}
Like tweet
POST /tweets/{id}/likes
{user id comes from Auth token)
Follow other user
POST /users/{id}/follow
{follower id comes from auth)
Get user feed
Get /feed?cursor=X
- userid froma auth token
- cursor applies pagination
Get users timeline
Get /users/{id}/tweets
High-Level Design
API Gateway
Acts as an entry point for all the requests, Implement:
- Authentication
- Rate limiter - uses token bucket strategy with rate limiter implemented on API key/IP
Load Balancer
- Routes request to multiple instances of tweet, feed and user services
Tweet Service
Responsibilities
- Create and save tweets
- Save tweet likes
- Publish TweetCreated event
Feed/TimeLine service
Responsiblities
- Generates and returns user feed and timeline
- Fetches feed from redis
- If cache hit -> returns response
- if cache miss -> goes to DB, generates feed/timleine, saves to redis and returns response
- Feed generation:
- User feed is generated and ranked i.e. sorted by likes, so top liked tweets with recency are first in feed
- In redis we only store 200 records that are ranked
- Flow:
- User reqeusts their feed
- Feed service checks redis
- Redis cahce miss
- Feed service goes to DB, fetches top 200 tweets of all users' followers sorted by likes
- Saves in redis
- returns top 20 tweets
- User now scrolls past 20 tweets
- Anothjer request hits the feed service this time, the cursor value is 21
- Feed service gets the cache from redis
- Returns 21-40 records
- Here we do "best-effort-pagination" and are ok with evetual consistency i.e. Feed order is not perfectly stable during scrolling so some minor duplications, skips or shifts can happen
User service:
Responsiblities
- Saves user creations
- Saves user follows
MessageQueue:Kafka
- Acts as an async processor and handles the tweet created event
- Uses UserId as partitionKey
Feed Fanout service
- Consumes TweetCreated event from Kafka
- Applies fanout strategies i.e. fanout on read and fanout on write to update user feed
Redis
- Acts a cahcing layer to support high read throughput
- Stores users feed
- stores feed:user123 -> feed[]
- TTL is 15min
- Gets updated by feed fanout service so user feed is updated whenver the user's follower creates tweet, so it adapts eventual consistency
Database
- Formain persistence storage we use SQL and NoSQL, hybrid approach
- For Users and follow data models consitency must be strong and relationship is critical so we store taht in SQL
- For tweets, likes we need high scalability over consitency and relationship, so we use NoSQL for it
Detailed Component Design
We will now deep dive into the following topics
Handling Massive scale - Scaling strategies
- The system expects almost nearly 500M writes and 10B reads daily, in peak traffic we can expect 5x more.
- To handle this we need scaling strategies for our servers as well as databases
Servers + Load balances - Horizontal scaling
- For all our services - Tweet, feed, feed fanout, user service, we create multiple instances of those services
- These services are then put behind Load balancer
- Load balancer distributes load to different servers, making the sytem able to handle higher scale efficiiently.
- We can use Random picking strategy for load balancer which is suprisingly very efficent for high scales like our system
Dabatase - Sharding
- With nearly 500M writes daily, single DB server cant handle it at all. so we shard our db servers.
- We will shard our NoSQL db which stores the tweet info
- We use Userid as partition key for sharding, this ensures, all tweets of user are in same shard
Database - Replicas
- Use read replicas to make reads efficient and faster
So using thes scaling strategies, we can make our system able to handle higher traffic efficiently
Hot tweet problem
Problem: A tweet goes viral/celebrity tweets something that is getting lots of engatement, i.e. millions of likes, feed reads, this creates hot key issue.
Solution:
-> Redis + Kafka for likes storing and total likes updates
- We will Redis to store tweet metadata, so all recent tweet that are created will be cached in Redis with TTL of 12 hours.
- Metdata to store:
- Content
- userId
- TimeStamp
- LikeCount
- Metdata to store:
- Updated Tweet like flow:
- When there are lots of likes coming in for a tweet, the tweet service instead of handling storing like on its own, it will create publish a TweetLiked event
- This tweetLiked event is then conmsued by Like processor service, the service will:
- Store the like record in NoSQL (source of truth)
- Update Redis like count -> if the tweet cache is found in redis
The HLD diagram is upated to reflect this solution.
So when the system gets a lot of traffic for a hot tweet this solution ensures:
- Low latency -> Kafka async processing publishes events and lets downstream serviecs handle the processing of events
- High read throughput -> The hot tweet metadata is stored in redis, so the user feed for that tweet which will be read by millions of user is always fetched from redis ensuring reads are faster
Fanout strategies
-> When a tweet is created we update the feed for all users who follows the user who has tweeted. but this create issues when a celebrity with millions of user tweets, if we were to write to all the million users it becomes very inefficient, but to tackle this we apadt the hybrid appraoch
- For normal users -> fanout-on-write
- When a tweet is created, add that tweet to all followers feed
- For Celebrity users -> fanout-on-read
- When celveriuty posts tweet -> add that tweet to a seperate redis store
- celebTweets:{celebId} → one sorted set per celeb all their tweets, scored by timestamp
- Then when user requests feed, use this store to compute their feed
- When celveriuty posts tweet -> add that tweet to a seperate redis store
Entire flow for feed service:
- user requests feed
- Fetch the cached feed (tweetIds) from normal user -> tweetIds
- Get user celeb list -> this could be cahced too for faster reads
- for each celeb -> Mget or pipeline fetch top celebrity tweets that user follows -> only fetch latest 5 tweets of the celebrity
- We can cap max celebrity follow limit for eg: A user can follow a max of 1000 celebs account, the number could be configured as required
- So even lets say a user has 1000 celebs following, whne they request feed, the read will take ~2-5ms which is fine.
- Once we have all tweetIds, we batch fetch tweet metadata, again from redis, we can use Mget or pipeline fetch
- Rank and merge tweets and return response