Design Twitter - System Design

Requirements

Functional Requirements:

Allow users to post tweets of up to 150 characters
Allow users to follow other users
Allow users to like tweets
Allow users to view their home feed and timeline
Home feed shows top tweets based on likes.

Scale estimates:

1B users
100M DAU
Each user posts 5 tweets daily -> 500M tweets per day -> ~6K tweets/sec
Peak write traffic - 5x -> ~30K tweets/sec
Each user reads the home feed 100 times daily -> 10B feed reads daily -> ~120K reads/sec
Peak read traffic - 5x -> ~600K reads/sec
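
A quick check of the arithmetic, using only the inputs listed above (100M DAU, 5 tweets and 100 feed reads per user per day, 5x peak factor). The variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

dau = 100_000_000
tweets_per_user_per_day = 5
feed_reads_per_user_per_day = 100
peak_multiplier = 5

tweets_per_day = dau * tweets_per_user_per_day      # 500M
reads_per_day = dau * feed_reads_per_user_per_day   # 10B

avg_write_qps = tweets_per_day / SECONDS_PER_DAY    # roughly 5.8K/sec
avg_read_qps = reads_per_day / SECONDS_PER_DAY      # roughly 116K/sec
peak_write_qps = avg_write_qps * peak_multiplier    # roughly 29K/sec
peak_read_qps = avg_read_qps * peak_multiplier      # roughly 579K/sec

print(f"writes: {avg_write_qps:,.0f}/sec avg, {peak_write_qps:,.0f}/sec peak")
print(f"reads:  {avg_read_qps:,.0f}/sec avg, {peak_read_qps:,.0f}/sec peak")
```

The read path is about 20x heavier than the write path, which is why the rest of the design leans on caching.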

Non-Functional Requirements:

Low Latency for home feed loading (<100 ms ideally)
High Availability
Highly scalable - to support the massive scale
Eventual consistency for timeline/home feed
High read throughput - support the ~10B reads per day, especially during peak traffic

API Design

Data Models

Users - Store user profile info

username

Tweet - stores tweet content

UserID

Content

createdAt

User -> Tweet = 1:many relationship

TweetLikes - stores tweets likes

UserId

TweetId

User <-> Tweet via TweetLikes = many:many relationship

Follow - stores user follow info

followerId

followeeId

followedAt

User <-> User via Follow = many:many relationship - one user can follow many users, and one user can have many followers
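
The models above could be sketched as plain Python dataclasses. This is only an illustration: the `user_id` and `tweet_id` key fields are assumptions (the notes list only the non-key attributes), and a real system would map these to SQL tables and NoSQL items rather than in-memory objects.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass(frozen=True)
class User:
    user_id: str   # assumed primary key
    username: str


@dataclass(frozen=True)
class Tweet:
    tweet_id: str  # assumed primary key
    user_id: str   # author; one user has many tweets
    content: str
    created_at: datetime = field(default_factory=_now)


@dataclass(frozen=True)
class TweetLike:
    user_id: str   # who liked
    tweet_id: str  # what was liked


@dataclass(frozen=True)
class Follow:
    follower_id: str
    followee_id: str
    followed_at: datetime = field(default_factory=_now)


# Frozen dataclasses are hashable, so a set models "one like per (user, tweet)".
likes = {TweetLike("u1", "t1"), TweetLike("u1", "t1"), TweetLike("u2", "t1")}
print(len(likes))
```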

APIs

Create tweet

POST /users/{userId}/tweets

req - {

content

}

Like tweet

POST /tweets/{id}/likes

(user ID comes from the auth token)

Follow other user

POST /users/{id}/follow

(follower ID comes from the auth token)

Get user feed

GET /feed?cursor=X

User ID comes from the auth token
cursor drives pagination

Get users timeline

GET /users/{id}/tweets

High-Level Design

API Gateway

Acts as the entry point for all requests and implements:

Authentication
Rate limiting - token bucket strategy, keyed on API key/IP
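
A minimal sketch of a token bucket limiter keyed on API key or IP. The class names, the `capacity` and `refill_rate` parameters, and the in-process dictionary are assumptions for illustration; a gateway running on several nodes would need shared state (for example in Redis) instead.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `refill_rate` tokens/sec."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self._clock = clock
        self._tokens = float(capacity)  # start full so bursts are allowed
        self._last = clock()

    def allow(self, cost=1.0):
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill lazily based on elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class RateLimiter:
    """One bucket per key (API key or client IP)."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self._capacity = capacity
        self._refill_rate = refill_rate
        self._clock = clock
        self._buckets = {}

    def allow(self, key):
        bucket = self._buckets.get(key)
        if bucket is None:
            bucket = TokenBucket(self._capacity, self._refill_rate, self._clock)
            self._buckets[key] = bucket
        return bucket.allow()


limiter = RateLimiter(capacity=10, refill_rate=5.0)  # burst of 10, then 5 req/sec
print(limiter.allow("api-key-123"))
```

The bucket size controls how large a burst is tolerated, while the refill rate sets the sustained request rate per key.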

Load Balancer

Routes requests across multiple instances of the tweet, feed, and user services

Tweet Service

Responsibilities

Create and save tweets
Save tweet likes
Publish TweetCreated event

Feed/Timeline Service

Responsibilities

Generates and returns the user feed and timeline
Fetches the feed from Redis
- If cache hit -> returns the response
- If cache miss -> goes to the DB, generates the feed/timeline, saves it to Redis, and returns the response
Feed generation:
- The user feed is generated and ranked by likes and recency, so the most-liked recent tweets come first
- In Redis we store only the top 200 ranked records
Flow:
- User requests their feed
- Feed service checks Redis
- Redis cache miss
- Feed service goes to the DB and fetches the top 200 tweets from all accounts the user follows, sorted by likes
- Saves them in Redis
- Returns the top 20 tweets
User now scrolls past 20 tweets
- Another request hits the feed service; this time the cursor value is 21
- Feed service gets the cached feed from Redis
- Returns records 21-40
Here we do "best-effort pagination" and accept eventual consistency, i.e. feed order is not perfectly stable during scrolling, so some minor duplications, skips, or shifts can happen
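
The cache-aside read and cursor pagination described above can be sketched as follows. `TTLCache` is an in-memory stand-in for Redis, the `followees` and `tweets_by_user` dictionaries stand in for the databases, and using recency as the tie-breaker after likes is an assumption about the ranking.

```python
import time

FEED_CACHE_SIZE = 200       # ranked records kept per user
PAGE_SIZE = 20
FEED_TTL_SECONDS = 15 * 60


class TTLCache:
    """In-memory stand-in for Redis GET/SETEX."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)


def build_feed(user_id, followees, tweets_by_user):
    """Cache-miss path: gather followee tweets, rank, keep the top 200 IDs."""
    candidates = [
        tweet
        for followee in followees.get(user_id, ())
        for tweet in tweets_by_user.get(followee, ())
    ]
    # Most likes first; newer tweet wins a tie (assumed ranking rule).
    candidates.sort(key=lambda t: (t["likes"], t["created_at"]), reverse=True)
    return [t["id"] for t in candidates[:FEED_CACHE_SIZE]]


def get_feed_page(user_id, cursor, cache, followees, tweets_by_user):
    """Return (tweet_ids, next_cursor); cursor is a 1-based position."""
    if cursor < 1:
        raise ValueError("cursor must be >= 1")
    key = f"feed:{user_id}"
    feed = cache.get(key)
    if feed is None:  # cache miss: rebuild and store with a TTL
        feed = build_feed(user_id, followees, tweets_by_user)
        cache.set(key, feed, FEED_TTL_SECONDS)
    start = cursor - 1
    page = feed[start:start + PAGE_SIZE]
    has_more = start + PAGE_SIZE < len(feed)
    return page, (cursor + len(page) if has_more else None)
```

Because the cursor is only a position in the cached list, a feed that is rebuilt or updated between two page requests can shift under the reader, which is the best-effort behavior accepted above.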

User Service

Responsibilities

Creates and saves users
Saves user follows

Message Queue: Kafka

Carries the TweetCreated event so it can be processed asynchronously by downstream consumers
Uses UserId as the partition key

Feed Fanout Service

Consumes the TweetCreated event from Kafka
Applies fanout strategies, i.e. fanout-on-read and fanout-on-write, to update user feeds

Redis

Acts as a caching layer to support high read throughput
Stores user feeds
- stores feed:user123 -> feed[]
- TTL is 15 min
Gets updated by the feed fanout service, so a user's feed is updated whenever an account they follow creates a tweet; the feed is therefore eventually consistent

Database

For main persistent storage we use a hybrid of SQL and NoSQL
For the User and Follow data models, consistency must be strong and relationships are critical, so we store them in SQL
For tweets and likes, scalability matters more than strong consistency and relational integrity, so we use NoSQL for them

Detailed Component Design

We will now deep dive into the following topics

Handling Massive scale - Scaling strategies

The system expects nearly 500M writes and 10B reads daily, and peak traffic can be 5x higher.
To handle this we need scaling strategies for our servers as well as our databases

Servers + Load balancers - Horizontal scaling

For all our services (tweet, feed, feed fanout, user) we run multiple instances
These instances are put behind a load balancer
The load balancer distributes requests across instances, letting the system handle higher scale efficiently.
We can use a random selection strategy for the load balancer, which is surprisingly effective at high scale because requests spread close to evenly across instances when volume is large
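
A small simulation of random backend selection, to show how evenly load spreads at volume. The backend names and request count are made up for illustration, and the sketch ignores health checks and per-request cost differences.

```python
import random
from collections import Counter


def pick_backend(backends, rng=random):
    """Pick one backend uniformly at random; no shared state is needed."""
    return rng.choice(backends)


def simulate(backends, num_requests, seed=0):
    """Count how many requests each backend receives."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    return Counter(pick_backend(backends, rng) for _ in range(num_requests))


backends = ["feed-1", "feed-2", "feed-3", "feed-4"]  # hypothetical instances
counts = simulate(backends, 100_000)
for name in backends:
    share = counts[name] / 100_000
    print(f"{name}: {counts[name]} requests ({share:.1%})")
```

With many requests each instance ends up close to an equal share, and the balancer needs no coordination or connection tracking to achieve it.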

Database - Sharding

With nearly 500M writes daily, a single DB server cannot handle the load, so we shard the database.
We shard the NoSQL DB that stores tweet data
We use UserId as the partition key for sharding, which ensures that all tweets of a user land on the same shard
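
A minimal sketch of mapping a user ID to a shard. The shard count and the hash-then-modulo scheme are assumptions; managed NoSQL stores usually do this partitioning internally.

```python
import hashlib

NUM_SHARDS = 16  # assumed shard count


def shard_for_user(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard index using a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so a cryptographic digest is used to get the same answer everywhere.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Every tweet written by the same user resolves to the same shard.
print(shard_for_user("user123") == shard_for_user("user123"))
```

Plain modulo remaps most keys when the shard count changes, so a scheme such as consistent hashing may be preferable if shards are added often.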

Database - Replicas

Use read replicas to make reads efficient and faster

Using these scaling strategies, the system can handle higher traffic efficiently

Hot tweet problem

Problem: A tweet goes viral, or a celebrity tweets something that gets a lot of engagement (millions of likes and feed reads). This creates a hot key issue.

Solution:

-> Redis + Kafka for likes storing and total likes updates

We will use Redis to store tweet metadata: all recently created tweets are cached in Redis with a TTL of 12 hours.
- Metadata to store:
  - Content
  - userId
  - TimeStamp
  - LikeCount
Updated tweet like flow:
- When likes come in for a tweet, the tweet service does not store them itself; instead it publishes a TweetLiked event
- This TweetLiked event is then consumed by the Like processor service, which will:
  - Store the like record in NoSQL (source of truth)
  - Update the Redis like count -> only if the tweet is found in the Redis cache
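
The consumer side of this flow might look like the sketch below. The `LikeProcessor` class, the event field names, and the in-memory stores are hypothetical stand-ins for the Kafka consumer, the NoSQL table, and Redis. Skipping duplicate events is an added assumption, since a queue consumer may see the same event more than once.

```python
from collections import defaultdict


class LikeProcessor:
    """Handles TweetLiked events pulled from the queue."""

    def __init__(self, like_store, tweet_cache):
        self.like_store = like_store    # stand-in for NoSQL: tweet_id -> set of user_ids
        self.tweet_cache = tweet_cache  # stand-in for Redis: tweet_id -> metadata dict

    def handle(self, event):
        """Process one event; returns False if the like was already recorded."""
        tweet_id, user_id = event["tweetId"], event["userId"]
        likers = self.like_store[tweet_id]
        if user_id in likers:
            return False  # duplicate delivery, do not double count
        likers.add(user_id)  # source of truth is written first
        cached = self.tweet_cache.get(tweet_id)
        if cached is not None:  # only bump the counter for cached tweets
            cached["likeCount"] += 1
        return True


like_store = defaultdict(set)
tweet_cache = {"t1": {"content": "hi", "userId": "u9", "timeStamp": 0, "likeCount": 0}}
processor = LikeProcessor(like_store, tweet_cache)
processor.handle({"tweetId": "t1", "userId": "u1"})
print(tweet_cache["t1"]["likeCount"])
```

Against a real Redis, the counter update would typically be an atomic increment on a hash field rather than a read-modify-write.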

The HLD diagram is updated to reflect this solution.

So when the system gets a lot of traffic for a hot tweet, this solution ensures:

Low latency -> the tweet service only publishes an event to Kafka and returns; downstream services process the events asynchronously
High read throughput -> the hot tweet's metadata is stored in Redis, so feed reads that include the tweet, potentially from millions of users, are served from Redis while the entry is cached

Fanout strategies

-> When a tweet is created, we update the feed of every user who follows the author. This becomes a problem when a celebrity with millions of followers tweets: writing to millions of feeds for a single tweet is very inefficient. To tackle this we adopt a hybrid approach

For normal users -> fanout-on-write
- When a tweet is created, add that tweet to every follower's feed
For celebrity users -> fanout-on-read
- When a celebrity posts a tweet -> add that tweet to a separate Redis store
  - celebTweets:{celebId} → one sorted set per celeb holding all their tweets, scored by timestamp
- Then, when a user requests their feed, use this store to compute it
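
The write path of the hybrid approach can be sketched as below. The follower-count threshold that marks an account as a celebrity is an assumption (the notes do not specify one), and the dictionaries stand in for the Redis feed lists and the celebTweets sorted sets.

```python
from collections import defaultdict

CELEB_FOLLOWER_THRESHOLD = 100_000  # assumed cutoff, tune as required
FEED_CACHE_SIZE = 200


class FanoutService:
    """Consumes TweetCreated events and picks a fanout strategy per author."""

    def __init__(self, followers, threshold=CELEB_FOLLOWER_THRESHOLD):
        self.followers = followers             # author_id -> list of follower IDs
        self.threshold = threshold
        self.feeds = defaultdict(list)         # stand-in for feed:{userId}
        self.celeb_tweets = defaultdict(list)  # stand-in for celebTweets:{celebId}

    def is_celebrity(self, author_id):
        return len(self.followers.get(author_id, ())) >= self.threshold

    def on_tweet_created(self, event):
        author = event["userId"]
        tweet_id = event["tweetId"]
        created_at = event["createdAt"]
        if self.is_celebrity(author):
            # Fanout-on-read: one write, kept ordered newest first.
            self.celeb_tweets[author].append((created_at, tweet_id))
            self.celeb_tweets[author].sort(reverse=True)
            return "fanout-on-read"
        # Fanout-on-write: push into each follower's cached feed.
        for follower in self.followers.get(author, ()):
            feed = self.feeds[follower]
            feed.insert(0, tweet_id)
            del feed[FEED_CACHE_SIZE:]  # keep the feed bounded
        return "fanout-on-write"
```

A celebrity tweet costs a single write regardless of follower count, while a normal tweet costs one write per follower.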

Entire flow for the feed service:

User requests their feed
Fetch the cached feed, which holds tweet IDs from the normal users they follow -> tweetIds
Get the list of celebs the user follows -> this could be cached too for faster reads
For each celeb the user follows -> MGET or pipeline-fetch their top tweets -> only fetch the latest 5 tweets per celeb
- We can cap the number of celebs a user may follow, e.g. a max of 1000 celeb accounts; the number can be configured as required
- So even if a user follows 1000 celebs, the pipelined read for a feed request should take roughly 2-5 ms, which is acceptable.
Once we have all tweetIds, we batch-fetch the tweet metadata, again from Redis, using MGET or a pipeline
Rank and merge the tweets and return the response
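
The read path above can be sketched as a single function. All arguments are in-memory stand-ins for Redis structures, the function and field names are hypothetical, and ranking by like count with timestamp as the tie-breaker is an assumption.

```python
CELEB_TWEETS_PER_AUTHOR = 5  # latest tweets pulled per followed celeb
PAGE_SIZE = 20


def read_feed(user_id, feeds, celeb_follows, celeb_tweets, tweet_meta):
    """Merge the precomputed feed with celeb tweets, then rank.

    feeds:         user_id  -> list of tweet IDs (fanout-on-write result)
    celeb_follows: user_id  -> list of followed celeb IDs
    celeb_tweets:  celeb_id -> list of (timestamp, tweet_id)
    tweet_meta:    tweet_id -> {"likeCount": int, "timestamp": int, ...}
    """
    tweet_ids = list(feeds.get(user_id, ()))
    for celeb_id in celeb_follows.get(user_id, ()):
        latest = sorted(celeb_tweets.get(celeb_id, ()), reverse=True)
        tweet_ids.extend(tid for _ts, tid in latest[:CELEB_TWEETS_PER_AUTHOR])

    # Drop duplicates while keeping first-seen order.
    tweet_ids = list(dict.fromkeys(tweet_ids))

    # Batch metadata lookup; IDs missing from the cache are skipped here,
    # where a real service would likely fall back to the database.
    tweets = [
        {**tweet_meta[tid], "tweetId": tid}
        for tid in tweet_ids
        if tid in tweet_meta
    ]
    tweets.sort(key=lambda t: (t["likeCount"], t["timestamp"]), reverse=True)
    return tweets[:PAGE_SIZE]
```

With Redis, the per-celeb loop and the metadata lookup would each be sent as one pipelined batch so the request pays for a few round trips rather than one per key.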