Design Twitter - System Design

System requirements

Functional:

Users can:

CRUD tweets
share tweets
track updates of other users (newsfeed)
like tweets

User following, registration, auth, etc. are implied and out of scope for the purposes of this discussion

Non-Functional:

System should have low latency on reads (< 100ms)
Scale to support 300M active users generating a median of 5 tweets/day (1.5B daily), ~10 * 1.5B newsfeed requests/ day
Tweets are read heavy and system to be able to support high read throughput (10:1) (100:1 or even higher depending on whether we're referring to requests or tweets as the unit here)
Tweets can have spikes in "likes," meaning the system needs to handle high write throughput for popular tweets

Capacity estimation

Feed cache:

300M Active Users * 500 max tweets in cache/user * (200 chars/tweet + 8 char prof pic url + 15 char username) = 3 * 10^2*10^6*5*10^2 * 250=15*25*10^11 = (250+125)*10^11=375*10^11 = 37.5TB (in reality less, because we can also set cache entries to expire after a week, meaning most feed caches won't reach max capacity)

API design

POST /api/v1/tweet -> tweet id

{

text

sharing?: {tweet id}

}

PUT /api/v1/tweet/{tweet id}

{

text

}

PUT /api/v1/like/{tweet id}

{

like: bool

}

DELETE /api/v1/tweet/{tweet id}

GET /api/v1/feed/{next id} -> list of tweets along with tweet metadata (likes, user info)

Database design

Core entities:

Users
Tweets
Likes

Tweets table:

Text

User ID

Shared Tweet ID

Likes table

Tweet ID

User ID

High-level design

Creating, Updating, Deleting tweets:

The tweet service will be a basic CRUD app. The postgres DB will be sharded by tweet UUID via consistent hashing.

Liking tweets:

"likes" can be routed to a Kafka queue to handle "hot" tweets. Spark streaming workloads can process likes in mini batches, partitioned by tweet ID. Spark will query the tweet service to see if the tweet ID exists, and if it does, write to the likes DB. The postgres likes DB will be sharded by tweet ID via consistent hashing. There's an unlikely, but still possible, race condition where Spark could check if a tweet exists, get a "yes" response, but then immediately after the tweet could be deleted while the likes db for that tweet is being written to. This should be relatively infrequent though, wasting a negligible amount of space. This situation could also be handled by running a periodic cleanup job.

Generating Feed:

The "feed" service returns a pre-populated, cached feed of up to 500 of the most recent tweets a user follows (self auto-followed), ordered by creation time. The service will store the 500 tweets in a Redis KV store and return 20 tweets at a time to a user as a paginated result set. Each result returned to the user will include an ID which will be used to fetch the next batch of 20 from the redis cache (if such a batch exists).

The feed cache itself will be populated in the background by Apache Flink workers that will consume tweets updates from CDC, identify followers of the tweet by querying the "follower" service, and push to their respective feed caches along with user and like info associated with each tweet queried from the user and likes services respectively. The cache content itself will use a Redis sorted set where tweets are sorted by the order the Flink worker pushed them to the cache, via a monotonic counter maintained on Flink corresponding to the order tweet updates were received

Request flows

CRUD Tweets:

Client calls CRUD APIs
Call passes through LB -> API GW -> Tweet Service
Tweet Service processes API request and returns appropriate HTTP response to client.
In background, Flink detects update via CDC and updates feed caches accordingly

Like Tweet:

Client calls like API
Call passes through LB -> API GW -> Kafka, then client returns 200 response
Worker consumes Kafka event(s), and processes them in batches

Read Feed:

Client calls feed API, with offset (unless initial query)
Call passes through LB -> API GW -> Feed service
Feed service queries Redis cache with offset (if applicable), and returns results along with next offset (if there are more results)

Detailed component design

Discussed in detail above.

Trade offs/Tech choices

Discussed in detail above.

Failure scenarios/bottlenecks

Discussed in detail above.

Future improvements

Discussed in detail above.