Design Twitter - System Design

Requirements

Functional Requirements:

Allow users to tweet messages up to 140 characters.
Enable users to follow other users.
Allow users to like tweets from other users.
Display tweets from followed users in the home feed.
Show top K popular tweets in the home feed based on likes and followers.

Non-Functional Requirements:

Can scale to billions of users to simultaneously post, like, and comment
New posts can be seen by followers within 1 minute after posting
Store all historical data of billions of users
UI response is instantaneous (for example, like immediately increases like count on UI, although actual backend data may be delayed)

API Design

Backend:

post(message: string, userId: number)
like(userId: number, postUserId: number, postId: number)
comment(userId: number, postUserId: number, postId: number, content: string)
deletePost(userId: number, postId: number)
deleteComment(userId: number, postId: number, commentId: number)
dislike(userId: number, postId: number)
getPosts(userId: number, page: number)
getTopK(region: string)
follow(followee: number, follower: number)
unfollow(followee: number, follower: number)
getHomeFeed(userId: number)

High-Level Design

Main components:

Frontend UI, boundary = make REST api to fetch data or write data
Backend REST API server boundary = handle REST api calls, cache request data, orchestrate database queries, compose home feed based on query result, etc.
Distributed Relational Database: boundary = persistent data storage, aceept database queries, communication cross regions for data consistency

## Read path

Frontend asynchrounously calls REST API while displaying skeleton -> Backend checks cache or queries database -> database retrieves and returns corresponding data -> Backend organizes query results and send back to frontend -> frontend update view with new data

## Write path (make a like)

Frontend instantly updates UI, play a like animation, then make async REST API call -> backend relays to database with a query "IN LIKE_TABLE INSERT commentId by userid" -> database negotiate with remote peers throgh raft protocol, retry until successful, then commit

Detailed Component Design

Deep dive into 2-3 key components. Explain how they work, how they scale, discuss tradeoffs, capacity, and any relevant algorithms or data structures.

## sharded database.

Use raft algorithm to keep comment and like data consistent across multiple copies around the globe. Also add sharding based on userId to scale up serving capability. In the CAP theorem, we are more towards giving up consistency, since it's less valued for users.
Posts are less prone to data race. Since a single user can't make multiple posts at the same time, not to mention posting from different regions at once. So a sharded database + periodic regional syncing should be enough
database schema design:

two main table to store posts and users
a comment table of posts, with commentId as primary key and userId as secondary key
similarly a like table, again mapping commentId and userId
userId -> follower table + follower -> userId table

All of the tables above should be sharded for scalability. Because usage patterns are highly uneven: some posts receive a flood of likes/comments while most remain dormant

## Feed update

During a feed update, the backend server does the heavy lifting. It needs to make multiple queries to gather posts from followees that the current user follows, and run a ranking algorithm to determine the order. This result also should be cached and only invalidate upon new posts from any of the followees. Or to make it more perfromant, we can even append new posts in the front of the list, and run the ranking algo even less often.

## Latency consideration

Use caching agressively on hot paths such as like counts or comment history. We should have a separate table to store like counts of posts, instead of gathering counts from the postId -> like table every time. The userId->interaction mapping can be updated more often, since it's sharded and with few data races. But the like counter is synced less often, and the lag is usually tolerable for users. It's extremely expensive to update a single global counter on a viral post.

The goal is:

User-visible state: near real-time, possibly strongly replicated
Public counts/ranking/search: eventually consistent, seconds to minutes
Analytics: minutes to hours

## TopK ranking system

We should have a separate service that periodically rank the hottest new posts. There is less latency constraint on this functionality, and it's natural to decouple it from the REST API server. To rank the posts, we would need to make data queries to like counter and comment counter. We can even run some machine learning algorithm to analyze the velocity of like/comment increase to identify viral posts to recommend to the users.