Design Twitter - System Design

Requirements

Functional Requirements:

Allow users to tweet messages up to 140 characters.
Enable users to follow other users.
Allow users to like tweets from other users.
Display tweets from followed users in the home feed.
Show top K popular tweets in the home feed based on likes and followers.

Non-Functional Requirements:

High availability, so users can always see posts.
Scalable across regions and to hundreds of millions of users, e.g. 500M DAU.
Low latency: when a user opens the home feed, the first 10 tweets should appear within 500 ms.
Durable data: stored tweets must not be lost.

Capacity Estimation

Estimate the scale of the system. Consider daily active users, read/write ratio, storage requirements, bandwidth, and any relevant QPS calculations...
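
A back-of-envelope calculation can be sketched as below. Only the 500M DAU figure and the 140-character limit come from the requirements; the tweets per user, feed reads per user, and bytes per tweet are placeholder assumptions that should be replaced with real traffic numbers.

```python
# Back-of-envelope capacity estimate.
# From the requirements: 500M DAU. Everything else is an assumption.
SECONDS_PER_DAY = 86_400


def estimate_capacity(
    dau: int = 500_000_000,
    tweets_per_user_per_day: float = 0.5,   # assumption
    feed_reads_per_user_per_day: int = 10,  # assumption
    bytes_per_tweet: int = 300,             # assumption: 140 chars plus ids and metadata
) -> dict:
    """Return rough QPS and storage figures for the given inputs."""
    writes_per_day = dau * tweets_per_user_per_day
    reads_per_day = dau * feed_reads_per_user_per_day
    storage_gb_per_day = writes_per_day * bytes_per_tweet / 1e9
    return {
        "write_qps": writes_per_day / SECONDS_PER_DAY,
        "read_qps": reads_per_day / SECONDS_PER_DAY,
        "read_write_ratio": reads_per_day / writes_per_day,
        "storage_gb_per_day": storage_gb_per_day,
        "storage_tb_per_year": storage_gb_per_day * 365 / 1e3,
    }


if __name__ == "__main__":
    for name, value in estimate_capacity().items():
        print(f"{name}: {value:,.1f}")
```

With these assumed inputs the system is read-heavy (a 20:1 read/write ratio), at roughly 2.9K write QPS and 58K read QPS on average, with about 75 GB of new tweet text per day. Peak traffic would be some multiple of these averages.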

API Design

REST APIs

POST /tweet

-- message

-- user_id

-- topic

-- timestamp

POST /follow/{user_id}

POST /like/{tweet_id}

GET /tweet/{user_id} # get tweets from a user

GET /toptweets/{k} # get top k tweets
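
As an illustration of what the tweet service might check when handling POST /tweet, here is a minimal validation sketch. The class name `PostTweetRequest`, the function `validate_post_tweet`, the field name `message` (taken from the Tweet table), and the use of Unix epoch seconds for the timestamp are all assumptions made for this example.

```python
from dataclasses import dataclass

MAX_TWEET_LENGTH = 140  # from the functional requirements
REQUIRED_FIELDS = ("message", "user_id", "topic", "timestamp")


@dataclass(frozen=True)
class PostTweetRequest:
    """Parsed body of POST /tweet (hypothetical shape)."""
    message: str
    user_id: str
    topic: str
    timestamp: int  # assumption: Unix epoch seconds


def validate_post_tweet(body: dict) -> PostTweetRequest:
    """Validate a decoded JSON body and return a typed request object."""
    missing = [field for field in REQUIRED_FIELDS if field not in body]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")

    message = body["message"]
    if not isinstance(message, str) or not message.strip():
        raise ValueError("message must be a non-empty string")
    if len(message) > MAX_TWEET_LENGTH:
        raise ValueError(f"message exceeds {MAX_TWEET_LENGTH} characters")

    return PostTweetRequest(
        message=message,
        user_id=str(body["user_id"]),
        topic=str(body["topic"]),
        timestamp=int(body["timestamp"]),
    )
```

Rejecting oversized or incomplete requests at this point keeps invalid tweets out of the database.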

High-Level Design

A request starts at the client and goes to the API gateway, which routes it to the tweet service (for APIs such as posting a tweet, liking a tweet, or getting all tweets from a user) or to the user service (for user-related APIs such as following a user).

When a user posts a tweet, the request goes through the API gateway to the tweet service, which parses it and stores the data in the tweet table in DynamoDB, including the tweet id, message, user id, timestamp, and so on.

When a user opens the feed, the feed service returns the most recent posts (sorted by timestamp, newest first) from the accounts the user follows.

Database Design

DynamoDB

Tweet table:

tweet id (primary key)

user id

topic

message

timestamp

like count

users who liked the tweet []

Follow table

user id

followed user id

created at

User table

user id (primary key)

followers []
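
The tables above do not specify key schemas beyond the primary keys, so the following is only one possible layout. The dictionaries follow the parameter shape of the DynamoDB CreateTable operation; the table names, attribute names, the on-demand billing mode, the `tweets_by_user` index, and the choice of (user id, followed user id) as the Follow table's composite key are all assumptions.

```python
# Assumed key layout for two of the tables, in CreateTable parameter form.
TWEET_TABLE = {
    "TableName": "Tweet",
    "BillingMode": "PAY_PER_REQUEST",
    "AttributeDefinitions": [
        {"AttributeName": "tweet_id", "AttributeType": "S"},
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "timestamp", "AttributeType": "N"},
    ],
    "KeySchema": [{"AttributeName": "tweet_id", "KeyType": "HASH"}],
    # Assumed secondary index so "tweets from a user, newest first" is one query.
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "tweets_by_user",
            "KeySchema": [
                {"AttributeName": "user_id", "KeyType": "HASH"},
                {"AttributeName": "timestamp", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
}

FOLLOW_TABLE = {
    "TableName": "Follow",
    "BillingMode": "PAY_PER_REQUEST",
    "AttributeDefinitions": [
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "followed_user_id", "AttributeType": "S"},
    ],
    # One item per (follower, followed) pair; querying by user_id lists
    # everyone that user follows.
    "KeySchema": [
        {"AttributeName": "user_id", "KeyType": "HASH"},
        {"AttributeName": "followed_user_id", "KeyType": "RANGE"},
    ],
}


def key_attributes(spec: dict) -> set:
    """Collect every attribute used in the table or index key schemas."""
    schemas = [spec["KeySchema"]]
    schemas += [index["KeySchema"] for index in spec.get("GlobalSecondaryIndexes", [])]
    return {key["AttributeName"] for schema in schemas for key in schema}


def is_consistent(spec: dict) -> bool:
    """Key attributes and declared attribute definitions should match exactly."""
    defined = {attr["AttributeName"] for attr in spec["AttributeDefinitions"]}
    return defined == key_attributes(spec)
```

Keying the Follow table this way answers "whom does this user follow" with a single query, which is the lookup the feed service needs. Listing a user's followers would need a second index or the followers list in the User table.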

Detailed Component Design

The feed is generated as follows: the feed service reads the follow table to get every account the user follows. For each followed account, it fetches that account's tweets from the tweet table. It then sorts the combined tweets by timestamp, newest first, and returns the result through the API gateway to the client. Pagination also helps: returning, e.g., 10 tweets at a time keeps the load on the service and the database low.
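
The read path just described (look up the accounts a user follows, fetch each account's tweets, merge newest first, return one page) can be sketched in memory as follows. The dictionaries stand in for the follow and tweet tables, and `Tweet`, `build_feed`, and the `before` cursor parameter are hypothetical names introduced for this example.

```python
import heapq
from itertools import islice
from typing import Dict, List, NamedTuple, Optional


class Tweet(NamedTuple):
    tweet_id: str
    user_id: str
    message: str
    timestamp: int


def build_feed(
    user_id: str,
    follows: Dict[str, List[str]],            # stand-in for the follow table
    tweets_by_user: Dict[str, List[Tweet]],   # stand-in for the tweet table
    page_size: int = 10,
    before: Optional[int] = None,             # pagination cursor (assumption)
) -> List[Tweet]:
    """Return one page of the newest tweets from accounts `user_id` follows."""
    followed = follows.get(user_id, [])
    # Each author's tweets, sorted newest first, as heapq.merge requires.
    per_author = [
        sorted(tweets_by_user.get(author, []), key=lambda t: t.timestamp, reverse=True)
        for author in followed
    ]
    # Lazily merge the already-sorted lists instead of sorting everything.
    merged = heapq.merge(*per_author, key=lambda t: t.timestamp, reverse=True)
    if before is not None:
        merged = (t for t in merged if t.timestamp < before)
    return list(islice(merged, page_size))
```

For the next page, the client would pass the timestamp of the last tweet it received as `before`. Because the merge is lazy, only about one page of tweets is consumed per request rather than the full history of every followed account.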
Normally, when an account posts a new tweet, a publish-subscribe event is triggered and all follower accounts should see the new tweet in their feeds. Some popular accounts, however, have a very large number of followers, so we add asynchronous workers to handle the fan-out: each worker publishes the new post to a range of followers (partitioned by user id) and updates their feeds. Note that feed updates are not instantaneous: some followers may see the new post earlier than others.
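
A minimal sketch of this partitioned fan-out is shown below, using a thread pool in place of real asynchronous workers and an in-memory store in place of the per-user feed storage. `FeedStore`, `partition_of`, `fan_out`, the CRC32-based partitioning, and the feed capacity of 100 entries are assumptions made for illustration.

```python
import threading
import zlib
from collections import defaultdict, deque
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List

FEED_CAPACITY = 100  # assumption: keep only the newest entries per user


class FeedStore:
    """In-memory stand-in for per-user precomputed feeds."""

    def __init__(self) -> None:
        self._feeds = defaultdict(lambda: deque(maxlen=FEED_CAPACITY))
        self._lock = threading.Lock()

    def push(self, follower_id: str, tweet_id: str) -> None:
        with self._lock:
            self._feeds[follower_id].appendleft(tweet_id)  # newest first

    def feed(self, follower_id: str) -> List[str]:
        with self._lock:
            return list(self._feeds.get(follower_id, ()))


def partition_of(user_id: str, num_partitions: int) -> int:
    """Stable partition id derived from the user id."""
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


def fan_out(tweet_id: str, followers: Iterable[str], store: FeedStore,
            num_workers: int = 4) -> int:
    """Deliver `tweet_id` to every follower's feed; return deliveries made."""
    partitions = defaultdict(list)
    for follower_id in followers:
        partitions[partition_of(follower_id, num_workers)].append(follower_id)

    def deliver(batch: List[str]) -> int:
        for follower_id in batch:
            store.push(follower_id, tweet_id)
        return len(batch)

    # Each worker handles one disjoint partition of the follower set.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return sum(pool.map(deliver, partitions.values()))
```

Workers finish at different times, which is why some followers see the post before others. In a real deployment each partition would more likely be a queue consumed by a separate worker process.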
Redis can be used to store the top-K tweets. When a user requests the top-K tweets, the service checks Redis first: on a hit, the cached result is returned immediately; on a miss, the service queries the database and then asynchronously writes the result to Redis to refresh the cache.
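
This cache-aside lookup can be sketched as below. `TTLCache` is an in-memory stand-in for Redis, not a real Redis client, and `get_top_k`, the `toptweets:{k}` key format, and the 60-second expiry are assumptions. For brevity the sketch ranks by like count only and writes to the cache inline rather than asynchronously.

```python
import heapq
import time
from typing import Callable, Dict, List, Optional


class TTLCache:
    """In-memory stand-in for Redis with per-key expiry (not a real client)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data = {}
        self._clock = clock

    def get(self, key: str) -> Optional[List[str]]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries count as misses
            return None
        return value

    def set(self, key: str, value: List[str], ttl_seconds: float) -> None:
        self._data[key] = (value, self._clock() + ttl_seconds)


def get_top_k(
    k: int,
    cache: TTLCache,
    load_like_counts: Callable[[], Dict[str, int]],  # stand-in for the database query
    ttl_seconds: float = 60.0,                        # assumption
) -> List[str]:
    """Return ids of the k most-liked tweets, consulting the cache first."""
    key = f"toptweets:{k}"
    cached = cache.get(key)
    if cached is not None:
        return cached  # cache hit

    # Cache miss: fall back to the database and rank by like count.
    like_counts = load_like_counts()
    top = heapq.nlargest(k, like_counts.items(), key=lambda item: item[1])
    result = [tweet_id for tweet_id, _ in top]
    cache.set(key, result, ttl_seconds)
    return result
```

The expiry bounds how stale the ranking can get, since like counts change after the result is cached.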