System requirements
Functional:
- post tweet
- follow user
- share tweet
- like tweet
- tweets feed
Non-Functional:
- highly available
- highly scalable
- low latency
Capacity estimation
10M DAU
10M tweets created per day
each tweet ~100 bytes
10M * 100 bytes = 10^9 bytes = 10^3 MB = 1 GB of tweet data per day
365 GB per year, so 5 years ~= 1.8 TB to store tweets
1M favorited tweets per day, 16 bytes to record userId + tweetId
16 MB per day
365 * 16 MB ~= 5.8 GB per year to store favorited tweets
5 years ~= 30 GB
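The arithmetic above can be rechecked with a short script. This is only a back-of-the-envelope sketch using the inputs stated in this section; the constant and function names are made up for illustration, and it uses decimal units (1 GB = 10^9 bytes) to match "10^3 MB = 1 GB".

```python
# Back-of-the-envelope storage estimate from the figures stated above.
TWEETS_PER_DAY = 10_000_000   # 10M tweets created per day
TWEET_BYTES = 100             # each tweet ~100 bytes
LIKES_PER_DAY = 1_000_000     # 1M favorited tweets per day
LIKE_BYTES = 16               # userId + tweetId
DAYS_PER_YEAR = 365
YEARS = 5
GB = 10**9                    # decimal gigabyte (assumption: 1 GB = 10^3 MB)


def storage_gb(events_per_day: int, bytes_per_event: int, years: int) -> float:
    """Total storage in GB for a fixed daily event volume over `years`."""
    return events_per_day * bytes_per_event * DAYS_PER_YEAR * years / GB


tweets_5y_gb = storage_gb(TWEETS_PER_DAY, TWEET_BYTES, YEARS)
likes_5y_gb = storage_gb(LIKES_PER_DAY, LIKE_BYTES, YEARS)

print(f"tweets: {tweets_5y_gb:.0f} GB over {YEARS} years")
print(f"likes:  {likes_5y_gb:.1f} GB over {YEARS} years")
```

This ignores media, indexes, and replication, so the real footprint would be larger.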
API design
POST api/relationship/follow
request:
{
auth-header,
body: {
followeeId
}
}
GET api/relationship/followers
response {
[{userId, name}]
}
GET api/relationship/followee
response {
[{userId, name}]
}
POST api/tweets
request:
{
auth-header,
body: {
content,
media
}
}
response:
200
{
tweetId,
tweet: {
content,
likes: 0
}
}
GET api/tweets/{userId}/?page=x&offset=y
response 200
{
[
{
tweetId,
tweet: {content, likes, shares}
}
]
}
POST api/tweets/like
request {
auth-header,
tweetId
}
response 200
{
tweetId,
liked,
likes
}
GET api/tweets/share?tweetId=xxx
response
{
shareUrl
}
GET api/tweets/feed
response
{
[
{
tweetId,
tweet: {
content,
likes,
shares
}
}
]
}
Database design
Friends Table
| UserId | FollowerId | Created At |
User Tweets Table
| UserId | TweetId | Content | Likes | Shares |
Liked/Shared Tweets Table
| TweetId | UserId | Created At |
User Feed Table
| UserId | TweetId | Content | Created At |
High-level design
We use a microservice architecture so that each service can be scaled independently and concerns stay separated. It also improves fault tolerance, since a single service going down won't take down the whole system.
At the top level, we could have three services.
Tweet Service handles posting, sharing, and liking tweets, and fetching lists of tweets (a single user's tweets or the tweets a user liked).
Feed Service fetches the user's tweets feed.
Friend Service handles following & unfollowing users.
At the DB level,
- Tweets Table stores all users' tweets.
- Liked Tweets Table stores the ids of all tweets a user liked.
- Tweets Metadata Table stores each tweet's like count and share count.
- User Tweets Feed Table stores the tweets in each user's feed.
- Friends Table stores users and their followers.
Request flows
- Create Tweet and retrieve tweets
When a user creates a tweet, the client posts the content to the backend, and the Tweet Service records the tweet in the Tweets Table. To retrieve tweets, we return the user's tweets with pagination.
When we store the tweet in the Tweets Table, it is also fanned out to all followers and recorded in the User Tweets Feed Table.
- For Like Tweet
The user clicks the like button and the UI reflects it immediately. We insert a new record in the Liked Tweets Table and asynchronously update the like count in the Tweets Table.
- For Share Tweet
The process is similar to liking a tweet. For the share URL or share-to-app, we could either use a fixed URL format in the app or have the server generate the URL, depending on whether we need to track the source and enforce access control.
- For follow & unfollow
The user sends a follow/unfollow request to the Friend Service, which adds or removes the record in the Friends Table.
- For user tweets feed
We retrieve the feed from the User Tweets Feed Table.
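The like flow above (record the like first, update the counter asynchronously) could be sketched roughly as follows. This is a minimal in-memory illustration, not a real implementation: LikeService, like, and flush_counts are hypothetical names, a set and a dict stand in for the Liked Tweets Table and the like counter, and a deque stands in for whatever async mechanism is used. Ignoring a repeated like from the same user is an added assumption.

```python
from collections import deque
from typing import Deque, Dict, Set, Tuple


class LikeService:
    """In-memory stand-in for the like flow (hypothetical names)."""

    def __init__(self) -> None:
        self.liked: Set[Tuple[str, str]] = set()  # (tweet_id, user_id) rows
        self.like_counts: Dict[str, int] = {}     # tweet_id -> likes
        self.pending: Deque[str] = deque()        # queued counter updates

    def like(self, tweet_id: str, user_id: str) -> bool:
        """Record the like right away; defer the counter update."""
        row = (tweet_id, user_id)
        if row in self.liked:
            return False  # assumption: a repeated like is a no-op
        self.liked.add(row)
        self.pending.append(tweet_id)
        return True

    def flush_counts(self) -> None:
        """Apply queued counter updates (would run in a background worker)."""
        while self.pending:
            tweet_id = self.pending.popleft()
            self.like_counts[tweet_id] = self.like_counts.get(tweet_id, 0) + 1


service = LikeService()
service.like("t1", "alice")
service.like("t1", "alice")   # duplicate, ignored
service.like("t1", "bob")
count_before_flush = service.like_counts.get("t1", 0)
service.flush_counts()
count_after_flush = service.like_counts["t1"]
```

Until flush_counts runs, the stored count lags behind the recorded likes, which is the eventual consistency this flow accepts.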
Detailed component design
For the Friends Table, we could use a SQL DB (e.g. PostgreSQL). Even though we don't require ACID guarantees to store friend relationship records, SQL is still useful when we need to run complex queries. It also has a mature ecosystem.
The User Tweets Table and User Tweets Feed Table need high write and read throughput. We could use a NoSQL DB like Cassandra, with UserId as the partition key and TweetId as the clustering key. We could compose the TweetId from timestamp + UUID, so that each user's tweets are sorted by time.
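One way to build a timestamp + UUID tweet id is sketched below. The function name new_tweet_id and the exact string layout are assumptions for illustration; the only property relied on is that a fixed-width timestamp prefix makes ids sort by creation time, with the UUID suffix breaking ties.

```python
import time
import uuid
from typing import Optional


def new_tweet_id(now_ms: Optional[int] = None) -> str:
    """Return an id whose string order follows creation time.

    Layout (an assumption, not a standard): 13-digit zero-padded Unix
    time in milliseconds, a dash, then a random UUID in hex.
    """
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    # 13 digits cover millisecond timestamps up to the year 2286.
    return f"{now_ms:013d}-{uuid.uuid4().hex}"


earlier = new_tweet_id(1_700_000_000_000)
later = new_tweet_id(1_700_000_000_001)
print(earlier < later)  # ids compare in creation order
```

Cassandra also has a built-in timeuuid type that could serve the same purpose as the clustering key; which one fits better depends on how ids are generated and exposed.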
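For the Liked/Shared Tweets Table, we could use DynamoDB, with TweetId as the partition key and (Created_At, UserId) as the sort key. Since DynamoDB allows a single sort key attribute, the two values would be concatenated into one.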
For the cache, we could use Redis.
Friend relationship data could use a Redis Set.
User tweets and feeds could use a Redis Sorted Set (ZSET).
For tweet fan-out, since a user's follower list could be large, we could use a message queue (MQ). The message is basically {userId, tweetId, content}; a consumer takes the message and adds the tweet to each follower's Feed Table. If the follower list is long, we could split the task: set up a second MQ whose messages are {userId, tweetId, content, followerIds: []}, where each task carries 100 follower ids, and gradually update the feed for all followers.
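The splitting step could look roughly like this. It is a sketch only: split_fanout and apply_fanout are hypothetical names, a plain dict stands in for the Feed Table, and the messages are just Python dicts rather than anything tied to a specific MQ product.

```python
from typing import Dict, Iterator, List, Tuple

BATCH_SIZE = 100  # follower ids per task, as described above


def split_fanout(
    user_id: str,
    tweet_id: str,
    content: str,
    follower_ids: List[str],
    batch_size: int = BATCH_SIZE,
) -> Iterator[dict]:
    """Yield one second-stage message per batch of follower ids."""
    for start in range(0, len(follower_ids), batch_size):
        yield {
            "userId": user_id,
            "tweetId": tweet_id,
            "content": content,
            "followerIds": follower_ids[start:start + batch_size],
        }


def apply_fanout(message: dict, feed_table: Dict[str, List[Tuple[str, str]]]) -> None:
    """Consumer side: append the tweet to every follower's feed."""
    for follower_id in message["followerIds"]:
        feed_table.setdefault(follower_id, []).append(
            (message["tweetId"], message["content"])
        )


followers = [f"user{i}" for i in range(250)]
messages = list(split_fanout("author", "t1", "hello", followers))
feed_table: Dict[str, List[Tuple[str, str]]] = {}
for message in messages:
    apply_fanout(message, feed_table)

print([len(m["followerIds"]) for m in messages])  # batch sizes
```

Small batches keep each task short, so a failed task can be retried without redoing the whole fan-out.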
Trade offs/Tech choices
We choose a SQL DB (PostgreSQL) to store friend relationships because we assume the write volume on the Friends Table is relatively low. It also makes it easy to track DAU and friends data. But if we want to scale to support more users, we need to shard manually and manage the shards ourselves.
For the tweet-related tables, we use a NoSQL DB to support high read & write volume. These databases scale horizontally by design. It also means we need to accept eventual consistency.
For the Tweets Table & User Tweets Feed Table, we use the user id as the partition key. This makes it easy to fetch all feeds/tweets for a user, but could also lead to hot partitions. We need to mark such a user as a celebrity and divide their tweets into multiple partitions. Each partition has a new partition key of the form UserId_ShardX, and we add shards dynamically depending on the tweet count in the user's newest partition; queries then read across that user's shards.
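A rough sketch of the UserId_ShardX scheme, under one possible reading of the description: writes go to the newest shard, and a new shard is opened once the newest one reaches a size limit. CelebrityTweetStore, MAX_TWEETS_PER_SHARD, and the in-memory dicts are all hypothetical stand-ins; the limit is set tiny so the behavior is visible.

```python
from typing import Dict, List

MAX_TWEETS_PER_SHARD = 3  # tiny limit for illustration only


class CelebrityTweetStore:
    """In-memory stand-in for a table partitioned by UserId_ShardX."""

    def __init__(self) -> None:
        self.partitions: Dict[str, List[str]] = {}  # partition key -> tweet ids
        self.shard_count: Dict[str, int] = {}       # user id -> number of shards

    @staticmethod
    def partition_key(user_id: str, shard: int) -> str:
        return f"{user_id}_Shard{shard}"

    def add_tweet(self, user_id: str, tweet_id: str) -> str:
        """Write to the newest shard, opening a new one when it is full."""
        shards = self.shard_count.get(user_id, 1)
        key = self.partition_key(user_id, shards - 1)
        if len(self.partitions.get(key, [])) >= MAX_TWEETS_PER_SHARD:
            shards += 1
            key = self.partition_key(user_id, shards - 1)
        self.shard_count[user_id] = shards
        self.partitions.setdefault(key, []).append(tweet_id)
        return key

    def tweets_for(self, user_id: str) -> List[str]:
        """Read across all of the user's shards, oldest shard first."""
        tweets: List[str] = []
        for shard in range(self.shard_count.get(user_id, 0)):
            tweets.extend(self.partitions.get(self.partition_key(user_id, shard), []))
        return tweets


store = CelebrityTweetStore()
keys = [store.add_tweet("celeb", f"t{i}") for i in range(7)]
print(keys[-1], store.shard_count["celeb"])
```

Sharding by size like this bounds partition growth, but recent reads still concentrate on the newest shard; spreading writes across shards by hash would be a different trade-off.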
For the tweet feed, we use a push model: we store each user's feed and update it whenever a followee posts a new tweet. But this process is very time consuming if the poster has a lot of followers. We could combine the push and pull models for this case. Normal users push new tweets to their followers' feed tables. For celebrities, followers pull from the celebrity's tweets when loading the feed.
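The combined push and pull read path could be sketched as below. build_feed and the three dicts are hypothetical stand-ins for the Feed Table, the celebrities' tweet partitions, and the Friends Table; tweets are (created_at, tweet_id) tuples so that sorting by tuple gives newest first.

```python
import heapq
from typing import Dict, List, Set, Tuple

Tweet = Tuple[int, str]  # (created_at, tweet_id)


def build_feed(
    user_id: str,
    pushed_feed: Dict[str, List[Tweet]],
    celebrity_tweets: Dict[str, List[Tweet]],
    following: Dict[str, Set[str]],
    limit: int = 10,
) -> List[Tweet]:
    """Merge the precomputed (pushed) feed with tweets pulled from celebrities."""
    pushed = pushed_feed.get(user_id, [])
    pulled = [
        tweet
        for followee in following.get(user_id, set())
        if followee in celebrity_tweets          # only celebrities are pulled
        for tweet in celebrity_tweets[followee]
    ]
    # Newest first, capped at `limit` entries.
    return heapq.nlargest(limit, pushed + pulled)


pushed_feed = {"alice": [(1, "t1"), (3, "t3")]}       # written at post time
celebrity_tweets = {"star": [(2, "t2"), (4, "t4")]}   # read at feed time
following = {"alice": {"bob", "star"}}

feed = build_feed("alice", pushed_feed, celebrity_tweets, following)
print(feed)
```

The cost moves from write time to read time for celebrity tweets, so this read path is a natural place for caching.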
Failure scenarios/bottlenecks
The DB could go down. For the SQL DB, we could add replicas and, when the master goes down, promote a slave that has caught up to be the new master.
For the NoSQL DB, data is replicated to multiple nodes by design.
The Redis cache could also fail; we could add replicas there as well to increase fault tolerance.
The hot partition issue in the Tweets Table and the celebrity fan-out issue are covered in Trade offs/Tech choices.
Future improvements
- We haven't covered the case where a user unfollows someone. In that case we would also need to update the User Feed Table, but that would be an async process. We could apply a client-side filter to hide the unfollowed person's tweets while the backend is still removing them.
- For disaster recovery, we could have multiple data centers so that we can restore from backups if one is lost.
- We could add a monitoring system that includes:
- Performance Monitoring (response time, request rate, and throughput)
- Error Monitoring (API error rates, service availability, and exception logs)
- Resource Monitoring (CPU, memory, disk, and bandwidth utilization)