Requirements
Functional Requirements:
- Allow users to tweet messages up to 140 characters.
- Enable users to follow other users.
- Allow users to like tweets from other users.
- Display tweets from followed users in the home feed.
- Show top K popular tweets in the home feed based on likes and followers.
Non-Functional Requirements:
- high availability
- fast responses
- design should scale to billions of users
Calculations:
- 10M DAU
- 5M tweets posted per day (58 QPS avg, 174 QPS peak)
- 200M tweets fetched per day (2.3k QPS avg, 7k QPS peak)
- 40:1 read to write
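The figures above come from dividing daily volume by the 86,400 seconds in a day. A minimal sketch to check the arithmetic follows; the 3x peak multiplier is an assumption inferred from the 58 to 174 numbers, not a measured value, and the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def qps(events_per_day: int, peak_factor: float = 3.0):
    """Return (average QPS, peak QPS) for a daily event count.

    peak_factor=3.0 is an assumption inferred from the listed figures.
    """
    avg = events_per_day / SECONDS_PER_DAY
    return avg, avg * peak_factor


write_avg, write_peak = qps(5_000_000)    # tweets posted per day
read_avg, read_peak = qps(200_000_000)    # tweets fetched per day
read_write_ratio = 200_000_000 / 5_000_000
```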
API Design
POST /tweet
body {
tweetId: string,
message: string,
userId: string,
createdAt: date,
}
POST /like/:tweetId
body {
userId: string,
createdAt: date
}
response 202 Accepted
POST /follow/:userFollowedId
body {
userId: string,
followedAt: date
}
response 202 Accepted
GET /feed/:userId
response {
tweets: [],
cursor: string
}
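The cursor in the feed response could work roughly as sketched below. This assumes the cursor is an opaque, base64-encoded offset into the user's cached feed; the notes do not specify the encoding, and the helper names and default page size are hypothetical.

```python
import base64
import json
from typing import List, Optional


def encode_cursor(offset: int) -> str:
    """Pack the next offset into an opaque, URL-safe string."""
    raw = json.dumps({"offset": offset}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> int:
    """Recover the offset from a cursor produced by encode_cursor."""
    return json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))["offset"]


def get_feed_page(feed: List[str], cursor: Optional[str] = None, page_size: int = 20) -> dict:
    """Return one page of tweets plus the cursor for the next page (None at the end)."""
    start = decode_cursor(cursor) if cursor else 0
    end = start + page_size
    next_cursor = encode_cursor(end) if end < len(feed) else None
    return {"tweets": feed[start:end], "cursor": next_cursor}
```

An offset cursor shifts when new tweets are prepended to the feed between requests. A cursor that encodes the last-seen tweet (for example its createdAt and tweetId) would avoid that, at the cost of a slightly more involved lookup.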
High-Level Design
At a high level, a gateway sits in front of the system and handles cross-cutting concerns such as authentication, rate limiting, and routing. Behind it, requests split into a read service and a write service. The write service handles posting tweets and all interactions such as following and liking, and publishes each event to the corresponding message queue: one for tweets and one for interactions. The interaction queue's consumer batches updates and writes them directly to RDS. The tweet queue's consumers update user feeds in the cache, so when the read service fetches a feed it is served immediately from the cache.
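The tweet consumer's cache update can be sketched as a fan-out on write. This is a minimal in-memory illustration, not the actual design: it assumes each new tweet is pushed to every follower's feed, and FeedCache, handle_tweet_event, followers_of, and the feed cap of 800 are all hypothetical names and values standing in for a real cache and follower lookup.

```python
from collections import defaultdict, deque
from itertools import islice

FEED_LIMIT = 800  # hypothetical cap on cached tweets per user


class FeedCache:
    """In-memory stand-in for the feed cache: newest tweet IDs first."""

    def __init__(self, limit: int = FEED_LIMIT):
        # deque(maxlen=...) drops the oldest entry once the cap is reached
        self._feeds = defaultdict(lambda: deque(maxlen=limit))

    def push(self, user_id: str, tweet_id: str) -> None:
        self._feeds[user_id].appendleft(tweet_id)

    def get(self, user_id: str, count: int = 20) -> list:
        # .get() avoids creating empty feeds for users we have never seen
        return list(islice(self._feeds.get(user_id, ()), count))


def handle_tweet_event(event: dict, followers_of: dict, cache: FeedCache) -> None:
    """Consume one tweet event and prepend it to each follower's cached feed."""
    for follower_id in followers_of.get(event["userId"], ()):
        cache.push(follower_id, event["tweetId"])
```

With this shape the read service only does a cache lookup per feed request, which suits the 40:1 read-to-write ratio. The cost moves to the write path, where one tweet produces one cache write per follower.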
Detailed Component Design
For the message queues, I would use adaptive queue management: FIFO under normal conditions to ensure fairness, switching to LIFO under high load so that the most recent requests are served first. The tweets queue I would probably keep strictly FIFO so that tweets are processed in order. The trade-off with the message queues is accepting eventual consistency instead of real-time updates, but I think it's a good trade to make in order to scale to billions of users.
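The adaptive FIFO/LIFO behaviour can be illustrated with a small in-process queue. This is only a sketch: it assumes "high load" means the backlog exceeds a fixed threshold, and the AdaptiveQueue class and its threshold parameter are hypothetical. A real message broker would need this logic in the consumer or a custom dispatch layer.

```python
from collections import deque


class AdaptiveQueue:
    """FIFO while the backlog is small, LIFO once it exceeds a threshold."""

    def __init__(self, high_load_threshold: int):
        self._items = deque()
        self._threshold = high_load_threshold  # assumed definition of "high load"

    def put(self, item) -> None:
        self._items.append(item)

    def get(self):
        if not self._items:
            raise IndexError("queue is empty")
        if len(self._items) > self._threshold:
            return self._items.pop()      # high load: serve the newest item
        return self._items.popleft()      # normal load: serve the oldest item

    def __len__(self) -> int:
        return len(self._items)
```

Under sustained overload the oldest items wait longest, so this fits the interaction queue better than the tweets queue, where ordering matters.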
I chose a relational database because a social network has many interconnected entities, so we would need a lot of JOINs. When we need to scale RDS later, we can spin up read replicas, and shard by userId once we reach that scale.
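Sharding by userId could route requests along the lines below. This is a sketch assuming simple hash-modulo placement; the function name and the shard count are hypothetical. A stable hash is used because Python's built-in hash() for strings varies between processes.

```python
import hashlib


def shard_for_user(user_id: str, num_shards: int) -> int:
    """Map a userId to a shard index in [0, num_shards) using a stable hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Example: all rows keyed by this user go to the same shard.
shard = shard_for_user("user-123", num_shards=8)
```

Modulo placement remaps most users when the shard count changes. Consistent hashing or a lookup table would reduce that movement if resharding is expected.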