Design Twitter - System Design

Requirements

Functional Requirements:

Allow users to tweet messages up to 140 characters.
Enable users to follow other users.
Allow users to like tweets from other users.
Display tweets from followed users in the home feed.
Show the top K most popular tweets in the home feed, ranked by likes and follower count.
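The top-K requirement could be served with a bounded heap. The sketch below assumes a hypothetical scoring function, `popularity`, that adds a tweet's like count to a small weight (`FOLLOWER_WEIGHT`, an invented value) times its author's follower count. The document does not specify how likes and followers are combined, so treat the formula as a placeholder.

```python
import heapq
from dataclasses import dataclass

# Hypothetical weight: how much one follower of the author counts relative
# to one like. The source does not specify a scoring formula.
FOLLOWER_WEIGHT = 0.01

@dataclass
class Tweet:
    tweet_id: str
    likes: int
    author_followers: int

def popularity(t: Tweet) -> float:
    """Hypothetical popularity score: likes plus weighted author followers."""
    return t.likes + FOLLOWER_WEIGHT * t.author_followers

def top_k(tweets: list[Tweet], k: int) -> list[Tweet]:
    # heapq.nlargest keeps a heap of at most k items, roughly O(n log k)
    # instead of sorting every candidate tweet.
    return heapq.nlargest(k, tweets, key=popularity)

tweets = [
    Tweet("t1", likes=120, author_followers=1_000),   # score 130
    Tweet("t2", likes=40, author_followers=50_000),   # score 540
    Tweet("t3", likes=300, author_followers=200),     # score 302
    Tweet("t4", likes=5, author_followers=10),        # score 5.1
]
print([t.tweet_id for t in top_k(tweets, 2)])  # ['t2', 't3']
```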

Non-Functional Requirements:

High availability.
Low latency, especially when serving the home feed.
The design should scale to billions of users.

Calculations:

10M DAU
5M tweets posted per day (~58 QPS average, ~174 QPS peak, assuming peak is ~3x average)
200M tweets fetched per day (~2.3k QPS average, ~7k QPS peak)
40:1 read-to-write ratio
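The figures above can be reproduced with a few lines of arithmetic. This sketch assumes peak traffic is about 3x the daily average, which is the factor implied by going from 58 to 174 QPS.

```python
# Back-of-the-envelope traffic estimates.
# Assumption: peak traffic is roughly 3x the daily average.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
PEAK_FACTOR = 3

def qps(events_per_day: int) -> tuple[float, float]:
    """Return (average QPS, peak QPS) for a daily event count."""
    avg = events_per_day / SECONDS_PER_DAY
    return avg, avg * PEAK_FACTOR

write_avg, write_peak = qps(5_000_000)   # tweets posted per day
read_avg, read_peak = qps(200_000_000)   # tweets fetched per day

print(f"writes: {write_avg:.0f} avg, {write_peak:.0f} peak QPS")  # 58 avg, 174 peak
print(f"reads: {read_avg:.0f} avg, {read_peak:.0f} peak QPS")     # 2315 avg, 6944 peak
print(f"read:write = {200_000_000 // 5_000_000}:1")               # 40:1
```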

API Design

POST /tweet

body {
  tweetId: string,
  message: string,
  userId: string,
  createdAt: date
}
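A minimal sketch of server-side validation for the `POST /tweet` body, enforcing the 140-character limit from the requirements. `validate_tweet` and `TweetRequest` are hypothetical names, and measuring length with Python's `len()` (Unicode code points) is an assumption about how "characters" are counted.

```python
from dataclasses import dataclass
from datetime import datetime

MAX_TWEET_LENGTH = 140
REQUIRED_FIELDS = {"tweetId", "message", "userId", "createdAt"}

@dataclass
class TweetRequest:
    tweet_id: str
    message: str
    user_id: str
    created_at: datetime

def validate_tweet(body: dict) -> TweetRequest:
    """Validate a POST /tweet body; raise ValueError on bad input."""
    missing = REQUIRED_FIELDS - set(body)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    message = body["message"]
    # Assumption: "140 characters" means 140 Unicode code points (what len() counts).
    if not isinstance(message, str) or not 1 <= len(message) <= MAX_TWEET_LENGTH:
        raise ValueError(f"message must be 1-{MAX_TWEET_LENGTH} characters")
    return TweetRequest(
        tweet_id=body["tweetId"],
        message=message,
        user_id=body["userId"],
        # fromisoformat raises ValueError on a malformed timestamp string.
        created_at=datetime.fromisoformat(body["createdAt"]),
    )

req = validate_tweet({
    "tweetId": "t1",
    "message": "hello world",
    "userId": "u1",
    "createdAt": "2024-05-01T12:00:00+00:00",
})
print(req.user_id, len(req.message))  # u1 11
```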

POST /like/:tweetId

body {
  userId: string,
  createdAt: date
}

response 202 Accepted

POST /follow/:userFollowedId

body {
  userId: string,
  followedAt: date
}

response 202 Accepted
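Because likes and follows go through a message queue, the handler can acknowledge with HTTP 202 Accepted as soon as the event is enqueued, before anything is written to the database. The sketch below uses Python's in-process `queue.Queue` as a hypothetical stand-in for the real interactions queue; the handler names are illustrative.

```python
import queue

# Hypothetical in-process stand-in for the interactions message queue.
interactions_queue = queue.Queue()

def handle_like(tweet_id: str, body: dict) -> tuple[int, dict]:
    """POST /like/:tweetId: enqueue the event instead of writing to RDS."""
    interactions_queue.put({"type": "like", "tweetId": tweet_id, **body})
    # 202 Accepted: queued now, applied asynchronously by the consumer.
    return 202, {}

def handle_follow(user_followed_id: str, body: dict) -> tuple[int, dict]:
    """POST /follow/:userFollowedId: same enqueue-then-acknowledge pattern."""
    interactions_queue.put(
        {"type": "follow", "userFollowedId": user_followed_id, **body}
    )
    return 202, {}

status, _ = handle_like("t1", {"userId": "u42", "createdAt": "2024-05-01T12:00:00+00:00"})
print(status, interactions_queue.qsize())  # 202 1
```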

GET /feed/:userId

response {
  tweets: [],
  cursor: string
}
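One way to implement the opaque `cursor` is to encode the ID of the last tweet returned, assuming tweet IDs are time-ordered (newer tweets have larger IDs). The next request returns only tweets older than that ID, so tweets added to the top of the feed between requests don't shift or duplicate pages. `get_feed_page` and the base64 encoding are illustrative choices, not a prescribed format.

```python
import base64
from typing import Optional

def encode_cursor(tweet_id: int) -> str:
    return base64.urlsafe_b64encode(str(tweet_id).encode()).decode()

def decode_cursor(cursor: str) -> int:
    return int(base64.urlsafe_b64decode(cursor.encode()).decode())

def get_feed_page(feed: list[int], cursor: Optional[str], limit: int = 20) -> dict:
    """feed: tweet IDs sorted newest-first (assumes time-ordered IDs)."""
    if cursor:
        last_seen = decode_cursor(cursor)
        # Keep only tweets strictly older than the last one the client saw.
        # A linear scan keeps the sketch simple; a real store would seek.
        feed = [tid for tid in feed if tid < last_seen]
    page = feed[:limit]
    # An empty cursor tells the client there are no more pages.
    next_cursor = encode_cursor(page[-1]) if len(feed) > limit else ""
    return {"tweets": page, "cursor": next_cursor}

feed = [110, 105, 103, 101, 99]
first = get_feed_page(feed, None, limit=2)
print(first["tweets"])   # [110, 105]
second = get_feed_page(feed, first["cursor"], limit=2)
print(second["tweets"])  # [103, 101]
```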

High-Level Design

At a high level, an API gateway sits in front of everything and handles the usual gateway concerns (e.g. authentication, rate limiting, and routing). Behind it, routes split into a read service and a write service. The write service handles posting tweets and all interactions, such as following and liking, and publishes each request to the corresponding message queue. The interactions queue consumer batches updates and writes them directly to RDS. The tweets queue consumer updates the followers' feeds in the cache, so when the read service fetches a feed, it can return it immediately.
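The two consumers can be sketched as follows, using in-memory dictionaries as hypothetical stand-ins for the follower table in RDS and the feed cache. The tweets consumer does fan-out on write (pushing each new tweet ID into every follower's cached feed, capped at an assumed `FEED_MAX` entries), and the interactions consumer collapses a batch of like events into one count increment per tweet before writing.

```python
from collections import defaultdict, deque

# Hypothetical in-memory stand-ins: the follower graph would live in RDS
# and the feeds in a cache such as Redis.
FEED_MAX = 800  # assumed cap on cached tweet IDs per user
followers: dict[str, set[str]] = defaultdict(set)
feed_cache: dict[str, deque] = defaultdict(lambda: deque(maxlen=FEED_MAX))

def consume_tweet(event: dict) -> None:
    """Tweets-queue consumer: fan out on write to every follower's feed."""
    for follower_id in followers[event["userId"]]:
        # Newest first; when full, deque(maxlen=...) drops the oldest ID from the right.
        feed_cache[follower_id].appendleft(event["tweetId"])

def batch_likes(events: list[dict]) -> dict[str, int]:
    """Interactions-queue consumer: collapse a batch of likes into one
    like-count increment per tweet, so RDS sees one write per tweet per batch."""
    deltas: dict[str, int] = defaultdict(int)
    for e in events:
        if e["type"] == "like":
            deltas[e["tweetId"]] += 1
    return dict(deltas)

followers["alice"] = {"bob", "carol"}
consume_tweet({"userId": "alice", "tweetId": "t1"})
consume_tweet({"userId": "alice", "tweetId": "t2"})
print(list(feed_cache["bob"]))  # ['t2', 't1']
likes = [{"type": "like", "tweetId": "t1"}] * 3 + [{"type": "like", "tweetId": "t2"}]
print(batch_likes(likes))  # {'t1': 3, 't2': 1}
```

Fan-out on write keeps feed reads to a single cache lookup, at the cost of write work proportional to the author's follower count.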

Detailed Component Design

For the message queues, I would use adaptive queue management: FIFO under normal conditions to ensure fairness, switching to LIFO under high load so the newest requests, whose clients are most likely still waiting, are served first. The tweets queue I would probably leave as strict FIFO so tweets are processed in the order they were posted. The trade-off of putting message queues on the write path is accepting eventual consistency instead of real-time updates, but I think it's a good trade-off here to be able to scale to billions of users.
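The adaptive FIFO/LIFO policy can be sketched as a deque that is drained from the oldest end normally and from the newest end once the backlog crosses a threshold. `lifo_threshold` is a hypothetical tuning knob (a real system might switch on queueing delay instead), and `allow_lifo=False` models keeping the tweets queue strictly FIFO.

```python
from collections import deque

class AdaptiveQueue:
    """FIFO under normal load, LIFO once the backlog exceeds a threshold."""

    def __init__(self, lifo_threshold: int, allow_lifo: bool = True):
        self.items: deque = deque()
        self.lifo_threshold = lifo_threshold  # hypothetical tuning knob
        self.allow_lifo = allow_lifo          # False for the tweets queue

    def put(self, item) -> None:
        self.items.append(item)

    def get(self):
        # Both branches raise IndexError if the queue is empty.
        if self.allow_lifo and len(self.items) > self.lifo_threshold:
            return self.items.pop()      # overloaded: serve newest first
        return self.items.popleft()      # normal load: serve oldest first

    def __len__(self) -> int:
        return len(self.items)

q = AdaptiveQueue(lifo_threshold=3)
for i in range(5):
    q.put(i)
print(q.get(), q.get(), q.get())  # 4 3 0
```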

I chose a relational database because a social network's data is highly interconnected (users, tweets, follows, likes), so we'd run a lot of JOINs. When we need to scale RDS later, we can spin up read replicas and, once we reach that scale, shard by userId.
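A sketch of routing queries once the database is sharded by userId and each shard has read replicas. The shard layout, DSN strings, and `ShardRouter` class are hypothetical; the user ID is hashed with SHA-256 rather than Python's built-in `hash()`, which is randomized per process for strings.

```python
import hashlib
import itertools

class ShardRouter:
    """Route by userId: writes go to a shard's primary, reads to its replicas."""

    def __init__(self, shards: list[dict]):
        # Each shard is {"primary": str, "replicas": [str, ...]} (hypothetical DSNs).
        self.shards = shards
        # Round-robin over replicas; fall back to the primary if a shard has none.
        self._replica_cycles = [
            itertools.cycle(s["replicas"] or [s["primary"]]) for s in shards
        ]

    def shard_index(self, user_id: str) -> int:
        # Stable hash so every process maps a user to the same shard.
        digest = hashlib.sha256(user_id.encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def for_write(self, user_id: str) -> str:
        return self.shards[self.shard_index(user_id)]["primary"]

    def for_read(self, user_id: str) -> str:
        return next(self._replica_cycles[self.shard_index(user_id)])

router = ShardRouter([
    {"primary": "db0-primary", "replicas": ["db0-r1", "db0-r2"]},
    {"primary": "db1-primary", "replicas": ["db1-r1"]},
])
print(router.for_write("user-42"), router.for_read("user-42"))
```

With simple modulo placement, changing the shard count remaps most users; consistent hashing is a common way to limit that data movement.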