Design Twitter - System Design

System Requirements

Functional requirements:

The user can post and share tweets
The user can like/favorite tweets
The user can see their home timeline
The user can see other users' timelines

Non-functional requirements:

Availability: Each request should get a response without error, without the guarantee that the data is the most recent
Consistency: Eventual consistency is chosen
Partition tolerance: The system should keep operating even if some messages between nodes are dropped by the network
Low-latency: The user can see their timeline within 500ms

Capacity Estimation

Assumptions:

200 million DAU, each user posts 3 tweets per day = 600 million tweets per day.

Each tweet has 140 bytes of content and 30 bytes of metadata; 20% of tweets contain a 20 KB photo, and 10% contain a 2 MB video.

Each user reads their home timeline 5 times and other users' timelines 5 times per day; each timeline contains 20 tweets.

Data storage:

So the total size will be: 600 million * (170 bytes + 20 KB * 20% + 2 MB * 10%) = about 122 TB per day

Bandwidth: 200 million * (5 + 5) * 20 * (140 bytes + 10% * 2 MB + 20% * 20 KB) / 86400 s = about 95 GB/s
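
As a sanity check on the estimates above, the arithmetic can be worked through in a short script. This is a minimal sketch that assumes decimal units (1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes), reads "2Mb" as 2 megabytes, and uses the 20% photo and 10% video shares from the assumptions. All variable names are illustrative. With these inputs it comes out at roughly 122 TB per day of storage and 95 GB/s of read bandwidth.

```python
# Back-of-the-envelope capacity check under the stated assumptions.
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12  # decimal units (assumption)

DAU = 200_000_000
TWEETS_PER_USER_PER_DAY = 3
TEXT_BYTES = 140
META_BYTES = 30
PHOTO_BYTES, PHOTO_SHARE = 20 * KB, 0.20
VIDEO_BYTES, VIDEO_SHARE = 2 * MB, 0.10
TIMELINE_READS_PER_USER = 5 + 5   # home timeline + other users' timelines
TWEETS_PER_TIMELINE = 20
SECONDS_PER_DAY = 86_400

# Average media payload per tweet, weighted by how often media is attached.
avg_media_bytes = PHOTO_BYTES * PHOTO_SHARE + VIDEO_BYTES * VIDEO_SHARE

# Storage: every new tweet stores text, metadata and its share of media.
tweets_per_day = DAU * TWEETS_PER_USER_PER_DAY
storage_per_day = tweets_per_day * (TEXT_BYTES + META_BYTES + avg_media_bytes)

# Bandwidth: every timeline read returns 20 tweets (text plus media share).
tweets_read_per_day = DAU * TIMELINE_READS_PER_USER * TWEETS_PER_TIMELINE
bandwidth = tweets_read_per_day * (TEXT_BYTES + avg_media_bytes) / SECONDS_PER_DAY

print(f"storage:   {storage_per_day / TB:.1f} TB/day")
print(f"bandwidth: {bandwidth / GB:.1f} GB/s")
```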

API Design

createTweet(userToken, String tweetcontent) -> response status code
homeTimeline(userToken, int pageSize, optional int pageOffset: indicating current page location) -> tweets list
userTimeline(userToken, int userId, int pageSize, optional int pageOffset) -> tweets list
likeOrUnlikeTweet(userToken, int tweetId, boolean likeOrDislike) -> response status code
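
The page size and page offset parameters of the timeline calls could behave as in the sketch below. It assumes the offset is a zero-based page index that defaults to the first page when omitted; the source does not say whether the offset counts pages or items, so this interpretation and the helper name `paginate` are assumptions.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def paginate(items: List[T], page_size: int, page_offset: Optional[int] = None) -> List[T]:
    """Return one page of items.

    page_offset is treated as a zero-based page index (an assumption);
    None means the first page.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    page = 0 if page_offset is None else page_offset
    if page < 0:
        raise ValueError("page_offset must not be negative")
    start = page * page_size
    return items[start:start + page_size]


# Usage: a timeline of 45 tweets read in pages of 20.
timeline = [f"tweet-{i}" for i in range(45)]
first_page = paginate(timeline, 20)      # tweets 0..19
last_page = paginate(timeline, 20, 2)    # tweets 40..44
```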

Database design

I'd choose MongoDB as our database, because:

Tweet data grows by more than 100 TB per day, which is a lot of data.
Low latency is a requirement.
We have horizontal scalability needs.

Schema:

Tweet:

TweetID: Integer, primary key

content: Varchar(140)

Metadata: Varchar(30)

....

User:

userId: Integer, primary key

email: varchar(30)

isHotUser: Boolean

Follower:

followerUserId: Integer

FolloweeUserId: Integer

FollowingDate: Timestamp
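
For illustration, the three entities above can be written as plain Python data classes. This is only a sketch of the record shapes: it covers just the fields listed above (the elided Tweet fields are not guessed), and the snake_case names and length checks are assumptions derived from the Varchar sizes.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Tweet:
    tweet_id: int   # primary key
    content: str    # at most 140 characters
    metadata: str   # at most 30 characters
    # Further fields are elided in the schema above and are not guessed here.

    def __post_init__(self) -> None:
        if len(self.content) > 140:
            raise ValueError("content exceeds 140 characters")
        if len(self.metadata) > 30:
            raise ValueError("metadata exceeds 30 characters")


@dataclass
class User:
    user_id: int    # primary key
    email: str      # at most 30 characters
    is_hot_user: bool


@dataclass
class Follower:
    follower_user_id: int
    followee_user_id: int
    following_date: datetime


# Usage: user 2 follows user 1.
edge = Follower(follower_user_id=2, followee_user_id=1,
                following_date=datetime(2024, 1, 1))
```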

High level design

Request Flow

The request flow can be seen in the high-level diagram.

Core component design

To keep latency low, every time a user posts a tweet, the fanout service retrieves that user's followers from the database and updates the followers' timelines in the cache. When a user requests their timeline, the cache is checked first; if the timeline is not there, the request falls back to querying the database.
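
The write and read paths described above can be sketched with in-memory dictionaries standing in for the database and the cache. The class and method names are hypothetical, and two simplifications are assumed: tweet ids increase over time, and the fanout only updates timelines that are already cached, leaving uncached ones to be rebuilt from the database on the next read.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Set

TIMELINE_LIMIT = 20  # tweets kept per cached timeline


class TimelineService:
    def __init__(self) -> None:
        # "Database": follower graph and tweets per author.
        self.followers: Dict[int, Set[int]] = defaultdict(set)
        self.tweets_by_author: Dict[int, List[int]] = defaultdict(list)
        # "Cache": user id -> newest-first tweet ids of the home timeline.
        self.cache: Dict[int, Deque[int]] = {}

    def follow(self, follower_id: int, followee_id: int) -> None:
        self.followers[followee_id].add(follower_id)

    def post_tweet(self, author_id: int, tweet_id: int) -> None:
        """Persist the tweet, then fan it out to cached follower timelines."""
        self.tweets_by_author[author_id].append(tweet_id)
        for follower_id in self.followers[author_id]:
            timeline = self.cache.get(follower_id)
            if timeline is not None:
                # deque(maxlen=...) drops the oldest entry on the right.
                timeline.appendleft(tweet_id)

    def home_timeline(self, user_id: int) -> List[int]:
        """Serve from the cache; on a miss, rebuild from the database."""
        timeline = self.cache.get(user_id)
        if timeline is None:
            timeline = self._rebuild_from_db(user_id)
            self.cache[user_id] = timeline
        return list(timeline)

    def _rebuild_from_db(self, user_id: int) -> Deque[int]:
        followees = [f for f, fans in self.followers.items() if user_id in fans]
        ids = [t for f in followees for t in self.tweets_by_author[f]]
        ids.sort(reverse=True)  # newest first, assuming ids grow over time
        return deque(ids[:TIMELINE_LIMIT], maxlen=TIMELINE_LIMIT)


# Usage: user 2 follows user 1, who tweets twice.
service = TimelineService()
service.follow(follower_id=2, followee_id=1)
service.post_tweet(author_id=1, tweet_id=100)
first_read = service.home_timeline(2)    # cache miss, rebuilt from the database
service.post_tweet(author_id=1, tweet_id=101)
second_read = service.home_timeline(2)   # cache hit, updated by the fanout
```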

For viewing another user's timeline, we can combine the push and pull modes. For hot users, we use the push mode: their updates are added to the cache, which reduces database load and improves latency, since querying the database is slower than reading from Redis. For cold users, we query the database only when needed.
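
A minimal sketch of this hybrid approach is shown below, with dictionaries standing in for Redis and the database. The class name, the fixed set of hot users, and the `db_reads` counter are assumptions made for illustration.

```python
from typing import Dict, Iterable, List


class UserTimelineReader:
    def __init__(self, hot_users: Iterable[int]) -> None:
        self.hot_users = set(hot_users)          # mirrors the isHotUser flag
        self.db: Dict[int, List[int]] = {}       # author -> tweet ids, oldest first
        self.cache: Dict[int, List[int]] = {}    # hot authors only
        self.db_reads = 0                        # counts database queries

    def post_tweet(self, author_id: int, tweet_id: int) -> None:
        self.db.setdefault(author_id, []).append(tweet_id)
        if author_id in self.hot_users:
            # Push mode: update the hot user's cached timeline at write time.
            self.cache.setdefault(author_id, []).append(tweet_id)

    def user_timeline(self, author_id: int, page_size: int = 20) -> List[int]:
        if author_id in self.cache:
            tweets = self.cache[author_id]
        else:
            # Pull mode: cold users are read from the database on demand.
            self.db_reads += 1
            tweets = self.db.get(author_id, [])
        return tweets[-page_size:][::-1]  # newest first


# Usage: user 1 is hot, user 2 is cold.
reader = UserTimelineReader(hot_users={1})
reader.post_tweet(1, 10)
reader.post_tweet(1, 11)
reader.post_tweet(2, 20)
hot_timeline = reader.user_timeline(1)    # served from the cache
cold_timeline = reader.user_timeline(2)   # served from the database
```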

To reduce staleness of cached data, we can refresh the cache regularly, and use LFU (least frequently used) as our cache eviction policy.
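
To make the LFU policy concrete, here is a deliberately simple sketch. It is not how Redis implements LFU (Redis uses an approximated frequency counter); this version keeps an exact access count per key and scans for the minimum on eviction, which is fine for illustration but linear in the cache size. The class name is hypothetical.

```python
from typing import Any, Dict, Hashable, Optional


class LFUCache:
    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.values: Dict[Hashable, Any] = {}
        self.hits: Dict[Hashable, int] = {}  # access count per key

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self.values:
            return None
        self.hits[key] += 1
        return self.values[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key not in self.values and len(self.values) >= self.capacity:
            # Evict the least frequently used key; on ties, min() returns
            # the first one in dict order, i.e. the earliest inserted.
            victim = min(self.hits, key=self.hits.get)
            del self.values[victim]
            del self.hits[victim]
        self.values[key] = value
        self.hits[key] = self.hits.get(key, 0) + 1


# Usage: "a" is read once more than "b", so "b" is evicted first.
cache = LFUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")
cache.put("c", 3)
```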

For database sharding, we have 3 options: sharding by userId, sharding by tweetCreationDate, and sharding by tweetId. Here we can shard by tweetId, as it better avoids the hot user problem, although the other 2 options make queries simpler and are easier to implement.
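
The effect on a hot user can be illustrated with a small comparison. The sketch assumes 8 shards, a simple modulo mapping, and sequential tweet ids; all three are illustrative choices, not part of the design above.

```python
from collections import Counter

NUM_SHARDS = 8  # hypothetical shard count


def shard_by_tweet_id(tweet_id: int, num_shards: int = NUM_SHARDS) -> int:
    return tweet_id % num_shards


def shard_by_user_id(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    return user_id % num_shards


# One hot user (id 42) posts 800 tweets with consecutive ids.
hot_user_id = 42
hot_user_tweet_ids = range(1000, 1800)

# Sharding by tweetId spreads the hot user's tweets over every shard.
load_by_tweet_id = Counter(shard_by_tweet_id(t) for t in hot_user_tweet_ids)

# Sharding by userId puts all of them on a single shard.
load_by_user_id = Counter(shard_by_user_id(hot_user_id) for _ in hot_user_tweet_ids)

print("by tweetId:", dict(load_by_tweet_id))
print("by userId: ", dict(load_by_user_id))
```

The price of the even spread is on the read side: fetching one user's timeline has to query every shard and merge the results, which is why the other two options are simpler to query.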

For the database, we will use a master-slave mode to ensure availability and eventual consistency. It avoids a single point of failure, although it increases implementation complexity and costs more resources.
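
The sketch below shows how a master-slave setup yields eventual consistency and survives the loss of the master. It assumes asynchronous, log-based replication and promotion of the most up-to-date replica on failover; these mechanics, and all class and method names, are assumptions chosen for illustration rather than a description of any particular database.

```python
from typing import Any, Dict, List, Optional, Tuple


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, Any] = {}
        self.applied = 0  # number of log entries applied so far


class ReplicatedStore:
    def __init__(self, replica_names: List[str]) -> None:
        self.master = Node("master")
        self.replicas = [Node(name) for name in replica_names]
        self.log: List[Tuple[str, Any]] = []  # ordered writes accepted by the master

    def write(self, key: str, value: Any) -> None:
        """Writes go to the master only and return before replication."""
        self.log.append((key, value))
        self.master.data[key] = value
        self.master.applied = len(self.log)

    def read_from_replica(self, index: int, key: str) -> Optional[Any]:
        """Reads from a replica may be stale until replicate() has run."""
        return self.replicas[index].data.get(key)

    def replicate(self) -> None:
        """Asynchronous catch-up: each replica applies the log entries it lacks."""
        for replica in self.replicas:
            for key, value in self.log[replica.applied:]:
                replica.data[key] = value
            replica.applied = len(self.log)

    def fail_over(self) -> None:
        """Promote the most up-to-date replica when the master is lost."""
        new_master = max(self.replicas, key=lambda r: r.applied)
        self.replicas.remove(new_master)
        self.master = new_master
        # Writes that were never replicated are lost in this sketch.
        del self.log[new_master.applied:]


# Usage: a write is visible on replicas only after replication has run.
store = ReplicatedStore(["replica-1", "replica-2"])
store.write("tweet:1", "hello")
stale = store.read_from_replica(0, "tweet:1")   # not replicated yet
store.replicate()
fresh = store.read_from_replica(0, "tweet:1")   # replicas have converged
store.fail_over()                               # a replica takes over as master
```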