Design Twitter - System Design

System requirements

Functional:

A user can send a tweet of up to 140 characters (a string of roughly 150 bytes).
A user can follow other users.
A user can like other users' tweets.
A user's home feed shows an aggregation of tweets from all the users they follow.
The home feed shows the top K most popular tweets, ranked by the number of likes a tweet has received and the number of followers its author has.
In general, tweets are presented in reverse chronological order.
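The feed ranking rule above can be sketched in Python. This is a minimal sketch under two assumptions the document does not settle: the hypothetical `popularity` score adds the log-scaled like count to the log-scaled follower count of the author, and "reverse chronological order" means the selected top K tweets are then shown newest first. `FeedTweet`, `popularity`, `build_home_feed`, and `follower_weight` are illustrative names, not part of the original design.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class FeedTweet:
    tweet_id: int
    posted_time: int        # Unix timestamp in seconds
    number_of_likes: int
    author_followers: int   # follower count of the tweet's author


def popularity(t: FeedTweet, follower_weight: float = 1.0) -> float:
    # Hypothetical score (assumption): log-scale both signals so a single
    # huge account does not dominate every feed.
    return math.log1p(t.number_of_likes) + follower_weight * math.log1p(t.author_followers)


def build_home_feed(candidates: list[FeedTweet], k: int) -> list[FeedTweet]:
    # Keep the K most popular candidates; heapq.nlargest runs in O(n log k).
    top_k = heapq.nlargest(k, candidates, key=popularity)
    # Present the selected tweets in reverse chronological order.
    return sorted(top_k, key=lambda t: t.posted_time, reverse=True)


tweets = [
    FeedTweet(tweet_id=1, posted_time=100, number_of_likes=5, author_followers=10),
    FeedTweet(tweet_id=2, posted_time=200, number_of_likes=5_000, author_followers=1_000_000),
    FeedTweet(tweet_id=3, posted_time=300, number_of_likes=50, author_followers=200),
]
print([t.tweet_id for t in build_home_feed(tweets, k=2)])  # [3, 2]
```

Using a bounded heap rather than a full sort matters when a user follows many accounts: only K candidates are kept in memory while scanning the rest.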

Non-Functional:

Scalability: 500 M DAU.
Availability and low latency: users should be able to tweet quickly. When a user opens the home feed, the first 10 tweets should appear within 500 ms.
Availability over consistency: the system does not need strong consistency like banking transactions, so eventual consistency is acceptable. If a user sends a tweet, followers in the same geographic region should see it within 1 second, while users in other regions of the world may see it only after about 30 seconds.
Security, content moderation, and anti-abuse protection.

Capacity estimation

500 M DAU

Each user sends 2 tweets per day on average: 500 M * 2 = 1 B tweets per day.

Each tweet has about 140 bytes of text; with metadata, about 500 bytes in total.

Storage: 1 B tweets * 500 bytes = 500 GB per day. Over 2 years: 500 GB * 365 * 2 = 365 TB, roughly 400 TB.

Database storage required: about 500 TB (the 2-year estimate plus headroom).
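The storage arithmetic above can be checked with a few lines of Python. The constants come straight from the estimates in this section; decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) are an assumption.

```python
# Back-of-the-envelope storage estimate using the figures in this section.
DAU = 500_000_000
TWEETS_PER_USER_PER_DAY = 2
BYTES_PER_TWEET = 500          # ~140 bytes of text plus metadata
RETENTION_DAYS = 365 * 2       # 2 years

GB = 10**9                     # decimal units (assumption)
TB = 10**12

tweets_per_day = DAU * TWEETS_PER_USER_PER_DAY       # 1e9 tweets
bytes_per_day = tweets_per_day * BYTES_PER_TWEET     # 5e11 bytes
total_bytes = bytes_per_day * RETENTION_DAYS         # 3.65e14 bytes

print(f"tweets/day: {tweets_per_day:,}")                # tweets/day: 1,000,000,000
print(f"storage/day: {bytes_per_day / GB:.0f} GB")      # storage/day: 500 GB
print(f"2-year storage: {total_bytes / TB:.0f} TB")     # 2-year storage: 365 TB
```

Rounding 365 TB up to 400 TB, and then to about 500 TB, leaves room for replication overhead and growth that the raw estimate does not include.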

It is better to use a NoSQL database, since a single relational database deployment is typically practical only up to around 100 TB.

Document-based DB: MongoDB or DynamoDB

Each user views 100 tweets per day on average.

Network IO bandwidth:

Ingress traffic: 500 GB per day / 10^5 seconds per day (86,400 rounded up) = 5 * 10^11 B / 10^5 s = 5 MB/s

Egress traffic: 500 M users * 100 tweets * 500 bytes = 25 TB per day; 25 TB / 10^5 s = 250 MB/s

QPS: 500 M feed requests per day (about one per DAU) / 10^5 seconds = 500 * 10^6 / 10^5 = 5,000 QPS on average

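If one API call to pull tweets keeps a core busy for 500 ms, the average load is 5,000 QPS * 0.5 s = 2,500 busy cores; assuming peak traffic of about 4x the average, provision about 10,000 cores.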

10,000 cores / 8 cores per instance = 1,250 machine instances
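The traffic and compute estimates can be reproduced the same way. This sketch assumes about one feed request per DAU per day (which is what 500 M requests per day implies) and a peak factor of 4x over the average, which the original does not state. The number of busy cores follows Little's law: busy cores = arrival rate * time each request spends on a core.

```python
import math

DAU = 500_000_000
TWEETS_VIEWED_PER_USER_PER_DAY = 100
BYTES_PER_TWEET = 500
SECONDS_PER_DAY = 10**5              # 86,400 rounded up for easy arithmetic
INGRESS_BYTES_PER_DAY = 500 * 10**9  # 500 GB of new tweets per day

FEED_REQUESTS_PER_DAY = DAU          # assumption: ~one feed request per DAU
CORE_TIME_PER_REQUEST_S = 0.5        # each request holds a core for 500 ms
PEAK_FACTOR = 4                      # assumption: peak is ~4x the average
CORES_PER_INSTANCE = 8

MB = 10**6
ingress_mb_s = INGRESS_BYTES_PER_DAY / SECONDS_PER_DAY / MB
egress_mb_s = DAU * TWEETS_VIEWED_PER_USER_PER_DAY * BYTES_PER_TWEET / SECONDS_PER_DAY / MB
qps = FEED_REQUESTS_PER_DAY / SECONDS_PER_DAY

# Little's law: concurrency = arrival rate * time in service.
avg_cores = qps * CORE_TIME_PER_REQUEST_S
peak_cores = avg_cores * PEAK_FACTOR
instances = math.ceil(peak_cores / CORES_PER_INSTANCE)

print(f"ingress: {ingress_mb_s:.0f} MB/s")   # ingress: 5 MB/s
print(f"egress: {egress_mb_s:.0f} MB/s")     # egress: 250 MB/s
print(f"QPS: {qps:,.0f}")                    # QPS: 5,000
print(f"cores: {peak_cores:,.0f}, instances: {instances:,}")  # cores: 10,000, instances: 1,250
```

Changing `PEAK_FACTOR` or `CORE_TIME_PER_REQUEST_S` shows how sensitive the fleet size is to those two assumptions.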

Data model:

Tweet (document NoSQL database):

tweet_id: primary key
created_by: user ID
posted_time
content: string of up to 140 characters
media_link: link to a picture or video (the media itself can be stored in S3)
number_of_likes
hashtags: list of hashtag strings used in the tweet
users_mentioned: list of users mentioned in the tweet
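The tweet document above could be modeled as a Python dataclass before it is written to the document store. This is a sketch, assuming string IDs and a Unix-timestamp `posted_time` (the document does not fix the field types); `MAX_TWEET_CHARS` and `Tweet.to_document` are illustrative names. The length check counts Unicode code points, which is a simplification of how a production service would count characters.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional

MAX_TWEET_CHARS = 140  # limit from the functional requirements


@dataclass
class Tweet:
    tweet_id: str                       # primary key
    created_by: str                     # user ID of the author
    posted_time: int                    # Unix timestamp (assumed representation)
    content: str
    media_link: Optional[str] = None    # e.g. URL of an object stored in S3
    number_of_likes: int = 0
    hashtags: list[str] = field(default_factory=list)
    users_mentioned: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject content longer than the limit (counted in code points).
        if len(self.content) > MAX_TWEET_CHARS:
            raise ValueError(f"tweet content exceeds {MAX_TWEET_CHARS} characters")

    def to_document(self) -> dict:
        # Plain dict that a document database could persist as-is.
        return asdict(self)
```

Validating at construction time keeps oversized content from ever reaching the write path.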

User:

user_id: primary key
email
name
nickname
date_of_birth
gender

Bottlenecks:

number_of_likes: if a famous person posts something and millions of users click "like" within a few minutes, the updates to a single counter would overwhelm the database server that holds it.
One approach is to split the like counter into multiple (say 100) sub-counters and make different database nodes responsible for different sub-counters; the total number of likes is the sum of all sub-counters.
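The sub-counter idea can be sketched with an in-memory stand-in for the database. This is a minimal sketch under stated assumptions: a dict plays the role of the database nodes, each key `(tweet_id, shard_index)` stands for a row that would live on a different node, and each like is sent to a randomly chosen shard. `ShardedLikeCounter` is a hypothetical name.

```python
import random
from collections import defaultdict

NUM_SHARDS = 100  # "let's say 100" sub-counters


class ShardedLikeCounter:
    """Like counter split into sub-counters.

    Each key (tweet_id, shard_index) stands for a row that would live on a
    different database node; here a dict stands in for the database.
    """

    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        self.num_shards = num_shards
        self._shards: dict[tuple[str, int], int] = defaultdict(int)

    def like(self, tweet_id: str) -> None:
        # Spread writes: picking a random shard means concurrent likes
        # rarely contend on the same row.
        shard = random.randrange(self.num_shards)
        self._shards[(tweet_id, shard)] += 1

    def count(self, tweet_id: str) -> int:
        # Reads pay the cost: sum every sub-counter for this tweet.
        return sum(self._shards.get((tweet_id, i), 0) for i in range(self.num_shards))
```

The trade-off sits on the read side: `count` touches all 100 sub-counters, so a read is more expensive while writes no longer pile up on one row.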
Number of followers: if a famous person with millions of followers posts something, the tweet should show up in millions of people's home feeds within a short period of time. It is better to use a message queue to achieve this.
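Fan-out through a message queue can be sketched with Python's standard `queue.Queue` standing in for a distributed queue such as Kafka. This is a sketch under stated assumptions: posting a tweet only enqueues an `(author_id, tweet_id)` message, a worker thread drains the queue and prepends the tweet ID to each follower's cached home feed, and each feed is capped at a hypothetical `FEED_LENGTH` entries. `post_tweet`, `fanout_worker`, `start_fanout_worker`, and the in-memory `followers` and `home_feeds` maps are illustrative names.

```python
import queue
import threading
from collections import defaultdict, deque

FEED_LENGTH = 800  # hypothetical cap on cached entries per home feed

followers: dict[str, list[str]] = defaultdict(list)   # author ID -> follower IDs
home_feeds: dict[str, deque] = defaultdict(lambda: deque(maxlen=FEED_LENGTH))
fanout_queue: queue.Queue = queue.Queue()             # stand-in for a distributed queue


def post_tweet(author_id: str, tweet_id: str) -> None:
    # Write path: enqueue and return; follower feeds are updated asynchronously.
    fanout_queue.put((author_id, tweet_id))


def fanout_worker() -> None:
    # Consumer: push each new tweet ID onto every follower's cached feed.
    while True:
        item = fanout_queue.get()
        if item is None:  # sentinel value tells the worker to stop
            fanout_queue.task_done()
            return
        author_id, tweet_id = item
        for follower_id in followers[author_id]:
            home_feeds[follower_id].appendleft(tweet_id)  # newest first
        fanout_queue.task_done()


def start_fanout_worker() -> threading.Thread:
    worker = threading.Thread(target=fanout_worker, daemon=True)
    worker.start()
    return worker
```

For example, with `followers["celebrity"] = ["alice", "bob"]`, posting `t1` and then `t2` leaves both followers' feeds ordered `t2`, `t1` once the worker has drained the queue. Posting returns immediately no matter how many followers the author has; the work of writing to millions of feeds is spread across however many consumers are running.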

It is worth noting that millions of users might be viewing the same content concurrently.

API design

Define what APIs are expected from the system...

Database design

Defining the system data model early on will clarify how data will flow among different components of the system.

High-level design

You should identify enough components that are needed to solve the actual problem from end to end.

Request flows

Explain how the request flows from end to end in your high level design.

Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Is there any relevant algorithm or data structure you would like to use for a component?

Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...

Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.

Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?