System requirements
Functional:
- Users can post tweets of up to 140 characters (a string of about 150 bytes)
- Users can follow other users
- Users can like other users' tweets
- A user's home feed shows an aggregation of tweets from all the users that user follows
- The home feed shows the top K most popular tweets, ranked by the number of likes a tweet has received and the number of followers its author has
- Tweets are generally presented in reverse chronological order
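A minimal sketch of how the last two requirements could fit together: select the top K candidates by popularity, then present them newest first. The requirements do not specify a scoring formula, so the weighted log of likes and follower count used here, along with the names FeedTweet, popularity, and home_feed, are assumptions for illustration only.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedTweet:
    """Hypothetical view of a tweet with just the fields ranking needs."""
    tweet_id: int
    posted_time: float      # Unix timestamp in seconds
    likes: int
    author_followers: int


def popularity(tweet, like_weight=1.0, follower_weight=0.5):
    # Assumed formula: log-damped so one huge account does not dominate.
    return (like_weight * math.log1p(tweet.likes)
            + follower_weight * math.log1p(tweet.author_followers))


def home_feed(candidates, k):
    """Pick the k most popular tweets, then order them newest first."""
    top_k = heapq.nlargest(k, candidates, key=popularity)
    return sorted(top_k, key=lambda t: t.posted_time, reverse=True)


if __name__ == "__main__":
    tweets = [
        FeedTweet(1, 100.0, 10, 100),
        FeedTweet(2, 200.0, 0, 0),
        FeedTweet(3, 300.0, 1000, 10),
        FeedTweet(4, 50.0, 5000, 1_000_000),
    ]
    for t in home_feed(tweets, k=2):
        print(t.tweet_id, round(popularity(t), 2))
```

heapq.nlargest keeps the selection at O(n log k), which matters when the candidate set (all recent tweets from followed users) is much larger than K.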
Non-Functional:
- Scalability: 500 M DAU
- Availability and low latency: users must be able to post tweets quickly. When a user opens the home feed, the first 10 tweets should show up within 500 ms.
- Consistency can be sacrificed for availability. The system does not need strong consistency like banking transactions; eventual consistency is okay. If a user sends a tweet, followers in the same geographic region should see it within 1 second, while users in other regions of the world may see it after 30 seconds. This is acceptable.
- Security: content moderation and anti-abuse protection.
Capacity estimation
500 M DAU
Each user sends 2 tweets per day on average: 1 B tweets per day.
Each tweet is 140 characters (about 150 bytes) of text; with metadata, about 500 bytes.
Storage: 1 B * 500 B -> 500 GB per day (storage for 2 years: 500 GB * 365 * 2 = 365 TB, roughly 400 TB).
Database storage required: 500 TB (rounded up for headroom)
A NoSQL database is a better fit at this size: as a rough rule of thumb, a relational database deployment typically handles up to around 100 TB.
Document-based DB: MongoDB or DynamoDB
Each user views 100 tweets per day.
Network IO bandwidth:
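Ingress traffic: 500 GB per day / 10^5 seconds per day (86,400 rounded up) -> 5 * 10^11 B / 10^5 s -> 5 MB/s
Egress traffic: 500 M users * 100 tweets * 500 B -> 25 TB per day; 25 TB / 10^5 s -> 250 MB/s
QPS: 500 M requests per day / 10^5 seconds -> 500 * 10^6 / 10^5 -> 5000 QPS on average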
If one API call to pull tweets occupies a core for 500 ms: 5000 QPS * 0.5 s -> 2500 cores busy on average
2500 cores / 8 cores per instance -> about 313 machine instances (before adding headroom for peak traffic)
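The estimates in this section can be re-derived with a short script. This is a sketch that uses only the inputs stated above (500 M DAU, 2 tweets and 100 views per user per day, 500 bytes per tweet, a day rounded to 10^5 seconds, 500 ms per call, 8 cores per instance). The core count assumes that average QPS multiplied by per-request core time gives the number of busy cores (Little's law), and that 500 M requests per day is the right load figure; both are assumptions of this sketch.

```python
import math

# Inputs taken from the estimates above.
DAU = 500 * 10**6
TWEETS_PER_USER_PER_DAY = 2
VIEWS_PER_USER_PER_DAY = 100
BYTES_PER_TWEET = 500
SECONDS_PER_DAY = 10**5          # 86,400 rounded up for easy arithmetic
REQUESTS_PER_DAY = 500 * 10**6
LATENCY_SECONDS = 0.5            # core time per API call
CORES_PER_INSTANCE = 8

GB = 10**9
TB = 10**12
MB = 10**6


def estimate():
    tweets_per_day = DAU * TWEETS_PER_USER_PER_DAY
    storage_per_day = tweets_per_day * BYTES_PER_TWEET
    egress_per_day = DAU * VIEWS_PER_USER_PER_DAY * BYTES_PER_TWEET
    qps = REQUESTS_PER_DAY // SECONDS_PER_DAY
    cores = math.ceil(qps * LATENCY_SECONDS)   # Little's law: busy cores
    return {
        "tweets_per_day": tweets_per_day,
        "storage_gb_per_day": storage_per_day // GB,
        "storage_tb_two_years": storage_per_day * 365 * 2 // TB,
        "ingress_mb_per_s": storage_per_day // SECONDS_PER_DAY // MB,
        "egress_mb_per_s": egress_per_day // SECONDS_PER_DAY // MB,
        "avg_qps": qps,
        "cores": cores,
        "instances": math.ceil(cores / CORES_PER_INSTANCE),
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value}")
```

These are averages; peak traffic is usually several times higher, so real provisioning would add headroom on top of the computed figures.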
Data model:
Tweet (document NoSQL database):
- tweet_id: primary key
- created_by: user ID
- posted_time
- content: string of 140 chars
- media_link: link to a picture or video (the media itself can be stored in S3)
- number_of_likes
- hashtags: list of hashtag strings used in the tweet
- users_mentioned: list of users mentioned in the tweet
User:
- user_id: primary key
- name
- nickname
- date_of_birth
- gender
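The two entities above could be sketched as Python dataclasses. The field types, the string IDs, the defaults, and the length check are assumptions added for illustration; the data model itself only lists the field names.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import List, Optional

MAX_TWEET_CHARS = 140  # limit from the functional requirements


@dataclass
class Tweet:
    tweet_id: str                       # primary key
    created_by: str                     # user ID of the author
    posted_time: datetime
    content: str                        # up to 140 characters
    media_link: Optional[str] = None    # e.g. a URL to an object in S3
    number_of_likes: int = 0
    hashtags: List[str] = field(default_factory=list)
    users_mentioned: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Enforce the length limit at construction time.
        if len(self.content) > MAX_TWEET_CHARS:
            raise ValueError(f"tweet exceeds {MAX_TWEET_CHARS} characters")


@dataclass
class User:
    user_id: str                        # primary key
    name: str
    nickname: str
    date_of_birth: date
    gender: str


if __name__ == "__main__":
    author = User("u1", "Ada Lovelace", "ada", date(1815, 12, 10), "female")
    tweet = Tweet("t1", author.user_id, datetime(2024, 1, 1, 12, 0),
                  "hello #world", hashtags=["world"])
    print(tweet)
```

In a document store, a Tweet would map naturally to one document keyed by tweet_id, with hashtags and users_mentioned stored as embedded arrays.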
Bottlenecks:
- number_of_likes. If a famous person posts something and millions of users click "like" within a few minutes, the writes to that single counter would overwhelm the database server
- One approach to overcome this is to break the like counter into multiple (let's say 100) sub-counters, and make different database nodes responsible for each sub-counter
- Number of followers. If a famous person with millions of followers posts something, the tweet should show up in millions of people's home feeds within a short period of time. It is better to use a message queue to achieve this.
It is worth noting that millions of users might be viewing the same content concurrently.
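The sub-counter idea for number_of_likes can be sketched as follows. This is an in-memory illustration only: the dictionary stands in for rows spread across database nodes, and the class name ShardedLikeCounter, the random shard selection, and the read-time summation are assumptions rather than a prescribed design.

```python
import random


class ShardedLikeCounter:
    """Spreads like-writes for one tweet across many sub-counters."""

    def __init__(self, num_shards=100):
        self.num_shards = num_shards
        # (tweet_id, shard_index) -> count. In a real deployment each key
        # would be a separate row, ideally placed on different nodes.
        self._shards = {}

    def like(self, tweet_id):
        # Pick a random shard so concurrent writers rarely hit the same row.
        shard = random.randrange(self.num_shards)
        key = (tweet_id, shard)
        self._shards[key] = self._shards.get(key, 0) + 1

    def total(self, tweet_id):
        # Reads pay the cost: sum every sub-counter for the tweet.
        return sum(self._shards.get((tweet_id, s), 0)
                   for s in range(self.num_shards))


if __name__ == "__main__":
    counter = ShardedLikeCounter(num_shards=100)
    for _ in range(10_000):
        counter.like("celebrity_tweet")
    print(counter.total("celebrity_tweet"))
```

The trade-off is that writes become cheap and spread out while reads must aggregate all sub-counters, so the summed total is usually cached or refreshed periodically, which fits the eventual-consistency requirement.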
API design
Define what APIs are expected from the system...
Database design
Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...
High-level design
You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design. If you are unfamiliar with the tool, you can simply describe your design to the chat bot and ask it to generate a starter diagram for you to modify...
Request flows
Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...
Detailed component design
Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...
Trade offs/Tech choices
Explain any trade offs you have made and why you made certain tech choices...
Failure scenarios/bottlenecks
Try to discuss as many failure scenarios/bottlenecks as possible.
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?