Design Twitter - System Design

System requirements

Functional:

List functional requirements for the system...

A user can post a new tweet; the tweet is displayed on the user's homepage and pushed to the feeds of the users who follow them.
A user can like a post.
A user can see the latest posts from everyone they follow, sorted by timestamp.

Non-Functional:

List non-functional requirements for the system...

Availability: The system is accessible at any given time, with very little downtime over a given year.
Scalability: The system can process a large number of requests without degrading performance.
High performance: The application stays fast even with a large user base and a large number of requests.

Capacity estimation

Estimate the scale of the system you are going to design...

1000 users per minute, each posting one tweet
each tweet has up to 1000 characters: 1000 * 4 bytes = 4000 bytes
1000 * 4000 bytes = 4 million bytes / minute
1 day = 24 hours * 60 minutes = 1440 minutes
4 million bytes * 1440 minutes = 5760 million bytes / day = 5.76 GB / day
5760 million bytes * 30 days = 172,800 million bytes / month = 172.8 GB / month
172.8 GB * 12 months = 2073.6 GB / year (about 2.07 TB)
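
The arithmetic above can be checked with a short script. This is only a sketch of the same back-of-envelope estimate: the constant names are made up for illustration, 1 GB is taken as 10^9 bytes, and the yearly figure assumes twelve 30-day months.

```python
# Back-of-envelope storage estimate using the figures from the list above.
TWEETS_PER_MINUTE = 1_000      # 1000 users per minute, one tweet each
CHARS_PER_TWEET = 1_000
BYTES_PER_CHAR = 4             # as assumed in the estimate
MINUTES_PER_DAY = 24 * 60      # 1440
DAYS_PER_MONTH = 30            # assumption: 30-day months
MONTHS_PER_YEAR = 12
GB = 10**9                     # assumption: decimal gigabytes

bytes_per_tweet = CHARS_PER_TWEET * BYTES_PER_CHAR
bytes_per_minute = TWEETS_PER_MINUTE * bytes_per_tweet
bytes_per_day = bytes_per_minute * MINUTES_PER_DAY
bytes_per_month = bytes_per_day * DAYS_PER_MONTH
bytes_per_year = bytes_per_month * MONTHS_PER_YEAR

if __name__ == "__main__":
    print(f"per day:   {bytes_per_day / GB:.2f} GB")
    print(f"per month: {bytes_per_month / GB:.1f} GB")
    print(f"per year:  {bytes_per_year / GB:.1f} GB")
```

This counts tweet text only; likes, user rows, follower rows, and indexes would add to the total.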

API design

Define what APIs are expected from the system...

Endpoints:

GET: api/user_name
return: the homepage of this user
POST: api/user_name
body: {user_id: molly123, content: 'hello world', timestamp: '04-23-2024 15:30:00'}
return: 200 OK on success
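
To make the POST endpoint more concrete, here is a minimal, framework-free sketch of how a server might validate the request body before storing it. The function name `handle_post_tweet`, the 400 error responses, the timestamp format string, and the 1000-character limit (borrowed from the capacity estimate) are all assumptions for illustration, not part of the API definition above.

```python
from datetime import datetime

# Assumption: timestamps follow the 'MM-DD-YYYY HH:MM:SS' shape of the sample body.
TIMESTAMP_FORMAT = "%m-%d-%Y %H:%M:%S"
# Assumption: the 1000-character limit is taken from the capacity estimate.
MAX_TWEET_CHARS = 1000
REQUIRED_FIELDS = ("user_id", "content", "timestamp")


def handle_post_tweet(body, store):
    """Validate a POST body and append the tweet to `store`.

    Returns a (status_code, response_body) pair. `store` is a plain list
    standing in for the database.
    """
    missing = [field for field in REQUIRED_FIELDS if field not in body]
    if missing:
        return 400, {"error": "missing fields: " + ", ".join(missing)}

    content = body["content"]
    if not isinstance(content, str) or not 0 < len(content) <= MAX_TWEET_CHARS:
        return 400, {"error": f"content must be 1 to {MAX_TWEET_CHARS} characters"}

    try:
        datetime.strptime(body["timestamp"], TIMESTAMP_FORMAT)
    except (TypeError, ValueError):
        return 400, {"error": "timestamp must look like 04-23-2024 15:30:00"}

    store.append(dict(body))
    return 200, {"status": "ok"}
```

Calling it with the sample body from the endpoint list returns status 200, while a body without `content` returns 400.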

Database design

Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...

user_table:
    user_id
    user_name
    user_email
    created_time
content_table:
    post_id
    user_id
    post_content
    timestamp
follower_table:
    user_id
    follower_id
    following_time
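
As a rough illustration, the three tables could be declared as follows, using SQLite through Python's built-in `sqlite3` module so the sketch runs anywhere. The column types, keys, the index, and the feed query are assumptions; so is the reading of `follower_table` in which `follower_id` follows `user_id`. Timestamps are stored as ISO-8601 strings so that text ordering matches time ordering.

```python
import sqlite3

# Assumed column types and constraints; the design only names the columns.
SCHEMA = """
CREATE TABLE user_table (
    user_id      TEXT PRIMARY KEY,
    user_name    TEXT NOT NULL,
    user_email   TEXT NOT NULL UNIQUE,
    created_time TEXT NOT NULL
);
CREATE TABLE content_table (
    post_id      INTEGER PRIMARY KEY,
    user_id      TEXT NOT NULL REFERENCES user_table(user_id),
    post_content TEXT NOT NULL,
    timestamp    TEXT NOT NULL
);
CREATE TABLE follower_table (
    user_id        TEXT NOT NULL REFERENCES user_table(user_id),
    follower_id    TEXT NOT NULL REFERENCES user_table(user_id),
    following_time TEXT NOT NULL,
    PRIMARY KEY (user_id, follower_id)
);
CREATE INDEX idx_content_user_time ON content_table (user_id, timestamp);
"""

# Latest posts from everyone that a given follower follows, newest first.
FEED_QUERY = """
SELECT c.post_id, c.user_id, c.post_content, c.timestamp
FROM content_table AS c
JOIN follower_table AS f ON f.user_id = c.user_id
WHERE f.follower_id = ?
ORDER BY c.timestamp DESC
LIMIT ?
"""


def open_db():
    """Create an in-memory database with the three tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```

The feed query is the join that becomes expensive as a user follows more people, which is the bottleneck discussed later in this design.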

High-level design

You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram to augment your design...

See the high level diagram

B[client] --> A{CDN}

B[client] -->E{load balancer} -->C{server}

C --> F{cache}

C --> D[master Database]

D --> R{replication 1}

D --> P{replication 2}

Request flows

Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...

When a user logs in to their Twitter page, the content is pulled from the closest region through the CDN.
Once the user is on the page, they can view the latest tweets from the people they follow, served from the cache. If the cache does not contain all the needed posts, the rest are pulled from the database, stored in the cache, and then displayed on the user's page.
If the user posts a new tweet, the post is stored in the database.
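
The read path described above is essentially the cache-aside pattern. A minimal sketch, assuming a plain dictionary in place of the real cache and a hypothetical `db_fetch` callable in place of the database query; it simplifies the flow by treating a user's whole feed as one cache entry rather than filling in only the missing posts.

```python
class FeedService:
    """Cache-aside read path: try the cache first, fall back to the database."""

    def __init__(self, db_fetch, cache=None):
        # db_fetch(user_id) -> list of posts; a stand-in for the database query.
        self.db_fetch = db_fetch
        # A dict stands in for the cache layer.
        self.cache = {} if cache is None else cache

    def get_feed(self, user_id):
        feed = self.cache.get(user_id)
        if feed is None:
            # Cache miss: read from the database, then populate the cache
            # so the next request for this user is served from memory.
            feed = self.db_fetch(user_id)
            self.cache[user_id] = feed
        return feed
```

A real cache would also need an expiry or invalidation policy so that new tweets show up in cached feeds; that part is left out here.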

Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...

load balancer 1

It distributes requests across the servers so that the system can scale out.
The distribution method is least connections (least loaded): each new request goes to the server that currently has the fewest active connections. This avoids overloading any single server and improves the efficiency of the system.
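
The least-connections rule can be sketched in a few lines. The class and method names are hypothetical, and a real load balancer would also track server health and handle concurrency; this only shows the selection logic.

```python
class LeastConnectionsBalancer:
    """Send each new request to the server with the fewest active connections."""

    def __init__(self, servers):
        # Active connection count per server, all starting at zero.
        self.active = {server: 0 for server in servers}

    def acquire(self):
        """Pick a server for a new request and count the connection."""
        # Ties go to the server listed first, because min() returns the
        # first minimal key in insertion order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        """Call when a request finishes so the server's count drops again."""
        if self.active[server] > 0:
            self.active[server] -= 1
```

With three idle servers, the first three requests go to three different servers; once one of them finishes, the next request goes back to that server.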

load balancer 2

It is used to avoid a single point of failure at the master database. If the master fails, load balancer 2 promotes one of the replicas to master until the original master recovers.
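
The failover rule can be expressed as a small pure function. This is a sketch under assumptions: the function name, the node names, and the idea that a health check supplies a set of currently healthy nodes are all hypothetical. It also ignores the data resynchronization a recovered master would need before taking writes again.

```python
def choose_master(original_master, replicas, healthy):
    """Return the node that should accept writes right now.

    original_master: name of the configured master.
    replicas: replica names in promotion order.
    healthy: set of node names that passed the latest health check.
    """
    # Prefer the original master whenever it is up, which also covers
    # switching back after it recovers.
    if original_master in healthy:
        return original_master
    # Otherwise promote the first healthy replica.
    for replica in replicas:
        if replica in healthy:
            return replica
    raise RuntimeError("no healthy database node available")
```

For example, if only `replica-2` is healthy, `choose_master("db-master", ["replica-1", "replica-2"], {"replica-2"})` returns `"replica-2"`.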

sql database

I decided to use a master-slave database architecture. The replicas keep copies of the data and improve performance: some read requests are spread across the replicas so that the master is not overloaded.
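
One way to picture the read/write split is a small router that sends writes to the master and rotates reads across the replicas. The class name, the node names, and the rule "a statement starting with SELECT is a read" are simplifying assumptions for illustration.

```python
import itertools


class ReadWriteSplitter:
    """Route writes to the master and spread reads across the replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # Round-robin over replicas; with no replicas, reads go to the master.
        self._replica_cycle = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        """Return the name of the node that should run this statement."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replica_cycle is not None:
            return next(self._replica_cycle)
        return self.master
```

With asynchronous replication, a read sent to a replica can briefly miss a tweet that was just written to the master, so reads that must see the latest write may still need to go to the master.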

Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...

I used a SQL database instead of a NoSQL database because the tables are well structured, which makes the data easy to write. However, as the user base grows, the database grows as well, and joining large tables to fetch data becomes complicated.
I used Redis instead of Memcached in the cache layer to store frequently accessed data, which improves the application's performance. Redis supports several data types (for example lists, sets, and hashes), whereas Memcached stores only simple key-value pairs.
Adding a load balancer in front of the master database prevents a single point of failure at the database level. If the master fails, one of the slaves is promoted to master until the original master recovers. However, this makes the application more complicated, and we need extra logic to handle failover.

Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.

When a user follows a lot of people, it might take some time to load their feed page. One way to prevent this is to limit the number of people a user can follow.
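
A follow limit like the one suggested above could be enforced at write time. In this sketch the limit of 5000, the exception name, and the in-memory dictionary standing in for `follower_table` are all hypothetical.

```python
# Hypothetical cap; the design does not specify a number.
MAX_FOLLOWING = 5000


class FollowLimitExceeded(Exception):
    """Raised when a user tries to follow more accounts than allowed."""


def follow(following, follower_id, target_id, limit=MAX_FOLLOWING):
    """Record that follower_id follows target_id, enforcing the cap.

    following: dict mapping a user id to the set of ids they follow.
    Returns True if a new follow was recorded, False if it already existed.
    """
    followed = following.setdefault(follower_id, set())
    if target_id in followed:
        return False
    if len(followed) >= limit:
        raise FollowLimitExceeded(
            f"{follower_id} already follows {len(followed)} accounts"
        )
    followed.add(target_id)
    return True
```

Capping the follow count bounds how many authors a single feed query has to cover, at the cost of restricting what users can do.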

Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?