My Solution for Design Twitter with Score: 9/10
by infinity_zen649
System requirements
Functional:
- Users should be able to compose and share tweets.
- Users should be able to follow other users and track their updates.
- Users should be able to favorite tweets to indicate their appreciation.
Non-Functional:
1. Scalability: the system should handle a large number of users, tweets, and interactions.
2. High availability: the system should remain available under high traffic volume.
3. Stability: the service should stay accessible, without frequent errors or downtime, under high concurrency.
Capacity estimation
User Base:
- Let's assume the user base is 500 million.
Traffic:
We can estimate traffic from these numbers.
- Tweeting: Let's assume each user tweets once a day, so we get about 500 million tweets per day.
- Home feed: Let's assume each user views 10 pages of their home feed per day.
- Following: Let's assume each user follows 100 other users on average, leading to 500 million × 100 = 50 billion follow relationships.
- Favorites: If each user favorites 5 tweets per day, that is 500 million × 5 = 2.5 billion favorites per day.
QPS:
- Write: 500M tweets / 86,400 s ≈ 6k QPS on average, or roughly 12k QPS at a 2× peak factor.
- Read: 500M users × 10 page views / 86,400 s ≈ 58k QPS.
- Favorites: 500M × 5 / 86,400 s ≈ 29k QPS.
Data size:
- Tweets: 500M tweets per day, each up to 140 chars; with encoding and metadata, let's assume 300 bytes per tweet. That is about 150 GB per day, or roughly 55 TB per year.
- Let's assume image and video data is 100 times the tweet text: about 15 TB per day, or roughly 5.5 PB per year.
It is clear that we need a distributed architecture.
API design
For tweeting, the API for a user to post a tweet:
public Result postTweets(UserInfo user, TweetInfo tweet);
public Result postTweets(Long userId, String tweetText, String location, DateTime date);
For following:
public Result follow(Long userId, Long followedUserId);
public Result unFollow(Long userId, Long followedUserId);
For favorites:
public Result favorites(Long userId, Long tweetId);
public Result unFavorites(Long userId, Long tweetId);
For feeds:
public Result renderFeeds(Long userId, String location, int pageNo);
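To make these API shapes concrete, here is a minimal sketch that groups the endpoints above into service interfaces. The Result record body is an assumption, and java.time.Instant stands in for the DateTime type above.
import java.time.Instant;
// Illustrative grouping of the endpoints above; Result is an assumed wrapper type.
record Result(boolean success, String message) {}
interface TweetService {
    Result postTweets(Long userId, String tweetText, String location, Instant date);
}
interface FollowService {
    Result follow(Long userId, Long followedUserId);
    Result unFollow(Long userId, Long followedUserId);
}
interface FavoriteService {
    Result favorites(Long userId, Long tweetId);
    Result unFavorites(Long userId, Long tweetId);
}
interface FeedService {
    // pageNo is 1-based; each page is a fixed-size slice of the home feed
    Result renderFeeds(Long userId, String location, int pageNo);
}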
Database design
The tables will include user info, tweets, and the follow relationship.
I will use MySQL to store user info, tweets, and follow relationships.
And I will use Amazon S3 to store pictures and videos.
UserInfo Table:
- userId: primary key
- userName
- status
- other profile fields (avatar, age, etc.)
tweets Table:
- tweetId: primary key
- userId: index
- content
- postTime: index
- modifyTime
- status
Follower table:
- userId: index (the account being followed)
- followerId: index
- followedTime
The ER diagram is on the right.
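As a companion to the ER diagram, here is a hedged sketch of the three tables as plain Java value objects; the field types (and the meaning of userId in the follower table) are assumptions based on the column lists above.
import java.time.Instant;
// Value objects mirroring the three tables above; field types are assumptions.
record UserProfileRow(long userId,      // primary key
                      String userName,
                      int status,
                      String avatarUrl, // "other profile" fields, illustrative
                      Integer age) {}
record TweetRow(long tweetId,           // primary key
                long userId,            // indexed: the author
                String content,         // up to 140 chars of text
                Instant postTime,       // indexed for timeline queries
                Instant modifyTime,
                int status) {}
record FollowRow(long userId,           // assumed: the account being followed
                 long followerId,       // indexed: the follower
                 Instant followedTime) {}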
High-level design
There are some problems I need to take into account:
- How do we render a feed containing tweets from all the people a user follows?
- If there are celebrities with huge numbers of followers, how do we handle the resulting hotspot data?
- How do we update the feed when a user follows or unfollows someone?
Push & Pull
There are two common models for rendering the home feed: push and pull.
Push (fan-out on write): when a user posts a tweet, it is immediately pushed into the feeds of all their followers.
Features:
- It minimizes read latency.
- It needs more storage, because each user's timeline must be stored and updated in real time.
- If a celebrity is followed by numerous users, the system must handle a huge burst of writes every time they tweet.
Pull (fan-out on read): when a user refreshes the home page, tweets are fetched and merged in real time.
Features:
- The system dynamically pulls tweets from the people a user follows. This may involve complex queries over large datasets to aggregate the tweets.
- It does not need extra storage for precomputed timelines.
- Its drawback is read latency, especially when a user follows a lot of people.
I prefer the pull model here and will dig deeper in the trade-offs section; a minimal sketch of the pull path follows.
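This sketch assumes simple in-memory follow and tweet stores; a real implementation would read from the follow and tweets tables, or their caches, instead.
import java.time.Instant;
import java.util.*;
import java.util.stream.Collectors;
// Fan-out on read: build the home feed at request time by merging the latest
// tweets of every followed account. The two maps stand in for real stores.
class PullFeedService {
    record Tweet(long tweetId, long authorId, String content, Instant postTime) {}
    private final Map<Long, Set<Long>> followStore = new HashMap<>();  // userId -> followed userIds
    private final Map<Long, List<Tweet>> tweetStore = new HashMap<>(); // authorId -> that author's tweets
    List<Tweet> renderFeed(long userId, int pageNo, int pageSize) {
        Set<Long> followed = followStore.getOrDefault(userId, Set.of());
        return followed.stream()
                .flatMap(author -> tweetStore.getOrDefault(author, List.<Tweet>of()).stream())
                .sorted(Comparator.comparing(Tweet::postTime).reversed()) // newest first
                .skip((long) (pageNo - 1) * pageSize)                     // paging
                .limit(pageSize)
                .collect(Collectors.toList());
    }
}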
Client layer
- First, I start from the client (website, app), which sends requests to the server side.
- Requests go through a group of load balancers, which distribute them evenly across the servers.
- I use a rate limiter to protect each back-end server from being overwhelmed by high traffic (a token-bucket sketch follows this list).
- To reduce latency, I use a CDN to serve static files such as images and videos. The CDN is itself geographically distributed.
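One common way to build the rate limiter mentioned above is a token bucket per client; below is a single-node sketch, and the capacity and refill rate are illustrative values that would need tuning.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
// Token-bucket rate limiter, one bucket per client key (e.g. userId or IP).
class TokenBucketLimiter {
    private static final class Bucket {
        double tokens;
        long lastRefillNanos;
        Bucket(double tokens, long now) { this.tokens = tokens; this.lastRefillNanos = now; }
    }
    private final double capacity;        // maximum burst size
    private final double refillPerSecond; // steady-state allowed rate
    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();
    TokenBucketLimiter(double capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
    }
    /** Returns true if the request may proceed, false if it should be rejected (e.g. HTTP 429). */
    boolean allow(String clientKey) {
        long now = System.nanoTime();
        Bucket b = buckets.computeIfAbsent(clientKey, k -> new Bucket(capacity, now));
        synchronized (b) {
            double refill = (now - b.lastRefillNanos) / 1e9 * refillPerSecond;
            b.tokens = Math.min(capacity, b.tokens + refill);
            b.lastRefillNanos = now;
            if (b.tokens >= 1.0) {
                b.tokens -= 1.0;
                return true;
            }
            return false;
        }
    }
}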
Server layer
- Many server clusters receive the requests, process them with different services, and persist data into the corresponding tables.
- The server cluster consists of a set of services: the tweet service, user service, follow service, and home feed service.
- The tweet service is responsible for posting tweets.
- The user service handles registration and profile management.
- The follow service handles following and unfollowing.
- The home feed service renders the feeds for users.
Data layer
- I use Redis to cache data and speed up responses.
- Redis is deployed in a master-slave architecture.
- MySQL is scaled horizontally to handle large-scale data, with master-slave replication to ensure data consistency and availability.
- I use Amazon S3 to store files, videos, and images.
Request flows
First, the load balancers receive requests from clients.
Using an algorithm such as round-robin or consistent hashing, each request is routed to one of the servers.
If the traffic exceeds the rate limiter's threshold, the request is rejected.
For writes, the server processes the request and stores the data in MySQL, updating Redis as well.
Images and videos are also pushed to the CDN.
For reads, the server retrieves files from the CDN and queries data from Redis; on a cache miss it queries MySQL and then populates Redis, as sketched below.
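A sketch of that cache-aside read path; the Cache interface and the MySQL lookup function are placeholders for real Redis and JDBC clients, not a specific library API.
import java.time.Duration;
import java.util.Optional;
import java.util.function.Function;
// Cache-aside read: try Redis first, fall back to MySQL on a miss, then populate Redis.
class CacheAsideReader<K, V> {
    interface Cache<K, V> {
        Optional<V> get(K key);
        void set(K key, V value, Duration ttl);
    }
    private final Cache<K, V> redis;                     // placeholder for a Redis client
    private final Function<K, Optional<V>> mysqlLookup;  // placeholder for a MySQL query
    private final Duration ttl;
    CacheAsideReader(Cache<K, V> redis, Function<K, Optional<V>> mysqlLookup, Duration ttl) {
        this.redis = redis;
        this.mysqlLookup = mysqlLookup;
        this.ttl = ttl;
    }
    Optional<V> read(K key) {
        Optional<V> cached = redis.get(key);
        if (cached.isPresent()) {
            return cached;                               // cache hit
        }
        Optional<V> fromDb = mysqlLookup.apply(key);     // cache miss: query MySQL
        fromDb.ifPresent(v -> redis.set(key, v, ttl));   // populate Redis for later reads
        return fromDb;
    }
}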
Detailed component design
Loadbalancer
- I will deploy multiple load balancers as a cluster so they share the load dynamically; if the primary LB fails, a backup takes over.
- I will place LBs in different locations to reduce latency for users far from the servers, directing traffic to the server closest to the user.
- I will use LB services provided by cloud providers like AWS; these services provide flexibility and remove the need for manual intervention.
- I will choose an appropriate algorithm: round-robin, least connections, IP hash, consistent hashing, etc. (a consistent-hashing sketch follows this list).
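For the consistent-hashing option, here is a minimal ring with virtual nodes; MD5 and the number of virtual nodes per server are illustrative choices.
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;
// Consistent hashing ring: each server gets many virtual nodes so keys spread evenly,
// and adding or removing a server only remaps a small fraction of keys.
class ConsistentHashRing {
    private final SortedMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;
    ConsistentHashRing(Collection<String> servers, int virtualNodes) {
        this.virtualNodes = virtualNodes;
        servers.forEach(this::addServer);
    }
    void addServer(String server) {
        for (int i = 0; i < virtualNodes; i++) ring.put(hash(server + "#" + i), server);
    }
    void removeServer(String server) {
        for (int i = 0; i < virtualNodes; i++) ring.remove(hash(server + "#" + i));
    }
    /** Routes a request key to the first server clockwise on the ring. */
    String route(String requestKey) {
        if (ring.isEmpty()) throw new IllegalStateException("no servers registered");
        SortedMap<Long, String> tail = ring.tailMap(hash(requestKey));
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }
    private static long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            return ((long) (d[3] & 0xFF) << 24) | ((d[2] & 0xFF) << 16) | ((d[1] & 0xFF) << 8) | (d[0] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}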
CDN
- I will use both pull and push CDN caching approaches to get the benefits of both.
- I can use Edge Side Includes (ESI) to optimize dynamic content caching. By caching the unchanged parts of a page, we avoid regenerating the full web page every time the dynamic content changes.
- I will tune caching rules and TTLs to achieve a higher cache hit ratio.
- I will scale the system by adding more edge nodes to the tree-like structure of the CDN, which reduces the load on the origin server.
Redis
- I will use Redis Cluster to handle large-scale data. It scales horizontally by adding more nodes and shards data across them.
- I will use master-slave replication to ensure high availability and data consistency.
- I will use Sentinel to monitor the cluster, detect failing nodes, and support automatic failover.
Mysql
- I will use a master-slave architecture to handle massive data volumes and high traffic.
- I will use master-slave replication to ensure data consistency and availability.
- I will use horizontal partitioning (sharding) to split the tables, distributing the load across multiple servers or databases, as sketched below.
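A simple sketch of routing a query to a shard by userId using hash-mod partitioning; the shard list is an assumption, and production systems often prefer consistent hashing or a directory service so shards can be added without mass resharding.
import java.util.List;
// Hash-mod sharding: pick the MySQL shard that owns a given userId.
class ShardRouter {
    private final List<String> shardDsns; // one connection string per shard (illustrative)
    ShardRouter(List<String> shardDsns) {
        this.shardDsns = List.copyOf(shardDsns);
    }
    /** All rows for the same userId land on the same shard. */
    String shardFor(long userId) {
        int idx = (int) Math.floorMod(userId, (long) shardDsns.size());
        return shardDsns.get(idx);
    }
}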
Trade offs/Tech choices
For database
I choose a relational database like MySQL rather than NoSQL.
Although NoSQL provides schema flexibility, it is weaker at complex queries, structured data, and transactions. Twitter's core data model does not change much (unlike many B2B businesses), so that flexibility matters less here.
For cache
I choose Redis rather than Memcached.
Redis supports a variety of data types and horizontal scaling, which makes it more suitable for a large-scale system with high traffic.
Memcached is simple and efficient as a basic key-value store, but it lacks many advanced features.
Redis offers more robust options for scaling and availability.
Failure scenarios/bottlenecks
Pull model -> Hybrid model
If a user follows a lot of people, assembling their home feed at read time can take a long time.
So we can adopt a hybrid model.
In the hybrid model, set a threshold on the number of accounts a user follows. If the threshold is exceeded, use the push model to push new tweets into that user's precomputed feed. This reduces latency and improves the user experience; a sketch of the threshold decision follows.
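This sketch assumes a hypothetical follow-count store and an illustrative threshold value.
// Hybrid delivery: pull by default, but push new tweets into a precomputed feed
// for users who follow more accounts than can be merged cheaply at read time.
class HybridFeedStrategy {
    interface FollowStore { int followingCount(long userId); }
    enum Mode { PULL, PUSH }
    private final FollowStore followStore;
    private final int pushThreshold; // e.g. 1000 followed accounts; illustrative
    HybridFeedStrategy(FollowStore followStore, int pushThreshold) {
        this.followStore = followStore;
        this.pushThreshold = pushThreshold;
    }
    Mode modeFor(long userId) {
        // Beyond the threshold, merging at read time is too slow, so this user's
        // feed is materialized ahead of time by pushing new tweets into it.
        return followStore.followingCount(userId) > pushThreshold ? Mode.PUSH : Mode.PULL;
    }
}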
Read hotspot
If a celebrity is followed by numerous people, their tweets become a read hotspot.
- First, I will cache their tweets in Redis, using the cache-aside strategy to keep the cached tweets consistent.
- If the data is extremely hot, I will replicate it into a hot-data zone on each Redis server. This spreads the load and avoids excessive calls to a single Redis node.
- Then I will add an in-process local cache in front of Redis to absorb even more read traffic, as sketched below.
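A sketch of that local cache in front of Redis: a small in-process map with a short TTL absorbs repeated reads of the same hot tweet before they ever reach Redis. The loader function stands in for the Redis/MySQL path, and the TTL bounds how stale a hot entry can get; both are assumptions.
import java.time.Duration;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
// Two-level read path for hot keys: in-process cache -> Redis -> MySQL.
class HotKeyCache<K, V> {
    private record Entry<T>(T value, long expiresAtMillis) {}
    private final Map<K, Entry<V>> local = new ConcurrentHashMap<>();
    private final Function<K, Optional<V>> remoteLoader; // e.g. the cache-aside reader above
    private final long ttlMillis;
    HotKeyCache(Function<K, Optional<V>> remoteLoader, Duration localTtl) {
        this.remoteLoader = remoteLoader;
        this.ttlMillis = localTtl.toMillis();
    }
    Optional<V> get(K key) {
        long now = System.currentTimeMillis();
        Entry<V> e = local.get(key);
        if (e != null && e.expiresAtMillis() > now) {
            return Optional.of(e.value());              // served from local memory
        }
        Optional<V> loaded = remoteLoader.apply(key);   // fall through to Redis / MySQL
        loaded.ifPresent(v -> local.put(key, new Entry<>(v, now + ttlMillis)));
        return loaded;
    }
}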
Future improvements
To ensure the disaster recovery and availability of services, I would opt for a multi-region active-active strategy. This involves deploying service clusters and database clusters in multiple locations. Through automatic failover and load balancing, this strategy ensures that there are no single points of failure within any cluster.