Design Twitter - System Design

System requirements

Functional:

List functional requirements for the system...

The following are the functional requirements:

Users should be able to create an account, log in, and log out.
Users should be able to compose tweets.
Users should be able to follow other users.
Users should see updates from the users they follow in a personalised feed.
Users should be able to show appreciation for a specific tweet by liking it.
Users should receive notifications for likes, retweets, and mentions.

Non-Functional:

List non-functional requirements for the system...

The following are the non-functional requirements:

The system must be scalable.
The system must be fault tolerant.
The system must be highly available.

Capacity estimation

Estimate the scale of the system you are going to design...

I am assuming this will be a huge system. Here are my assumptions:

Total number of users : 1 billion
Daily active users : 100 million
Average tweets per day per user : 5
Average followers per user: 300
Average likes per tweet: 10

This is a read-heavy system because people read tweets far more often than they write them.

100 million * 5 = 500 million tweets per day from active users.

500 million * 10 = 5 billion likes per day from active users.

Upper bound on tweets if every registered user were active = 1 billion * 5 = 5 billion tweets per day.

Upper bound on likes = 5 billion * 10 = 50 billion likes per day.

Assuming about 1 billion requests per day: 1 billion / (24 hrs * 3600 seconds) = ~11.6K, rounded up to 12K requests/second, or about 720K rpm.

We will scale with these numbers.
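
The arithmetic above can be checked with a short script. This is only a sketch under the stated assumptions; the constant names are made up for illustration, and the 1 billion requests per day figure is the round number used in the estimate. Note that 500 million tweets at 10 likes each comes to 5 billion likes per day.

```python
# Back-of-the-envelope traffic estimate using the assumptions listed above.
DAILY_ACTIVE_USERS = 100_000_000
TWEETS_PER_USER_PER_DAY = 5
LIKES_PER_TWEET = 10
REQUESTS_PER_DAY = 1_000_000_000  # assumed round figure for total daily requests
SECONDS_PER_DAY = 24 * 3600

daily_tweets = DAILY_ACTIVE_USERS * TWEETS_PER_USER_PER_DAY
daily_likes = daily_tweets * LIKES_PER_TWEET
requests_per_second = REQUESTS_PER_DAY / SECONDS_PER_DAY
requests_per_minute = requests_per_second * 60

print(f"tweets per day:      {daily_tweets:,}")
print(f"likes per day:       {daily_likes:,}")
print(f"requests per second: {requests_per_second:,.0f}")
print(f"requests per minute: {requests_per_minute:,.0f}")
```

The unrounded rate is a little under 12K requests/second, so planning for 12K leaves a small margin.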

Storage Estimation

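Let's assume one tweet takes 100 bytes of storage. Using a round figure of 1 billion tweets per day (a ceiling above the 500 million estimated earlier), the requirement will be

100 bytes * 1 billion = 100 billion bytes = ~100 GB/day

Let's say 10 percent of tweets carry video or GIF data at about 50 KB each. 10% of 1 billion is 100 million, so

100 million * 50 KB = ~5 TB/day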
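
The same storage numbers, worked through in code. This is a sketch only: the constant names are illustrative, units are decimal (1 KB = 1,000 bytes), and the yearly total is an extrapolation that assumes the daily volume stays flat.

```python
# Daily storage estimate from the assumptions above (decimal units).
TWEETS_PER_DAY = 1_000_000_000   # round figure used in the estimate
TWEET_SIZE_BYTES = 100
MEDIA_PERCENT = 10               # share of tweets carrying video or GIF data
MEDIA_SIZE_BYTES = 50_000        # 50 KB per media item

text_bytes = TWEETS_PER_DAY * TWEET_SIZE_BYTES
media_tweets = TWEETS_PER_DAY * MEDIA_PERCENT // 100
media_bytes = media_tweets * MEDIA_SIZE_BYTES
yearly_bytes = (text_bytes + media_bytes) * 365  # assumes constant daily volume

print(f"text:   {text_bytes / 10**9:,.0f} GB/day")
print(f"media:  {media_bytes / 10**12:,.0f} TB/day")
print(f"yearly: {yearly_bytes / 10**15:,.2f} PB")
```

Media dominates the total, which is one reason it is kept out of the relational database.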

API design

Define what APIs are expected from the system...

The API design will be as follows:

There will be one API for reads, a GET API. It will look like http://{url}/v1/getTweet/{tweetId} for the client
There will be one API for writes, a POST API. It will look like http://{url}/v1/makeTweet -d {''' content '''}
There will be one API that updates a tweet, a POST API. It will look like http://{url}/v1/updateTweet/{tweetId} -d {''' content '''}
http://{url}/v1/updateTweet/{tweetId}/like -> This will be a POST API that likes the tweet.
http://{url}/v1/updateTweet/{tweetId}/follow -> This will be a POST API that follows a user.
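
To make the endpoints concrete, here is a minimal in-memory sketch of how the paths above could be routed to handlers. It is not a real web server: the `dispatch` function, the handler names, the status codes, and the dictionary standing in for the database are all assumptions made for illustration, and the follow endpoint is left out for brevity.

```python
import itertools
import re

tweets = {}                 # in-memory stand-in for the tweet table
_ids = itertools.count(1)   # hypothetical tweet id generator


def make_tweet(body=None):
    if not body:
        return 400, {"error": "content is required"}
    tweet_id = next(_ids)
    tweets[tweet_id] = {"tweetId": tweet_id, "content": body, "likes": 0}
    return 201, tweets[tweet_id]


def get_tweet(tweet_id, body=None):
    tweet = tweets.get(int(tweet_id))
    if tweet is None:
        return 404, {"error": "tweet not found"}
    return 200, tweet


def update_tweet(tweet_id, body=None):
    tweet = tweets.get(int(tweet_id))
    if tweet is None:
        return 404, {"error": "tweet not found"}
    if not body:
        return 400, {"error": "content is required"}
    tweet["content"] = body
    return 200, tweet


def like_tweet(tweet_id, body=None):
    tweet = tweets.get(int(tweet_id))
    if tweet is None:
        return 404, {"error": "tweet not found"}
    tweet["likes"] += 1
    return 200, tweet


# (HTTP method, path pattern, handler), mirroring the endpoints listed above.
ROUTES = [
    ("GET", re.compile(r"^/v1/getTweet/(\d+)$"), get_tweet),
    ("POST", re.compile(r"^/v1/makeTweet$"), make_tweet),
    ("POST", re.compile(r"^/v1/updateTweet/(\d+)$"), update_tweet),
    ("POST", re.compile(r"^/v1/updateTweet/(\d+)/like$"), like_tweet),
]


def dispatch(method, path, body=None):
    """Find the first matching route and call its handler."""
    for route_method, pattern, handler in ROUTES:
        match = pattern.match(path)
        if route_method == method and match:
            return handler(*match.groups(), body=body)
    return 404, {"error": "no such route"}


status, tweet = dispatch("POST", "/v1/makeTweet", "hello world")
dispatch("POST", f"/v1/updateTweet/{tweet['tweetId']}/like")
print(dispatch("GET", f"/v1/getTweet/{tweet['tweetId']}"))
```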

Database design

Defining the system data model early on will clarify how data will flow among different components of the system...

We will have the following entities in the DB, with the information each one holds:

User -> userId, name, age, DOB, address.
Follower -> userId, followeeId, followerId.
Tweet -> tweetId, userId, tweetContent, likes, retweets.

Along with these, we will use an object store to hold videos, photos, and GIFs.

Diagram is attached for the above.
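
The entities above could be expressed as tables roughly like this. This is a sketch, not the final schema: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the column types, constraints, and index are assumptions, and the Follower table is simplified to the (followerId, followeeId) pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for the relational database
conn.executescript("""
CREATE TABLE user (
    userId   INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    age      INTEGER,
    dob      TEXT,
    address  TEXT
);
CREATE TABLE follower (
    followerId INTEGER NOT NULL REFERENCES user(userId),
    followeeId INTEGER NOT NULL REFERENCES user(userId),
    PRIMARY KEY (followerId, followeeId)
);
CREATE TABLE tweet (
    tweetId      INTEGER PRIMARY KEY,
    userId       INTEGER NOT NULL REFERENCES user(userId),
    tweetContent TEXT NOT NULL,
    likes        INTEGER NOT NULL DEFAULT 0,
    retweets     INTEGER NOT NULL DEFAULT 0
);
-- Assumed index: fetching a user's tweets is a common read.
CREATE INDEX idx_tweet_user ON tweet(userId);
""")

conn.executemany("INSERT INTO user (userId, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])
conn.execute("INSERT INTO follower (followerId, followeeId) VALUES (?, ?)", (2, 1))
conn.execute("INSERT INTO tweet (userId, tweetContent) VALUES (?, ?)",
             (1, "hello world"))

# Tweets written by the users that user 2 follows, newest first.
rows = conn.execute("""
    SELECT t.tweetContent
    FROM tweet AS t
    JOIN follower AS f ON f.followeeId = t.userId
    WHERE f.followerId = ?
    ORDER BY t.tweetId DESC
""", (2,)).fetchall()
print(rows)
```

The join in the last query is the "fetch the feed with a query" approach, which is simple but becomes expensive at scale.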

High-level design

You should identify enough components that are needed to solve the actual problem from end to end...

The HLD has the following components:

Load balancer
CDN
Server
Cache
Relational Database
Pub/Sub queue
Worker
Feed Cache
Object Storage

A client calls the system. The client can be anything (web/app/desktop).

The call first goes to the load balancer, which distributes the traffic across the servers so that no single server receives more requests than the others.
After that, the request goes to the app server, which hosts services such as the user service that handles login, logout, and authentication.
The controller then routes the request according to whether the API call is a read, a write, or an update.
If it is a read request, the call goes to the cache, which checks whether it has the required data.
If the cache hits, it returns the data.
Otherwise the request goes to the database, which returns the data, and the result is also stored in the cache. This cache can be anything like Redis.
We also have video, audio, photo, and GIF data. We will store that data in object storage, because storing large binary files in an RDBMS is not practical.
We will also have a CDN so that the requested media is delivered quickly.
This CDN is pull-based, which means object storage does not need to push data into the CDN. On a miss, the CDN follows the link back to object storage and fetches the content from there.
Now the question is how to update the feed.
There are several mechanisms for this. We could simply have an API that fetches the tweets for the feed on every request. But that is very expensive because it always queries the database, and if the database is sharded in the future, the response has to be combined from different shards, which makes it even more expensive.
A better solution is a pub/sub queue with workers (for example Apache Storm) that asynchronously add each new tweet to the followers' entries in the feed cache when somebody tweets.
We will not do this for every user, though: if a user has millions of followers, this is not feasible because a single tweet would update the feed cache for millions of users.
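
The feed update idea can be sketched as follows. This is a simplified, single-process illustration, not the worker implementation: plain dictionaries stand in for the feed cache and the tweet store, `FANOUT_FOLLOWER_LIMIT` is a hypothetical threshold (set tiny here for the demo), tweet ids are assumed to increase over time, and merging high-follower accounts' tweets at read time is an assumption about how their tweets would reach followers.

```python
from collections import defaultdict, deque

FANOUT_FOLLOWER_LIMIT = 3   # hypothetical cutoff; real values would be far larger
FEED_SIZE = 100             # keep only the most recent entries per user

followers = defaultdict(set)          # author -> users following them
following = defaultdict(set)          # user -> authors they follow
tweets_by_author = defaultdict(list)  # author -> tweet ids (stand-in for the DB)
feed_cache = defaultdict(lambda: deque(maxlen=FEED_SIZE))  # user -> newest-first ids


def follow(follower, author):
    followers[author].add(follower)
    following[follower].add(author)


def publish(author, tweet_id):
    """What a worker would do for each tweet event taken off the queue."""
    tweets_by_author[author].append(tweet_id)
    if len(followers[author]) > FANOUT_FOLLOWER_LIMIT:
        return  # too many followers: skip fan-out on write
    for user in followers[author]:
        feed_cache[user].appendleft(tweet_id)


def read_feed(user, limit=10):
    """Cached feed plus tweets from high-follower accounts, merged at read time."""
    merged = set(feed_cache[user])
    for author in following[user]:
        if len(followers[author]) > FANOUT_FOLLOWER_LIMIT:
            merged.update(tweets_by_author[author])
    return sorted(merged, reverse=True)[:limit]


follow("alice", "bob")
for user in ["alice", "u1", "u2", "u3"]:
    follow(user, "celebrity")

publish("bob", 1)        # fanned out to alice's feed cache
publish("celebrity", 2)  # not fanned out: over the follower limit
publish("bob", 3)

print(list(feed_cache["alice"]))  # only bob's tweets were pushed
print(read_feed("alice"))         # celebrity tweet merged in on read
```

The trade-off is that reads for followers of very popular accounts do a little more work, in exchange for avoiding millions of cache writes per tweet.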

Request flows

Explain how the request flows from end to end in your high level design...

The call first goes to the load balancer, which distributes the traffic across the servers so that no single server receives more requests than the others.
After that, the request goes to the app server, which hosts services such as the user service that handles login, logout, and authentication.
The controller then routes the request according to whether the API call is a read, a write, or an update.
If it is a read request, the call goes to the cache, which checks whether it has the required data.
If the cache hits, it returns the data.
Otherwise the request goes to the database, which returns the data, and the result is also stored in the cache. This cache can be anything like Redis.
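
The read path described above is the cache-aside pattern. A minimal sketch, assuming plain dictionaries in place of Redis and the database; the `TweetReader` class and its `db_reads` counter are hypothetical names used only to show the behaviour.

```python
class TweetReader:
    """Cache-aside reads: check the cache first, fall back to the database."""

    def __init__(self, database, cache=None):
        self.database = database                      # dict standing in for the RDBMS
        self.cache = {} if cache is None else cache   # dict standing in for Redis
        self.db_reads = 0                             # counts database lookups

    def get_tweet(self, tweet_id):
        if tweet_id in self.cache:
            return self.cache[tweet_id]      # cache hit: return immediately
        self.db_reads += 1
        tweet = self.database.get(tweet_id)  # cache miss: read from the database
        if tweet is not None:
            self.cache[tweet_id] = tweet     # store the result for later reads
        return tweet


reader = TweetReader(database={1: "hello world"})
first = reader.get_tweet(1)    # miss, loads from the database
second = reader.get_tweet(1)   # hit, served from the cache
print(first, second, reader.db_reads)
```

A production cache would also need expiry and invalidation on tweet updates, which this sketch leaves out.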

Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component?...

Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...

I have used a relational database such as MySQL to store the data because the data is structured and there are relations between the tables, so a relational database is a suitable choice.

Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.

High Traffic Volume:

During trending topics or viral tweets, the platform may experience a surge in traffic leading to scalability issues.
Increasing number of users and tweets may result in database read/write bottlenecks.

Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?

Implement horizontal scaling with load balancers to distribute traffic evenly, caching strategies for frequently accessed data, and use of content delivery networks (CDNs) for serving media content.
Use database sharding for horizontal partitioning, indexing for faster retrieval, and NoSQL databases for efficient handling of unstructured data.
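
As one possible illustration of the sharding idea, here is a sketch that picks a shard from a hash of the user id. The shard count, the choice of userId as the shard key, and the `shard_for` helper are assumptions for the example, not part of the design above.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(user_id, num_shards=NUM_SHARDS):
    """Map a user id to a shard using a stable hash, so every server agrees."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# All of a user's rows land on the same shard, so single-user queries hit one shard.
distribution = Counter(shard_for(user_id) for user_id in range(10_000))
print(sorted(distribution.items()))
```

Plain modulo sharding remaps most keys when the shard count changes, so consistent hashing is a common alternative when shards are expected to be added over time.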