Design Twitter - System Design

System requirements

Functional:

List functional requirements for the system...

Compose tweets
Share tweets
Follow other users and see their updates
Like tweets
Attach media to tweets

Non-Functional:

List non-functional requirements for the system...

Availability: multi-region redundancy with a 99.9% availability target
Scalability: the platform should scale horizontally to support on the order of 100 million active users and a high daily volume of tweets and interactions.
Monitoring: Metrics and log monitoring

Capacity estimation

Estimate the scale of the system you are going to design...

Active Users: 100 million
Daily Tweets: 500 million
Daily Tweet Impressions: 5 billion
Daily Likes: 1 billion

Tweet size: 1 KB
Profile size: 20 KB
Media size: 2 MB

User profile storage: 100,000,000 * 20 KB = 2,000,000,000 KB = 2,000,000 MB = 2,000 GB = 2 TB
Tweet storage per day: 500,000,000 * 1 KB = 500,000,000 KB = 500,000 MB = 500 GB a day
Media storage per day: 200,000,000 MB = 200,000 GB = 200 TB a day (this corresponds to 100,000,000 uploads of 2 MB each, i.e. media on one in five tweets)

SLA error budget: 0.1% of 8,760 hours = 8.76 hours of downtime per year
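
As a sanity check, the estimates above can be recomputed in a few lines of Python. This is only a sketch: it uses decimal units (1 MB = 1,000 KB), and the number of media uploads per day is an assumption chosen to match the 200 TB/day figure, not a measured value. The per-second rates are plain daily averages and ignore peaks.

```python
# Decimal units, expressed in KB.
KB = 1
MB = 1_000 * KB
GB = 1_000 * MB
TB = 1_000 * GB

ACTIVE_USERS = 100_000_000
DAILY_TWEETS = 500_000_000
DAILY_IMPRESSIONS = 5_000_000_000
TWEET_SIZE = 1 * KB
PROFILE_SIZE = 20 * KB
MEDIA_SIZE = 2 * MB
# Assumption: 100 million uploads per day, implied by the 200 TB/day estimate.
DAILY_MEDIA_UPLOADS = 100_000_000
AVAILABILITY_TARGET = 0.999

profile_storage_tb = ACTIVE_USERS * PROFILE_SIZE / TB            # 2.0
tweet_storage_gb_per_day = DAILY_TWEETS * TWEET_SIZE / GB        # 500.0
media_storage_tb_per_day = DAILY_MEDIA_UPLOADS * MEDIA_SIZE / TB  # 200.0

# Error budget: the fraction of a (non-leap) year we are allowed to be down.
error_budget_hours = (1 - AVAILABILITY_TARGET) * 365 * 24

# Average request rates over a day (86,400 seconds).
tweet_writes_per_second = DAILY_TWEETS / 86_400
tweet_reads_per_second = DAILY_IMPRESSIONS / 86_400

print(f"profiles: {profile_storage_tb:.1f} TB")
print(f"tweets:   {tweet_storage_gb_per_day:.1f} GB/day")
print(f"media:    {media_storage_tb_per_day:.1f} TB/day")
print(f"budget:   {error_budget_hours:.2f} hours/year")
print(f"writes:   {tweet_writes_per_second:.0f}/s, reads: {tweet_reads_per_second:.0f}/s")
```

The read rate comes out ten times higher than the write rate, which is why the rest of the design leans on caching and read scaling.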

API design

Define what APIs are expected from the system...

GET /api/v1/tweet/:id

200: {
  tweetId: bigint,
  userId: bigint,
  content: string,
  media: string,
  impressions: int,
  likes: int,
  date: date
}
4xx: Error
5xx: Error

POST /api/v1/tweet {
  content: string,
  media: string
}

201: {
  tweetId: bigint
}
4xx: Error
5xx: Error

POST /api/v1/like {
  tweetId: bigint
}

201: {
  tweetId: bigint
}
4xx: Error
5xx: Error
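
To make the contract of these three endpoints concrete, here is a minimal in-memory sketch of the handlers, each returning a (status, body) pair. Several things are assumptions that the API spec above does not state: the `user_id` argument is assumed to come from an authentication layer, a GET is assumed to count as one impression, and a repeated like by the same user is assumed not to be counted twice. The function names are hypothetical.

```python
from datetime import datetime, timezone
from itertools import count

_tweets = {}            # tweetId -> tweet record (stand-in for the tweet database)
_likes = set()          # (userId, tweetId) pairs already counted
_next_tweet_id = count(1)


def post_tweet(user_id, body):
    """POST /api/v1/tweet"""
    content = body.get("content")
    if not isinstance(content, str) or not content:
        return 400, {"error": "content is required"}
    tweet_id = next(_next_tweet_id)
    _tweets[tweet_id] = {
        "tweetId": tweet_id,
        "userId": user_id,
        "content": content,
        "media": body.get("media"),  # URL of the media in blob storage, if any
        "impressions": 0,
        "likes": 0,
        "date": datetime.now(timezone.utc).isoformat(),
    }
    return 201, {"tweetId": tweet_id}


def get_tweet(tweet_id):
    """GET /api/v1/tweet/:id"""
    tweet = _tweets.get(tweet_id)
    if tweet is None:
        return 404, {"error": "tweet not found"}
    tweet["impressions"] += 1  # assumption: every read is one impression
    return 200, dict(tweet)


def post_like(user_id, body):
    """POST /api/v1/like"""
    tweet_id = body.get("tweetId")
    tweet = _tweets.get(tweet_id)
    if tweet is None:
        return 404, {"error": "tweet not found"}
    if (user_id, tweet_id) not in _likes:  # assumption: likes are idempotent
        _likes.add((user_id, tweet_id))
        tweet["likes"] += 1
    return 201, {"tweetId": tweet_id}
```

For example, `post_tweet(7, {"content": "hello"})` returns `(201, {"tweetId": 1})` on a fresh process, and `get_tweet(999)` returns a 404.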

Database design

Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...

classDiagram
    Tweet --> User
    class User{
        userId: bigint
        username: string
        bio: string
        date: date
    }
    class Tweet{
        tweetId: bigint
        userId: bigint
        content: string
        media: string
        impressions: int
        likes: int
        date: date
    }
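
The same model can be written down as plain Python types, which makes the Tweet-to-User reference explicit. This is a sketch only: the `Store` class is a hypothetical in-memory stand-in for the two databases, and the check in `add_tweet` plays the role a foreign-key constraint would play in the relational store.

```python
import datetime
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    userId: int
    username: str
    bio: str
    date: datetime.date  # account creation date


@dataclass
class Tweet:
    tweetId: int
    userId: int          # references User.userId
    content: str
    date: datetime.date
    media: Optional[str] = None  # URL of the media in blob storage
    impressions: int = 0
    likes: int = 0


class Store:
    """Hypothetical in-memory store that enforces the Tweet --> User reference."""

    def __init__(self):
        self.users = {}
        self.tweets = {}

    def add_user(self, user):
        self.users[user.userId] = user

    def add_tweet(self, tweet):
        if tweet.userId not in self.users:
            raise ValueError(f"unknown userId {tweet.userId}")
        self.tweets[tweet.tweetId] = tweet

    def tweets_by(self, user_id):
        return [t for t in self.tweets.values() if t.userId == user_id]
```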

High-level design

You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design...

flowchart TB
    CDN -- Serves Static Files --> Client
    Client -- Sends Request --> LB(Load Balancer)
    LB -- Sends Request --> UserService
    LB -- Sends Request --> TweetService
    UserService <-- Read/Writes --> Graph_DB(User Database Graph)
    TweetService <-- Read/Writes --> MySQL_DB(Tweet Database Relational)
    TweetService -- Writes --> Blob_Storage(Media Storage Bucket)
    Blob_Storage -- Read --> CDN

Request flows

Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...

The user receives the client application from a CDN, since the frontend consists of static files. From the client, the user sends requests to our backend through the Load Balancer, which routes each request to the UserService or the TweetService depending on which API is being called. When the user uploads media, the TweetService writes it to blob storage such as S3. When the client later requests that media, the request goes through the CDN, which makes assets available globally and reduces bandwidth costs.

Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...

The CDN is deployed in many edge locations globally. The first time a media file is requested at an edge location, the CDN fetches it from blob storage and caches it at that location. Subsequent reads are faster because the file is served from the edge location closest to the user.
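
The cache-on-first-read behaviour of one edge location can be sketched as below. This is an illustration, not how any particular CDN is implemented: the `EdgeCache` class and its `origin` callback are hypothetical, and least-recently-used eviction is an assumption (real CDNs typically combine TTLs with their own eviction policies).

```python
from collections import OrderedDict


class EdgeCache:
    """One CDN edge location in front of an origin such as the media bucket."""

    def __init__(self, origin, capacity):
        self.origin = origin          # callable: key -> file contents
        self.capacity = capacity      # maximum number of cached files
        self._items = OrderedDict()   # insertion order tracks recency of use
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        # Cache miss: fetch from the origin and keep a copy at the edge.
        self.misses += 1
        value = self.origin(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used file
        return value
```

With this in place, only the first request for a file at each edge location reaches the backend. Every later request for it is a cache hit until the file is evicted.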

The UserService and TweetService are stateless services, so we can package them with a containerization tool like Docker and run them on a container orchestrator like Kubernetes, which scales the number of instances up and down with the traffic we receive on a given day. This both reduces cost in quiet periods and allows us to scale up at peak moments (for example, when a tweet goes viral).
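
To give a feel for how the scaling decision is made, the sketch below applies the replica formula documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current replicas * current metric / target metric), with a tolerance band around the target. The function itself, its parameter names, and the default bounds are illustrative assumptions. In a real cluster this logic runs inside Kubernetes and is configured rather than written by us.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=100, tolerance=0.1):
    """Return the replica count an HPA-style controller would aim for.

    current_metric and target_metric are per-pod averages of the same metric,
    for example CPU utilisation in percent or requests per second.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: do not scale, to avoid flapping.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never go below the floor or above the ceiling configured for the service.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 TweetService pods averaging 90% CPU against a 60% target give `desired_replicas(4, 90, 60) == 6`, while the same pods at 63% stay at 4 because the load is within the tolerance band.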

Lastly, we use a graph database for users, because it makes it easy to model and query the follow relationships between users. For tweets, we use a relational database: the workload is read-heavy (5 billion impressions against 500 million tweets per day), and a relational database is straightforward to scale for reads. We can scale it horizontally by introducing partitioning, where the partition key could be the tweetId. With a master-slave (primary-replica) architecture, we can also add multiple read replicas to scale reads horizontally.
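
The partitioning and read-replica idea can be sketched as follows. Everything here is a simplified assumption rather than a description of MySQL itself: the `Shard` and `TweetStore` classes are hypothetical, the partition is chosen with `tweetId % shard_count`, reads rotate round-robin over the replicas, and replication is done synchronously so the example stays deterministic (real replicas usually lag slightly behind the primary).

```python
from itertools import cycle


class Shard:
    """One partition: a primary for writes plus read replicas."""

    def __init__(self, name, replica_count):
        self.name = name
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = cycle(range(replica_count))

    def write(self, tweet_id, tweet):
        self.primary[tweet_id] = tweet
        for replica in self.replicas:  # simplification: synchronous replication
            replica[tweet_id] = tweet

    def read(self, tweet_id):
        # Spread reads over the replicas so the primary only handles writes.
        replica = self.replicas[next(self._next_replica)]
        return replica.get(tweet_id)


class TweetStore:
    def __init__(self, shard_count, replicas_per_shard):
        self.shards = [Shard(f"shard-{i}", replicas_per_shard)
                       for i in range(shard_count)]

    def shard_for(self, tweet_id):
        # tweetId is the partition key: the same id always maps to the same shard.
        return self.shards[tweet_id % len(self.shards)]

    def put(self, tweet_id, tweet):
        self.shard_for(tweet_id).write(tweet_id, tweet)

    def get(self, tweet_id):
        return self.shard_for(tweet_id).read(tweet_id)
```

Plain modulo partitioning is easy to reason about but remaps most keys when the number of shards changes, so a scheme such as consistent hashing would be worth considering if shards need to be added often.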

Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...

I'm using a CDN to improve speed, at the cost of making the system more expensive. However, it also results in less traffic to my backend servers.

I'm using Kubernetes to easily scale the stateless web servers horizontally. However, this introduces complexity to the system: it is another layer of abstraction that we have to operate and monitor.

Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.

Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?