System requirements
Functional:
- Compose tweets
- Share tweets
- Track updates of other users
- Like tweets
- Add media to tweets
Non-Functional:
- Availability: multi-region redundancy with a 99.9% availability target
- Scalability: support millions of active users and a high daily volume of tweets and interactions (see the capacity estimation).
- Monitoring: Metrics and log monitoring
Capacity estimation
- Active Users: 100 million
- Daily Tweets: 500 million
- Daily Tweet Impressions: 5 billion
- Daily Likes: 1 billion
- Tweet size: 1 KB
- Profile size: 20 KB
- Media size: 2 MB
- Total profile storage: 100,000,000 * 20 KB = 2,000,000,000 KB = 2,000,000 MB = 2,000 GB = 2 TB
- Tweet storage per day: 500,000,000 * 1 KB = 500,000,000 KB = 500,000 MB = 500 GB = 0.5 TB
- Media storage per day: 200,000,000 MB = 200,000 GB = 200 TB (equivalent to 100 million uploads of 2 MB each)
- SLA error budget: 8.76 hours per year (0.1% of 8,760 hours)
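The estimates above can be reproduced with a short script. This is a sketch using decimal units (1 TB = 10^9 KB). The figure of 100 million media uploads per day is an assumption inferred from the 200 TB/day media estimate, and the average tweets-per-second value is derived here rather than stated in the list.

```python
# Back-of-envelope capacity estimate using decimal units.
ACTIVE_USERS = 100_000_000
DAILY_TWEETS = 500_000_000
TWEET_SIZE_KB = 1
PROFILE_SIZE_KB = 20
MEDIA_SIZE_MB = 2
# Assumption: 100 million media uploads per day, implied by 200 TB/day at 2 MB each.
DAILY_MEDIA_UPLOADS = 100_000_000
AVAILABILITY = 0.999

KB_PER_TB = 10**9
MB_PER_TB = 10**6
HOURS_PER_YEAR = 365 * 24
SECONDS_PER_DAY = 24 * 60 * 60

profile_storage_tb = ACTIVE_USERS * PROFILE_SIZE_KB / KB_PER_TB
tweet_storage_tb_per_day = DAILY_TWEETS * TWEET_SIZE_KB / KB_PER_TB
media_storage_tb_per_day = DAILY_MEDIA_UPLOADS * MEDIA_SIZE_MB / MB_PER_TB
error_budget_hours = (1 - AVAILABILITY) * HOURS_PER_YEAR
avg_tweets_per_second = DAILY_TWEETS / SECONDS_PER_DAY

print(f"Profile storage:   {profile_storage_tb:.1f} TB")
print(f"Tweets per day:    {tweet_storage_tb_per_day:.1f} TB")
print(f"Media per day:     {media_storage_tb_per_day:.1f} TB")
print(f"Error budget:      {error_budget_hours:.2f} hours/year")
print(f"Avg write rate:    {avg_tweets_per_second:.0f} tweets/second")
```

Media dominates storage growth by a wide margin, which is why the design sends it to blob storage rather than the tweet database.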
API design
GET /api/v1/tweet/:id
200: {
tweetId: bigint
userId: bigint
content: string
media: string
impressions: int
likes: int
date: date
}
4xx: Error
5xx: Error
POST /api/v1/tweet {
content: string
media: string
}
201: {
tweetId: bigint
}
4xx: Error
5xx: Error
POST /api/v1/like {
tweetId: bigint
}
201: {
tweetId: bigint
}
4xx: Error
5xx: Error
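To make the three endpoints concrete, here is a minimal in-memory sketch of their behavior. It is not a real HTTP server: each method returns a `(status, body)` pair. Several details are assumptions not specified above: the `TweetApi` class name, passing `user_id` in from an authentication layer, the specific 400 and 404 codes standing in for "4xx", and counting one impression per GET.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from itertools import count


@dataclass
class Tweet:
    tweetId: int
    userId: int
    content: str
    media: str
    impressions: int
    likes: int
    date: str


class TweetApi:
    """In-memory stand-in for the three endpoints; returns (status, body)."""

    def __init__(self):
        self._tweets = {}
        self._ids = count(1)  # hypothetical ID generator

    def post_tweet(self, user_id, body):
        # POST /api/v1/tweet (user_id is assumed to come from authentication)
        content = body.get("content")
        if not isinstance(content, str) or not content:
            return 400, {"error": "content is required"}
        tweet = Tweet(
            tweetId=next(self._ids),
            userId=user_id,
            content=content,
            media=body.get("media", ""),
            impressions=0,
            likes=0,
            date=datetime.now(timezone.utc).isoformat(),
        )
        self._tweets[tweet.tweetId] = tweet
        return 201, {"tweetId": tweet.tweetId}

    def get_tweet(self, tweet_id):
        # GET /api/v1/tweet/:id
        tweet = self._tweets.get(tweet_id)
        if tweet is None:
            return 404, {"error": "tweet not found"}
        tweet.impressions += 1  # assumption: each read counts as an impression
        return 200, asdict(tweet)

    def post_like(self, body):
        # POST /api/v1/like
        tweet = self._tweets.get(body.get("tweetId"))
        if tweet is None:
            return 404, {"error": "tweet not found"}
        tweet.likes += 1
        return 201, {"tweetId": tweet.tweetId}
```

For example, `TweetApi().post_tweet(7, {"content": "hello"})` returns status 201 with the new `tweetId`, and a later `get_tweet` on that ID returns the full tweet object.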
Database design
classDiagram
Tweet --> User
class User{
userId: bigint
username: string
bio: string
date: date
}
class Tweet{
tweetId: bigint
userId: bigint
content: string
media: string
impressions: int
likes: int
date: date
}
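The data model above could be expressed as a relational schema along these lines. This is only a sketch: it uses SQLite from the Python standard library so that it runs anywhere, and it places both tables in one database purely to show the Tweet-to-User relationship. The `NOT NULL`, `UNIQUE`, and default constraints and the `tweet_by_user` index are assumptions, not part of the model above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE user (
    userId   INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE,
    bio      TEXT,
    date     TEXT NOT NULL          -- ISO-8601 timestamp
);
CREATE TABLE tweet (
    tweetId     INTEGER PRIMARY KEY,
    userId      INTEGER NOT NULL REFERENCES user(userId),
    content     TEXT NOT NULL,
    media       TEXT,               -- URL or key of the media object
    impressions INTEGER NOT NULL DEFAULT 0,
    likes       INTEGER NOT NULL DEFAULT 0,
    date        TEXT NOT NULL
);
-- Hypothetical index to list a user's tweets by date.
CREATE INDEX tweet_by_user ON tweet(userId, date);
""")

conn.execute("INSERT INTO user VALUES (1, 'alice', 'hi', '2024-01-01')")
conn.execute(
    "INSERT INTO tweet (tweetId, userId, content, date) "
    "VALUES (10, 1, 'first', '2024-01-02')"
)
```

With foreign keys enabled, inserting a tweet whose `userId` does not exist raises `sqlite3.IntegrityError`, which is the constraint the `Tweet --> User` arrow expresses.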
High-level design
flowchart TB
CDN -- Serves Static Files --> Client
Client -- Sends Request --> LB(Load Balancer)
LB -- Sends Request --> UserService
LB -- Sends Request --> TweetService
UserService <-- Reads/Writes --> Graph_DB(User Database Graph)
TweetService <-- Reads/Writes --> MySQL_DB(Tweet Database Relational)
TweetService -- Writes --> Blob_Storage(Media Storage Bucket)
Blob_Storage(Media Storage Bucket) -- Read --> CDN
Request flows
The user receives the client from a CDN, since the frontend consists of static files. The client then sends requests to our backend through the Load Balancer, which routes each request to the UserService or the TweetService depending on the API being called. When the user uploads media, we store it in blob storage such as S3. When the client later requests that media, the request goes through the CDN, which makes assets globally available and reduces bandwidth costs.
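The routing step at the Load Balancer might look roughly like the sketch below, which matches the request path against a prefix table and picks an instance round-robin. The `/api/v1/user` prefix, the choice to serve likes from the TweetService, and the instance names are all assumptions for illustration.

```python
from itertools import cycle

# Hypothetical route table: /api/v1/tweet and /api/v1/like come from the API
# design; /api/v1/user is an assumed prefix for the UserService.
ROUTES = {
    "/api/v1/tweet": "TweetService",
    "/api/v1/like": "TweetService",
    "/api/v1/user": "UserService",
}


class LoadBalancer:
    def __init__(self, instances):
        # instances maps a service name to a list of instance addresses.
        self._pools = {name: cycle(addrs) for name, addrs in instances.items()}

    def route(self, path):
        """Return (service, instance) for a path, or None if no route matches."""
        for prefix, service in ROUTES.items():
            if path == prefix or path.startswith(prefix + "/"):
                # Round-robin over the instances of the matched service.
                return service, next(self._pools[service])
        return None


lb = LoadBalancer({
    "TweetService": ["tweet-1", "tweet-2"],
    "UserService": ["user-1"],
})
print(lb.route("/api/v1/tweet/42"))
print(lb.route("/api/v1/user/7"))
```

Because the services are stateless, any instance in a pool can serve any request, so plain round-robin is sufficient here.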
Detailed component design
The CDN will be deployed in many locations globally. The first time a media file is requested at an edge location, the CDN fetches it from blob storage and caches it there. Subsequent reads are faster because the file is served from the edge location, which is closer to the user.
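The caching behavior described here is a read-through cache per edge location. A minimal sketch follows, with `BlobStorage` and `EdgeLocation` as hypothetical stand-ins; a real CDN also applies TTLs, eviction, and invalidation, which are left out.

```python
class BlobStorage:
    """Stand-in for the media storage bucket (the origin)."""

    def __init__(self):
        self._objects = {}
        self.reads = 0  # counts how often the origin is hit

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        self.reads += 1
        return self._objects[key]


class EdgeLocation:
    """One CDN edge: serves from its local cache, fetching from the origin on a miss."""

    def __init__(self, origin):
        self._origin = origin
        self._cache = {}

    def get(self, key):
        if key not in self._cache:
            # Cache miss: fetch from the origin and keep a local copy.
            self._cache[key] = self._origin.get(key)
        return self._cache[key]


origin = BlobStorage()
origin.put("media/1.jpg", b"bytes")
edge_a, edge_b = EdgeLocation(origin), EdgeLocation(origin)
edge_a.get("media/1.jpg")  # miss: goes to the origin
edge_a.get("media/1.jpg")  # hit: served from the edge
edge_b.get("media/1.jpg")  # miss at a different edge
print(origin.reads)
```

Each edge pays the origin round trip once per object, so origin traffic scales with the number of edge locations rather than the number of viewers.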
The UserService and TweetService are stateless services, so we can package them with a containerization tool like Docker and use a container orchestrator like Kubernetes to scale them in a fine-grained manner based on the traffic we receive at any given moment. This both reduces cost during quiet periods and lets us scale up at peak moments (for example, when a tweet goes viral).
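As an illustration of how traffic-based scaling could work, the Kubernetes Horizontal Pod Autoscaler documents its core rule as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, skipping changes when the ratio is within a small tolerance of 1. The sketch below applies that rule in plain Python. The function name, the replica bounds, and the example metric values are assumptions, and the real autoscaler has further behavior (stabilization windows, pod readiness) not modeled here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=2, max_replicas=50, tolerance=0.1):
    """Approximate the HPA scaling rule; bounds and tolerance are illustrative."""
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# A viral tweet pushes average CPU from the 60% target to 90%.
print(desired_replicas(current_replicas=4, current_metric=90, target_metric=60))
# Traffic drops overnight to 20% average CPU.
print(desired_replicas(current_replicas=10, current_metric=20, target_metric=60))
```

The upper bound caps cost during a spike, while the lower bound keeps enough instances running to meet the availability target.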
Lastly, we use a graph database for users because it makes it easy to model the follow relationships between them. For tweets, we use a relational database: the workload is read-heavy (5 billion impressions versus 500 million tweets per day), and a relational database lets us scale reads through indexing and replication. We can scale horizontally by partitioning the tweets, with tweetId as the partition key. With a master-slave architecture, we can also add multiple read replicas to scale reads horizontally.
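One way to picture partitioning by tweetId combined with read replicas is a small router that picks the partition from the ID, sends writes to that partition's master, and spreads reads over its replicas. This is a sketch under assumptions: `TweetStoreRouter`, the node names, and modulo partitioning are illustrative choices, not a prescribed scheme.

```python
import random


class TweetStoreRouter:
    def __init__(self, partitions):
        # partitions: list of {"master": str, "replicas": [str, ...]}
        self._partitions = partitions

    def partition_for(self, tweet_id):
        # Modulo partitioning on the partition key (tweetId).
        return tweet_id % len(self._partitions)

    def node_for_write(self, tweet_id):
        # All writes for a partition go to its master.
        return self._partitions[self.partition_for(tweet_id)]["master"]

    def node_for_read(self, tweet_id):
        # Reads go to a random replica, falling back to the master if none exist.
        part = self._partitions[self.partition_for(tweet_id)]
        return random.choice(part["replicas"] or [part["master"]])


router = TweetStoreRouter([
    {"master": "db0-master", "replicas": ["db0-r1", "db0-r2"]},
    {"master": "db1-master", "replicas": ["db1-r1"]},
])
print(router.node_for_write(10))
print(router.node_for_read(11))
```

Two caveats apply to this sketch: modulo partitioning remaps most keys when the partition count changes (consistent hashing is a common alternative), and replicas can lag behind the master, so a read immediately after a write may return stale data.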
Trade offs/Tech choices
I'm using a CDN to improve speed, which makes the system more expensive to run. In return, it reduces the traffic that reaches my backend servers.
I'm using Kubernetes to scale the stateless web servers horizontally with little effort. However, this adds complexity: it is another layer of abstraction that we have to operate and monitor.
Failure scenarios/bottlenecks
Try to discuss as many failure scenarios/bottlenecks as possible.
Future improvements
What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?