Design Twitter - System Design

System requirements

Functional:

User should be able to

Compose a tweet and post it
Follow another user
View tweets of users you follow in your home feed
View a feed of suggested content from accounts recommended based on popularity (not necessarily accounts you follow)
Favorite other users' tweets

Non-Functional:

Scalability: this system should be highly scalable. We want to be able to support many simultaneous users, potentially in different parts of the world
Response time: reads should be as fast as possible; we can tolerate higher latency for writes
Consistency: eventual consistency is acceptable in this system. Users do not need to see the newest updates instantaneously
Security: we need to ensure that users are authenticated in order to post, make changes to their account, and access the content from the users they follow

Capacity estimation

Estimate the scale of the system you are going to design...

User Base:

Let's assume daily active users (DAU) is 500 million.

Traffic:

We can estimate the traffic from the DAU figure and the average activity per user.

Tweeting: each user tweets about 2 times per day, so 1 billion tweets per day total
Home feed: each user loads their home feed about 10 times per day, 5 billion home feed loads per day total
Favorite: each user favorites about 1 tweet per day, so 500 million favorites per day total
Following: each user follows about 200 accounts, so 100 billion follow relationships total

Queries Per Second:

Write (tweets): 500M * 2 / (3600 * 24) ≈ 11.6k QPS
Read (home feed): 500M * 10 / (3600 * 24) ≈ 58k QPS
Favorites: 500M * 1 / (3600 * 24) ≈ 5.8k QPS
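
The arithmetic can be checked with a few lines of Python. This is only a sketch of the averages: it assumes traffic is spread evenly over the day, so peak QPS would be higher. The variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 3600  # 86,400

DAU = 500_000_000  # daily active users, from the estimate above

# Per-user daily actions, taken from the traffic assumptions above.
ACTIONS_PER_USER_PER_DAY = {
    "tweet (write)": 2,
    "home feed load (read)": 10,
    "favorite": 1,
}


def average_qps(dau: int, actions_per_user_per_day: float) -> float:
    """Average queries per second, assuming load is spread evenly over the day."""
    return dau * actions_per_user_per_day / SECONDS_PER_DAY


qps = {name: average_qps(DAU, n) for name, n in ACTIONS_PER_USER_PER_DAY.items()}

for name, value in qps.items():
    print(f"{name}: {value / 1000:.1f}k QPS")
```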

Data size:

Tweet: 1 billion tweets per day, each up to 140 characters. Allowing for encoding overhead, let's assume 300 bytes per tweet. That is about 300 GB per day (roughly 280 GiB), or about 110 TB per year.
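
A quick check of the storage figure, as a sketch. The 300 bytes per tweet is the assumption stated above; the difference between decimal (GB) and binary (GiB) units explains why both 300 and roughly 280 can appear for the same daily volume.

```python
TWEETS_PER_DAY = 1_000_000_000
BYTES_PER_TWEET = 300  # assumed average size, including encoding overhead

GB = 10**9    # decimal gigabyte
TB = 10**12   # decimal terabyte
GiB = 2**30   # binary gibibyte

bytes_per_day = TWEETS_PER_DAY * BYTES_PER_TWEET
bytes_per_year = bytes_per_day * 365

print(f"per day:  {bytes_per_day / GB:.0f} GB ({bytes_per_day / GiB:.0f} GiB)")
print(f"per year: {bytes_per_year / TB:.1f} TB")
```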

API design

Tweeting

POST: user ID, content of tweet

Home Feed

GET: user ID, page #

Following/Unfollowing

POST: user ID, followed user ID

Favoriting/Unfavoriting

POST: user ID, tweet ID
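
One way to pin down these request shapes is with small typed objects. The sketch below is an assumption, not a fixed contract: the class names, the `follow`/`favorite` boolean flags for the undo actions, and the validation rules (including the self-follow check) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

MAX_TWEET_CHARS = 140  # tweet length used in the capacity estimate


@dataclass(frozen=True)
class PostTweetRequest:      # Tweeting: POST
    user_id: str
    content: str


@dataclass(frozen=True)
class HomeFeedRequest:       # Home feed: GET
    user_id: str
    page: int = 1


@dataclass(frozen=True)
class FollowRequest:         # Following/Unfollowing: POST
    user_id: str
    followed_user_id: str
    follow: bool = True      # assumed flag: False means unfollow


@dataclass(frozen=True)
class FavoriteRequest:       # Favoriting/Unfavoriting: POST
    user_id: str
    tweet_id: str
    favorite: bool = True    # assumed flag: False means unfavorite


def validate(request) -> List[str]:
    """Return a list of validation errors (empty if the request is acceptable)."""
    errors = []
    if isinstance(request, PostTweetRequest):
        if not request.content.strip():
            errors.append("content must not be empty")
        if len(request.content) > MAX_TWEET_CHARS:
            errors.append(f"content exceeds {MAX_TWEET_CHARS} characters")
    elif isinstance(request, HomeFeedRequest):
        if request.page < 1:
            errors.append("page must be >= 1")
    elif isinstance(request, FollowRequest):
        if request.user_id == request.followed_user_id:
            errors.append("users cannot follow themselves")
    return errors
```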

Database design

RDBMS with the following tables

Users table

user ID (UUID)
name (string)
email (string)
created at (timestamp)
updated at (timestamp)
etc.

Tweets table

tweet ID (UUID)
user ID (UUID)
tweet content (string)
created at (timestamp)

User Favorites table - many to many

user ID (UUID)
tweet ID (UUID)
created at (timestamp)
deleted at (timestamp)

User Followers table - many to many

user ID (UUID)
followed user ID (UUID)
created at (timestamp)
deleted at (timestamp)
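
The tables above could be sketched as DDL. The example uses Python's built-in sqlite3 module purely so it runs anywhere; it is not the proposed production database. UUIDs are stored as text, a NULL "deleted at" is assumed to mean the row is still active, and the composite primary keys, the unique email, and the index are assumptions that are not part of the table lists above.

```python
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE users (
    user_id    TEXT PRIMARY KEY,
    name       TEXT NOT NULL,
    email      TEXT NOT NULL UNIQUE,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT
);
CREATE TABLE tweets (
    tweet_id   TEXT PRIMARY KEY,
    user_id    TEXT NOT NULL REFERENCES users(user_id),
    content    TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE user_favorites (
    user_id    TEXT NOT NULL REFERENCES users(user_id),
    tweet_id   TEXT NOT NULL REFERENCES tweets(tweet_id),
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    deleted_at TEXT,  -- NULL while the favorite is active
    PRIMARY KEY (user_id, tweet_id)
);
CREATE TABLE user_followers (
    user_id          TEXT NOT NULL REFERENCES users(user_id),
    followed_user_id TEXT NOT NULL REFERENCES users(user_id),
    created_at       TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    deleted_at       TEXT,  -- NULL while the follow is active
    PRIMARY KEY (user_id, followed_user_id)
);
-- Assumed index to support "latest tweets by author" lookups.
CREATE INDEX idx_tweets_user_created ON tweets (user_id, created_at);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Tiny demo: alice follows bob, bob tweets, alice loads her home feed.
alice, bob = str(uuid.uuid4()), str(uuid.uuid4())
conn.executemany(
    "INSERT INTO users (user_id, name, email) VALUES (?, ?, ?)",
    [(alice, "alice", "alice@example.com"), (bob, "bob", "bob@example.com")],
)
conn.execute(
    "INSERT INTO user_followers (user_id, followed_user_id) VALUES (?, ?)",
    (alice, bob),
)
conn.execute(
    "INSERT INTO tweets (tweet_id, user_id, content) VALUES (?, ?, ?)",
    (str(uuid.uuid4()), bob, "hello"),
)

home_feed = conn.execute(
    """
    SELECT t.content
    FROM tweets AS t
    JOIN user_followers AS f ON f.followed_user_id = t.user_id
    WHERE f.user_id = ? AND f.deleted_at IS NULL
    ORDER BY t.created_at DESC
    LIMIT 20
    """,
    (alice,),
).fetchall()
```

The home feed query here is the pull-style read: it joins at request time, which is simple but gets expensive at the read volume estimated earlier.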

High-level design

You should identify enough components that are needed to solve the actual problem from end to end.

Rate limiter

Protect against DoS attacks and ensure fair usage
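
The design does not name a rate-limiting algorithm. A token bucket is one common choice; the sketch below assumes one bucket per client, and the class name, parameters, and injectable clock are illustrative.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket: refills at `rate_per_sec`, allows bursts up to `capacity`."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock          # injectable so the limiter can be tested
        self.tokens = capacity      # start with a full bucket
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and consume tokens if the request is within the limit."""
        now = self.clock()
        # Refill based on elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

In a multi-server deployment the bucket state would need to live in shared storage rather than in process memory; that part is left out here.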

Load balancer

Use consistent hashing to distribute load across servers
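
A minimal sketch of a consistent hash ring with virtual nodes, to show why adding or removing a server only remaps a fraction of the keys. The class name, the choice of MD5 as the hash, and the default of 100 virtual nodes per server are assumptions for illustration.

```python
import bisect
import hashlib
from typing import Iterable, List


class ConsistentHashRing:
    def __init__(self, nodes: Iterable[str] = (), vnodes: int = 100):
        self.vnodes = vnodes
        self._hashes: List[int] = []   # sorted ring positions
        self._owners: List[str] = []   # owner of each position (parallel list)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key: str) -> int:
        # First 8 bytes of MD5 as an integer; used for placement, not security.
        return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        """Place `vnodes` virtual points for this node on the ring."""
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            idx = bisect.bisect(self._hashes, h)
            self._hashes.insert(idx, h)
            self._owners.insert(idx, node)

    def remove_node(self, node: str) -> None:
        """Drop all virtual points belonging to this node."""
        keep = [(h, n) for h, n in zip(self._hashes, self._owners) if n != node]
        self._hashes = [h for h, _ in keep]
        self._owners = [n for _, n in keep]

    def get_node(self, key: str) -> str:
        """Return the owner of the first ring position clockwise from the key."""
        if not self._hashes:
            raise LookupError("ring is empty")
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._owners[idx]
```

Usage: build `ConsistentHashRing(["s1", "s2", "s3"])` and call `get_node(user_id)`. After `remove_node("s2")`, only keys that previously mapped to s2 move to a different server.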

CDN

Serve cached data to localized areas to minimize response time

Services

Services to support major functions: user service (user management and following), tweet service (tweeting and favoriting), feed service (loading user feed)

Cache

Application-level caching
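
The caching strategy is not specified here. As one possibility, a cache-aside read path with a TTL and LRU eviction could look like the sketch below. It is an in-process stand-in for whatever cache is actually deployed, and the class name, defaults, and `loader` callback are hypothetical.

```python
import time
from collections import OrderedDict
from typing import Any, Callable


class TTLCache:
    """Cache-aside helper: return a fresh cached value, else load and store it."""

    def __init__(self, max_entries: int = 1024, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: "OrderedDict[str, tuple]" = OrderedDict()  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        now = self.clock()
        entry = self._data.get(key)
        if entry is not None and entry[0] > now:
            self._data.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return entry[1]
        # Miss or expired entry: fall through to the backing store.
        self.misses += 1
        value = loader(key)
        self._data[key] = (now + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)   # evict least recently used
        return value
```

A short TTL fits the eventual-consistency requirement: a feed may be slightly stale, but only for a bounded time.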

Database

Use a relational database that is optimized to scale horizontally: Amazon Aurora, CockroachDB, etc.

Scaling

I will be using a combination of replication and sharding. There will be a master-slave replication pattern, with a number of slave instances to support the high volume of reads this system will handle.
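
A sketch of how requests might be routed under this scheme: the shard is chosen from the user ID, writes go to that shard's primary (master), and reads rotate across its replicas. The names, the SHA-256 modulo placement, and the round-robin policy are assumptions. Modulo placement is the simplest option but reshuffles most keys when the shard count changes.

```python
import hashlib
import itertools
from typing import List


class Shard:
    def __init__(self, name: str, replica_count: int):
        self.primary = f"{name}-primary"
        self.replicas = [f"{name}-replica{i}" for i in range(replica_count)]
        self._next_replica = itertools.cycle(self.replicas)

    def node_for(self, is_write: bool) -> str:
        # Writes must hit the primary; reads are spread over replicas if any exist.
        if is_write or not self.replicas:
            return self.primary
        return next(self._next_replica)


class ShardRouter:
    def __init__(self, shards: List[Shard]):
        if not shards:
            raise ValueError("at least one shard is required")
        self.shards = shards

    def shard_for(self, user_id: str) -> Shard:
        # Stable hash so the same user always lands on the same shard.
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def route(self, user_id: str, is_write: bool) -> str:
        return self.shard_for(user_id).node_for(is_write)
```

Because replicas lag the primary, a user may briefly not see their own write on a replica read, which is consistent with the eventual-consistency requirement above.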

Request flows

Explain how the request flows from end to end in your high level design.

The client sends a request, which is first handled by the rate limiter and the load balancer.
The request is then directed to a server, and the CDN delivers a cached response if it has one.
On a CDN miss, the request is routed to the appropriate service.
The service interacts directly with the cache and the database to generate the response.
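
The flow above can be sketched as a single function. Everything here is a stand-in: the rate limiter is a callable, the CDN is a dictionary of cached responses, and each service is a function that would internally consult the cache and database. Routing by the first path segment is an assumption.

```python
from typing import Callable, Dict, Tuple


def handle_request(
    path: str,
    allow: Callable[[], bool],
    cdn: Dict[str, str],
    services: Dict[str, Callable[[str], str]],
) -> Tuple[int, str, str]:
    """Return (status, body, served_by) for one request."""
    # 1. Rate limiter: reject early if the client is over its limit.
    if not allow():
        return (429, "rate limit exceeded", "rate_limiter")
    # 2. CDN: serve a cached response if one exists.
    if path in cdn:
        return (200, cdn[path], "cdn")
    # 3. CDN miss: route to the service that owns this path.
    prefix = path.strip("/").split("/")[0]
    service = services.get(prefix)
    if service is None:
        return (404, "not found", "router")
    # 4. The service talks to the cache and database to build the response.
    return (200, service(path), f"{prefix}_service")
```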

Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...

Services

writes done asynchronously?

Cache

write back?
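
If write-back were chosen, writes would be acknowledged once they reach the cache and persisted later in batches. The sketch below illustrates that option only; the `store` dictionary stands in for the database, and the class name and flush threshold are hypothetical.

```python
from typing import Any, Dict, Set


class WriteBackCache:
    def __init__(self, store: Dict[str, Any], flush_threshold: int = 100):
        self.store = store                  # stand-in for the database
        self.flush_threshold = flush_threshold
        self._cache: Dict[str, Any] = {}
        self._dirty: Set[str] = set()       # keys written but not yet persisted

    def put(self, key: str, value: Any) -> None:
        """Write to the cache only; persist once enough writes accumulate."""
        self._cache[key] = value
        self._dirty.add(key)
        if len(self._dirty) >= self.flush_threshold:
            self.flush()

    def get(self, key: str) -> Any:
        """Read from the cache, falling back to the store (KeyError if absent)."""
        if key in self._cache:
            return self._cache[key]
        value = self.store[key]
        self._cache[key] = value
        return value

    def flush(self) -> int:
        """Persist all dirty entries; return how many were written."""
        count = len(self._dirty)
        for key in self._dirty:
            self.store[key] = self._cache[key]
        self._dirty.clear()
        return count
```

The trade-off is durability: anything still dirty is lost if the cache node fails before a flush, so this suits data like favorite counts better than the tweets themselves.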

Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...

Database type
Push vs pull for populating user feed
Microservices vs monolith
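
To make the push vs pull trade-off concrete, here is an in-memory sketch of both. Push (fan-out on write) copies each tweet into every follower's precomputed feed at post time; pull (fan-out on read) gathers tweets from followed accounts at read time. The class and method names are illustrative, and a sequence number stands in for the created-at timestamp.

```python
from collections import defaultdict
from typing import List, Tuple

Item = Tuple[int, str, str]  # (sequence number, author, text)


class FeedStore:
    def __init__(self):
        self.followers = defaultdict(set)          # author -> users following them
        self.following = defaultdict(set)          # user -> authors they follow
        self.tweets_by_author = defaultdict(list)  # author -> their tweets
        self.inbox = defaultdict(list)             # user -> precomputed feed (push)
        self._seq = 0

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def post(self, author: str, text: str) -> None:
        self._seq += 1
        item = (self._seq, author, text)
        self.tweets_by_author[author].append(item)
        # Push: write cost grows with the author's follower count.
        for follower in self.followers[author]:
            self.inbox[follower].append(item)

    def feed_push(self, user: str, limit: int = 20) -> List[Item]:
        # Cheap read: the feed was already assembled at write time.
        return sorted(self.inbox[user], reverse=True)[:limit]

    def feed_pull(self, user: str, limit: int = 20) -> List[Item]:
        # Read cost grows with the number of accounts the user follows.
        items = [i for a in self.following[user] for i in self.tweets_by_author[a]]
        return sorted(items, reverse=True)[:limit]
```

With reads outnumbering writes about five to one in the estimate above, push favors the common case, but the loop in `post` is exactly where accounts with very many followers become expensive.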

Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.

Celebrity accounts with very many followers (a single tweet has to reach a very large number of home feeds)

Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?

How to address issues outlined in the previous section:

Additional features we could add in the next iteration of the system:

Improving the algorithm that decides what content is surfaced on the home feed, for example based on the user's previous engagement with other content
Allowing users to make their profiles private or public to control who can see their tweets
Content moderation: flagging and/or removing inappropriate content
Notifications: notify users whenever an account they follow tweets, favorites a tweet, or has any other significant interaction