Design Twitter - System Design

System requirements

Functional:

User should be able to

Compose a tweet and post it
Follow another user
View tweets of users you follow in your home feed
View feed of suggested content from accounts that is recommended based on popularity (not necessarily accounts you follow)
Favorite other users' tweets

Non-Functional:

Scalability: this system should be highly scalable. We want to be able to support many simultaneous users, potentially in different parts of the world
Response time: response time should be as low as possible for reads, we can tolerate longer response time for writes
Consistency: it is ok for there to be eventual consistency in this system. Users do not need to have newest updates instantaneously
Security: we need to ensure that users are authenticated in order to post, make changes to their account, and access the content from the users they follow

Capacity estimation

Estimate the scale of the system you are going to design...

User Base:

Let's assume daily active users (DAU) is 500 million.

Traffic:

We can calculate the traffic based on the number.

Tweeting: each user tweets about 2 times per day, so 1 billion tweets per day total
Home feed: each user loads their home feed about 10 times per day, 5 billion home feed loads per day total
Favorite: each user favorites about 1 tweet per day, so 500 million favorites per day total
Following: each user follows about 200 accounts, so 100 billion follow relationships total

Queries Per Second:

write: 500m*2/3600/24= 15k/qps
read: 500m*10/3600/24= 75k/qps
Favorites: 500m*1/3600/24 = 7.5k/qps

Data size :

Tweet: 1b tweets, each with 140 chars. considering encoding , let's assume 300 bytes. So total data is 280GB per day. It would be 100TB per year.

API design

Tweeting

POST: user ID, content of tweet

Home Feed

GET: user ID, page #

Following/Unfollowing

POST: user ID, followed user ID

Favoriting/Unfavoriting

POST: user ID, tweet ID

Database design

RDMS database with the following tables

Users table

user ID (UUID)
name (string)
email (string)
created at (timestamp)
updated at
etc.

Tweets table

tweet ID (UUID)
user ID (UUID)
tweet content (string)
created at (timestamp)

User Favorites table - many to many

user ID (UUID)
tweet ID (UUID)
created at (timestamp)
deleted at (timestamp)

User Followers table - many to many

user ID (UUID)
followed user ID (UUID)
created at (timestamp)
deleted at (timestamp)

High-level design

You should identify enough components that are needed to solve the actual problem from end to end.

Rate limiter

Protect again DOS attacks and ensure fair usage

Load balancer

use constant hashing to distribute load across servers

CDN

Serve cached data to localized areas to minimize response time

Services

Services to support major functions: user service (user management and following), tweet service (tweeting and favoriting), feed service (loading user feed)

Cache

application level caching

Database

use a relational database that is optimized to scale horizontally: amazon aurora, cockroach DB, etc.

Request flows

Explain how the request flows from end to end in your high level design.

The client will send a request that will initially be handled by the rate limiter and load balancer.
It will then be directed to the server and the CDN will deliver a response if possible
If CDN miss, then request will be routed to the correct service
The service interacts directly with the cache and the database to generate the response needed

Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...

Cache

Use Redis cache
Will use cache aside update strategy
Use TTL invalidation strategy
Use LRU eviction strategy

Services

User service: handles all requests to create/update user accounts and handle following/unfollowing
- Writes associated with creating and following users will be done asynchronously to reduce response time and load on the server
Tweet service: handles all requests to post and favorite tweets
- Writes associated with posting and favoriting tweets will be done asynchronously to reduce response time and load on the server
Feed service
- Will check the cache for feed data for a particular user
- Otherwise will generate the feed data itself
- For home feed: queries the most recent tweets from followed users, results are paginated
- For popular feed: queries the most popular recent tweets across twitter based on like count, results are paginated

Database

I will be using a combination of replication and sharding. There will be a master-slave replication pattern, with a number of slave instances to support the high volume of reads this system will handle.

Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...

Database

We could have chosen a NoSQL database as they are known to scale well horizontally and handle large amounts of data
However, because it seems like the structure of Twitter's data is relatively stable and it seems like we will currently and increasingly need to represent out data in a relational way, I chose a RDBMS
Additionally there are strategies that can help us scale even if we use a RDBMS

Populating the Feed

There are 2 approaches to populating user's feeds: either pushing data to followers when a user posts or pulling all followed users data to display to a user when requested
The pull strategy requires us to do some heavier processing in our service to collect tweets from all the accounts a user follows, but it allows us to avoid the scenario of pushing messages to a large number of followers every time a user tweets.
The push model would also require us to save users' feeds (as the pushed messages need to go somewhere) and this would need additional storage
I have chosen to proceed with the pull strategy to avoid the requirement for additional storage and to avoid building a message queue system

Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.

There may be some users with a very high number of followers that generate very high traffic when they post, our system will need to be able to handle these bursts
If we do not manage resources effectively, we could get backed up with asynchronous job handling if we get many write requests at the same time

Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?

How to address issues outlined in the previous section:

We can adjust our caching strategy to favor "hot" data so that we avoid making the full trip for these requests

Additional features we could add in the next iteration of the system:

Improving the algorithm for what kind of content is surfaced on the home feed: based on previous engagement by the user with other content, etc.
Allowing users to make their profiles private or public to control who can see their tweets
Content moderation: flagging and/or removing inappropriate content
Send/receive notifications whenever a followed user tweets, likes, or does any significant interaction