Design Twitter - System Design

System requirements

Functional:

Core functionality

User should be able to log in using handle/email and password
User should be able to log in using OAuth
User should be able to follow other users
User should be able to see the tweets of users they follow
User should be able to favorite the tweets of other users
User should be able to retweet the tweets of other users
User should not be able to update a tweet
User should be able to create a tweet (up to 140 characters of text, with optional media) in a secure way, so that only the account owner can create tweets for their account.
User should be able to delete a tweet
User should get notifications when new tweets are created.
User should get notifications when they are followed
User should be able to comment on a tweet

Non-Core functionality

User should be able to block users
User should be able to update their profile information
- Handle, name, profile image

Non-Functional:

Scaling and Performance

System should be able to handle peak loads of hundreds of thousands of requests per second
Response time for API requests should be under 200ms
System should avoid bottlenecks when creating and reading the user feed, as this is the main experience
System should use a scalable database design that allows fast fetching of data when needed.
System should use a cache to avoid hitting the database for reads
System should allow for asynchronous population of tweets to avoid bottlenecks on writes
System should consider fault tolerance, replication and disaster recovery.

Data Storage

Data storage should be scalable with replication and sharding.
Due to the nature of the relationships between users and tweets, a relational database like Postgres or MySQL will allow for better querying.
Applications should use separate read and write nodes to avoid bottlenecks.
System should save media to an Object Store
Sharding for Tweets table should be based on the user_id as the key
Sharding for Followers table should be based on the follower_id as the key
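
As a rough illustration of the sharding rules above, the sketch below routes a key to a shard with a stable hash. The shard count and the function names (shard_for, tweets_shard, followers_shard) are assumptions made for this example, not part of the design.

```python
import hashlib

NUM_SHARDS = 8  # assumed shard count, for illustration only


def shard_for(key: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a numeric key to a shard index in [0, num_shards).

    A stable hash is used instead of Python's built-in hash() so that
    every app server computes the same shard for the same key.
    """
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def tweets_shard(user_id: int) -> int:
    """Tweets table: shard by the author's user_id."""
    return shard_for(user_id)


def followers_shard(follower_id: int) -> int:
    """Followers table: shard by follower_id."""
    return shard_for(follower_id)
```

With this keying, all tweets by one author land on one shard, and all rows describing who a given user follows land on one shard. Plain modulo routing forces data movement whenever the shard count changes; consistent hashing is one common way to reduce that.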

Real-Time Features

When new tweets are created, the user's feed needs to pick them up in near real time, using WebSockets or WebRTC to notify the user of new tweets.
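
The push side of this can be sketched without a real WebSocket server. In the example below, an asyncio.Queue stands in for each open connection; NotificationHub and its methods are hypothetical names used only for illustration.

```python
import asyncio
from typing import Dict


class NotificationHub:
    """Tracks connected users and pushes messages to them.

    Each asyncio.Queue is a stand-in for one open WebSocket connection.
    """

    def __init__(self) -> None:
        self._connections: Dict[int, asyncio.Queue] = {}

    def connect(self, user_id: int) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._connections[user_id] = queue
        return queue

    def disconnect(self, user_id: int) -> None:
        self._connections.pop(user_id, None)

    def notify(self, user_id: int, message: str) -> bool:
        """Push a message if the user is connected; report whether it was sent."""
        queue = self._connections.get(user_id)
        if queue is None:
            return False  # user offline; they will see the tweet on next feed load
        queue.put_nowait(message)
        return True


async def demo() -> str:
    hub = NotificationHub()
    connection = hub.connect(2)
    hub.notify(2, "new tweet from user 1")
    return await connection.get()
```

Running asyncio.run(demo()) returns the pushed message. Users who are not connected simply get no push and fall back to the normal feed read.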

Security Considerations

System should be secure and use authentication to verify that tweets are being created by the correct user before they are processed, using auth tokens or JWT.

Cache

System should use a CDN to deliver media quickly in the user's region
System should use a cache like Redis or Memcached to store application data in memory, with an LRU eviction policy to make the most of limited capacity
- Sharding in the same way as the database
  - Tweets by user_id (key)
  - Followers by follower_id (key)
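
To make the LRU behaviour concrete, here is a small in-process sketch built on collections.OrderedDict. It only illustrates the eviction rule; in Redis the comparable behaviour comes from configuring a maxmemory eviction policy rather than from application code. The class name LRUCache is an assumption for this example.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


cache = LRUCache(capacity=2)
cache.put("feed:1", ["t1"])
cache.put("feed:2", ["t2"])
cache.get("feed:1")          # touch feed:1 so feed:2 becomes the oldest
cache.put("feed:3", ["t3"])  # evicts feed:2
```

Feeds of active users keep getting touched and stay cached, while feeds of inactive users age out first.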

Monitoring & Alerting

Alerts on non-200 responses (4xx, 5xx errors)
Alerts on Cache capacity
Dashboard
- Monitor throughput vs errors
- Monitor CPU usage
- Monitor Memory Usage
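
One way to turn the error alert into a rule is to compare the share of 4xx/5xx responses in a time window against a threshold. The sketch below assumes a 5% threshold and the function names error_rate and should_alert purely for illustration.

```python
from typing import Iterable


def error_rate(status_codes: Iterable[int]) -> float:
    """Fraction of responses in the window that are 4xx or 5xx."""
    codes = list(status_codes)
    if not codes:
        return 0.0
    errors = sum(1 for code in codes if 400 <= code <= 599)
    return errors / len(codes)


def should_alert(status_codes: Iterable[int], threshold: float = 0.05) -> bool:
    """Fire an alert when the error rate exceeds the (assumed) threshold."""
    return error_rate(status_codes) > threshold
```

Alerting on a rate rather than on every single error keeps a handful of expected 404s from paging anyone, while a spike of 5xx responses still trips the alert.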

Capacity estimation

Target user base of 10 million users, with 2,000,000 to 3,000,000 active daily users (ADU)
- At least 20 read requests per second on average
- Peak request load could be roughly 3x the average (equivalent to 9m ADU)
15-25% of those users will post on a daily basis
- ~300,000 tweets created on an average day
- ~1m tweets per day at peak
- ~3.5 tweets created per second on average (300,000 / 86,400 seconds), ~12 per second at peak
- An average tweet with 140 characters and metadata could come to about 1 KB
- 0.3 GB per day / 110 GB per year
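
The arithmetic behind these estimates can be checked with a few lines. The inputs are the daily tweet counts and the per-tweet size from the list above; dividing the daily totals by 86,400 seconds gives the per-second write rate.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

tweets_per_day_avg = 300_000
tweets_per_day_peak = 1_000_000
bytes_per_tweet = 1_000  # roughly 1 KB of text plus metadata

avg_tweets_per_second = tweets_per_day_avg / SECONDS_PER_DAY    # about 3.5
peak_tweets_per_second = tweets_per_day_peak / SECONDS_PER_DAY  # about 11.6

gb_per_day = tweets_per_day_avg * bytes_per_tweet / 1e9         # 0.3 GB
gb_per_year = tweets_per_day_avg * bytes_per_tweet * 365 / 1e9  # about 109.5 GB
```

Tweet text is therefore a small write and storage load at this scale; media in the object store and feed reads are likely to dominate.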

API design

/login
/logout
/oauth-callback -> redirect

Tweet Service (CRUD)

/create
/delete/:id
/follow/:id
/:user_id/favorite/:tweet_id

Feed Service

/feed

Notification Service

/:user_id/notifications
/:user_id/notifications/mark_as_read/:notification_id

User Service

/profile/:user_id/update

Database design

User

user_id PK
email
password (hashed and salted)
created_at
updated_at
oauth_provider
profile_id FK

Profile

profile_id PK
user_id FK
description
handle
image_url
hero_image_url
created_at
updated_at

Tweets

tweet_id PK
user_id FK
content
media_id
aggregated_favorites
aggregated_retweets
aggregated_comments
original_tweet_id FK (self-reference to tweet_id; presumably the original tweet for a retweet)
created_at
updated_at
deleted_at

Comments

comment_id PK
tweet_id FK
user_id FK
parent_comment_id FK (self-reference to comment_id; presumably the parent comment for a reply)
content
aggregated_likes
aggregated_favorites
created_at
updated_at
deleted_at

Media

media_id PK
media_url
media_type
tweet_id FK
created_at
updated_at
deleted_at

Notifications

notification_id PK
user_id FK
follower_id
tweet_id
created_at
updated_at
read_at

Followers

follower_id PK (composite) FK references user(user_id)
following_id PK (composite) FK references user(user_id)
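
A cut-down version of this model can be exercised with an in-memory SQLite database. The sketch below keeps only the columns needed to show the follower relationship and the feed query; the table names (users, tweets, followers), the helper names and the use of SQLite instead of MySQL or Postgres are assumptions for illustration.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE users (
            user_id INTEGER PRIMARY KEY,
            email   TEXT NOT NULL UNIQUE
        );
        CREATE TABLE tweets (
            tweet_id   INTEGER PRIMARY KEY,
            user_id    INTEGER NOT NULL REFERENCES users(user_id),
            content    TEXT NOT NULL CHECK (length(content) <= 140),
            created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
        -- One row per (follower, followed) pair, so the key is composite.
        CREATE TABLE followers (
            follower_id  INTEGER NOT NULL REFERENCES users(user_id),
            following_id INTEGER NOT NULL REFERENCES users(user_id),
            PRIMARY KEY (follower_id, following_id)
        );
        """
    )
    return conn


def feed_for(conn: sqlite3.Connection, user_id: int,
             limit: int = 20) -> List[Tuple[int, int, str]]:
    """Newest tweets from the accounts that user_id follows."""
    return conn.execute(
        """
        SELECT t.tweet_id, t.user_id, t.content
        FROM tweets AS t
        JOIN followers AS f ON f.following_id = t.user_id
        WHERE f.follower_id = ?
        ORDER BY t.created_at DESC, t.tweet_id DESC
        LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()
```

The join in feed_for is the query a cache miss falls back to. It filters on follower_id, which matches the choice of follower_id as the shard key for the Followers table.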

High-level design

Read

Client -> API Gateway
Client -> CDN (Media Store)
API Gateway -> ALB
ALB -> App Servers
App Servers -> (Cache Hit) -> Feed Cache (Redis w/ LRU)
App Servers -> (Cache Miss) -> MySQL (Read Only)
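
The cache hit / cache miss branches of the read path follow the cache-aside pattern, sketched below. A plain dict stands in for the Redis feed cache and a caller-supplied function stands in for the read-only MySQL query; FeedReader and load_from_db are hypothetical names.

```python
from typing import Callable, Dict, List


class FeedReader:
    """Cache-aside read: try the cache first, fall back to the database."""

    def __init__(self, load_from_db: Callable[[int], List[str]]) -> None:
        self._cache: Dict[int, List[str]] = {}  # stand-in for the feed cache
        self._load_from_db = load_from_db       # stand-in for the read replica
        self.db_reads = 0

    def get_feed(self, user_id: int) -> List[str]:
        cached = self._cache.get(user_id)
        if cached is not None:
            return cached                        # cache hit
        self.db_reads += 1                       # cache miss: query the database
        feed = self._load_from_db(user_id)
        self._cache[user_id] = feed              # populate for the next read
        return feed


reader = FeedReader(lambda user_id: [f"tweet for user {user_id}"])
first = reader.get_feed(1)   # miss, loads from the database
second = reader.get_feed(1)  # hit, served from the cache
```

After the two calls above, reader.db_reads is 1: only the first read reached the database.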

Create

Client -> API Gateway
API Gateway -> ALB
ALB -> App Servers
App Servers -> Pub/Sub
Pub/Sub -> Spark Job
Spark Job -> (Generate Feed Cache) -> Feed Cache (Redis w/ LRU)
Spark Job -> (Update Database) -> MySQL (Write)
Spark Job -> Notification Service
Notification Service -> WebSocket
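
The create flow above amounts to fan-out on write: the app server only enqueues the tweet, and a background worker persists it, prepends it to each follower's cached feed, and emits notifications. The sketch below models that with in-memory stand-ins (a deque for Pub/Sub, a list for the database, a dict for the feed cache). FanOutWorker and its fields are hypothetical, and the worker loop is only a simplified substitute for the Spark job.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

Tweet = Tuple[int, str]  # (author_id, content)


class FanOutWorker:
    def __init__(self, followers_of: Dict[int, List[int]],
                 feed_limit: int = 100) -> None:
        self.followers_of = followers_of                   # author_id -> follower ids
        self.queue: Deque[Tweet] = deque()                 # stand-in for Pub/Sub
        self.tweets: List[Tweet] = []                      # stand-in for the write DB
        self.feeds: Dict[int, Deque[Tweet]] = defaultdict(deque)  # feed cache
        self.notifications: List[Tuple[int, int]] = []     # (follower_id, author_id)
        self.feed_limit = feed_limit

    def publish(self, author_id: int, content: str) -> None:
        """Called by the app server; returns as soon as the tweet is queued."""
        if len(content) > 140:
            raise ValueError("tweet exceeds 140 characters")
        self.queue.append((author_id, content))

    def drain(self) -> int:
        """Process queued tweets asynchronously from the request path."""
        processed = 0
        while self.queue:
            author_id, content = self.queue.popleft()
            self.tweets.append((author_id, content))        # update database
            for follower_id in self.followers_of.get(author_id, []):
                feed = self.feeds[follower_id]
                feed.appendleft((author_id, content))       # newest first
                while len(feed) > self.feed_limit:
                    feed.pop()                              # trim oldest entry
                self.notifications.append((follower_id, author_id))
            processed += 1
        return processed
```

Because publish only appends to the queue, the write request stays fast even for authors with many followers; the cost moves to the worker, which is the usual trade-off of fan-out on write.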

Request flows

Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...

Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...

Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...

Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.

Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?