Design Twitter - System Design

System requirements

Functional:

Users should be able to post tweets (140 chars)
In the tweet, the user can attach images and videos in addition to 140 chars of text
Users can share (retweet) other users' tweets
Users can favourite other users' tweets
Users can view a timeline or feed of tweets to track updates of other users

Non-Functional:

Scalable - large number of tweets and a large data store (see capacity estimation below); read-heavy, so it needs horizontal scaling behind a load balancer
Highly available - the service stays up even at the cost of consistency, which is acceptable for a read-heavy feed
Reliable - no single points of failure (the page should always load)
Low latency - page load time < 1s

Capacity estimation

DAU: 30 million users
1/10 are posting 5 tweets a day: 3 million * 5 = 15 million tweets posted a day
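1/5 contain video/images -> 15 million / 5 = 3 million images/videos a day
Text -> 140 chars = 140 bytes (at 1 byte per char) * 15 million = 2,100 MB or 2.1 GB of data every day
Media -> 30 MB per image/video * 3 million = 90,000,000 MB or 90 TB of data every day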
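Traffic: peak periods (events, live tweeting), with 86,400 seconds in one day
Assuming 100 million posts on a peak day (well above the 15 million baseline) -> 100,000,000 / 86,400 ≈ 1,160, so roughly 1,000 writes/second
Assuming 200 million reads on a peak day -> 200,000,000 / 86,400 ≈ 2,300, so roughly 2,000 reads/second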
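The baseline arithmetic above can be checked with a few lines of Python. This is only a sketch: the constant and function names are made up for illustration, sizes use decimal units (1 GB = 10^9 bytes), and 1 byte per character is an assumption.

```python
# Back-of-envelope capacity numbers, using the figures from the notes.
DAU = 30_000_000
POSTERS_ONE_IN = 10        # 1/10 of daily users post
TWEETS_PER_POSTER = 5
MEDIA_ONE_IN = 5           # 1/5 of tweets carry an image/video
BYTES_PER_TWEET = 140      # 140 chars, assuming 1 byte per char
MEDIA_SIZE_MB = 30         # assumed size per image/video
SECONDS_PER_DAY = 86_400


def estimate() -> dict:
    """Return daily volume and average write rate for the baseline."""
    tweets_per_day = DAU // POSTERS_ONE_IN * TWEETS_PER_POSTER
    media_per_day = tweets_per_day // MEDIA_ONE_IN
    return {
        "tweets_per_day": tweets_per_day,
        "media_per_day": media_per_day,
        "text_gb_per_day": tweets_per_day * BYTES_PER_TWEET / 1e9,
        "media_tb_per_day": media_per_day * MEDIA_SIZE_MB / 1e6,
        "avg_writes_per_sec": tweets_per_day / SECONDS_PER_DAY,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.1f}")
```

Media dominates storage by several orders of magnitude (terabytes per day against gigabytes for text), which is what motivates a separate object store.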

Summary:

Need scalable DB to store this much data
Since this is a read-heavy system, will need read replicas + caching to improve performance
Need object store for images/videos -> can leverage CDN for regional locations as a caching mechanism
To process read/writes need multiple application servers (horizontal scaling) and can leverage load balancer to distribute the load
Can additionally have a cache between app servers and DB

API design

postTweet
Arguments
Username: str, username of the user posting the tweet
User location: [lat, long], location of the user
User device: str, device from which the tweet was posted
Text: str, tweet text
Images: optional image
Videos: optional video
Logic
Validate text constraints (140 chars) -> can be done on the frontend for fast feedback, but the backend should still enforce it
Upload images + video to S3
Write data to database (user metadata, tweet info)
Generates a tweet ID
Response
REST API
201 successfully created
500 internal server error (something went wrong on the backend)
4xx client error (user validation issue, e.g. 400 when the text is over 140 chars)
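A minimal sketch of the postTweet flow above, under loud assumptions: plain dicts stand in for S3 and the database, the request type and field names are hypothetical, and tweet IDs are random UUIDs (the notes do not say how IDs are generated).

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

MAX_TWEET_CHARS = 140


@dataclass
class PostTweetRequest:
    username: str
    text: str
    location: Optional[Tuple[float, float]] = None  # (lat, long)
    device: Optional[str] = None
    media: List[bytes] = field(default_factory=list)  # image/video payloads


def post_tweet(req: PostTweetRequest, object_store: dict, tweet_db: dict):
    """Return (status_code, body). Dicts stand in for S3 and the database."""
    # Server-side validation, even if the frontend already checked.
    if not req.username or len(req.text) > MAX_TWEET_CHARS:
        return 400, {"error": "invalid username or text over 140 chars"}
    try:
        tweet_id = uuid.uuid4().hex  # assumed ID scheme
        media_keys = []
        for i, blob in enumerate(req.media):
            key = f"{tweet_id}/{i}"
            object_store[key] = blob  # upload media first
            media_keys.append(key)
        tweet_db[tweet_id] = {
            "author": req.username,
            "text": req.text,
            "media_keys": media_keys,  # pointers into the object store
            "location": req.location,
            "device": req.device,
        }
    except Exception:
        return 500, {"error": "internal server error"}
    return 201, {"tweet_id": tweet_id}
```

Uploading media before writing the tweet row means a failed upload never leaves a tweet pointing at a missing object; the cost is that a failed database write can leave orphaned media to clean up.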

shareTweet
Arguments
Tweet ID: str, ID of the tweet to share
Username: str, user who is retweeting
Logic
New tweet created in database with reference to original tweet (need column for retweet or not)
Increment original tweet's retweet count
Queue the new retweet for timeline generation so it reaches the feeds of the retweeting user's followers
Response
200 successfully retweeted
Response body contains the info needed to send a notification to the original poster
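The shareTweet logic can be sketched as follows. A dict stands in for the tweet table, the field names are hypothetical, and the 404 branch for an unknown tweet ID is an assumption that the notes do not list.

```python
import uuid


def share_tweet(tweet_db: dict, tweet_id: str, username: str):
    """Create a retweet row that references the original tweet."""
    original = tweet_db.get(tweet_id)
    if original is None:
        return 404, {"error": "tweet not found"}  # assumed, not in the notes
    new_id = uuid.uuid4().hex
    tweet_db[new_id] = {
        "author": username,
        "text": original["text"],
        "is_retweet": True,
        "retweet_of": tweet_id,  # reference to the original tweet
        "retweet_count": 0,
    }
    # In a real store this increment would need to be atomic.
    original["retweet_count"] = original.get("retweet_count", 0) + 1
    # Enough info for the caller to notify the original poster.
    return 200, {"tweet_id": new_id, "notify_user": original["author"]}
```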

favouriteTweet
Arguments
Tweet ID: str, ID of the tweet to favourite
Username: str, user who is favouriting the tweet
Logic
Increment the original tweet's favourite count
Add the username to the tweet's list of users who have favourited it
Add the tweet to the user's list of favourited tweets
Response
200 successfully favourited
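A sketch of the favouriteTweet logic, again with dicts standing in for the tables and hypothetical field names. Treating a repeated favourite as a no-op, so that retries do not inflate the count, is an assumption and is not stated in the notes.

```python
def favourite_tweet(tweets: dict, user_favourites: dict,
                    tweet_id: str, username: str) -> int:
    """Return an HTTP-style status code; repeated favourites are no-ops."""
    tweet = tweets.get(tweet_id)
    if tweet is None:
        return 404  # assumed, not in the notes
    favourited_by = tweet.setdefault("favourited_by", set())
    if username not in favourited_by:
        favourited_by.add(username)
        tweet["favourite_count"] = tweet.get("favourite_count", 0) + 1
        # Mirror the relation on the user side.
        user_favourites.setdefault(username, set()).add(tweet_id)
    return 200
```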

viewFeed
Arguments
Username: str, user whose feed we need to get
Logic
Can have a cache that gets updated with the latest 50 tweets for a given user
We simply query the cache and return the tweets
Every time a new tweet is posted that's relevant to this user, it gets queued for feed generation
Response
200 successful
500 internal failure
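The cached feed described above is a fan-out-on-write design. A minimal in-memory sketch follows, assuming a hypothetical `FeedCache` class and synchronous fan-out; the notes describe a queue, which would make this step asynchronous.

```python
from collections import defaultdict, deque

FEED_SIZE = 50  # latest 50 tweets per user, as in the notes


class FeedCache:
    """In-memory stand-in for the per-user feed cache."""

    def __init__(self):
        self.followers = defaultdict(set)  # author -> follower usernames
        self.feeds = defaultdict(lambda: deque(maxlen=FEED_SIZE))

    def follow(self, follower: str, author: str) -> None:
        self.followers[author].add(follower)

    def on_new_tweet(self, author: str, tweet_id: str) -> None:
        # Push the tweet to every follower's feed, newest first.
        # The bounded deque silently drops the oldest entry when full.
        for follower in self.followers[author]:
            self.feeds[follower].appendleft(tweet_id)

    def view_feed(self, username: str) -> list:
        # Read path: a single cache lookup, no fan-in at read time.
        return list(self.feeds.get(username, ()))
```

The trade-off is write amplification: one tweet from an author with many followers turns into many cache writes, in exchange for a cheap read.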

Database design

Object store: S3

Stores images, video

Database: NoSQL (DynamoDB), because we need something that scales horizontally since we'll be storing millions of tweets each day

We give up strong (ACID-style) consistency, but that should be okay: a small delay before a new tweet renders, or slightly stale tweets instead of the latest on a reload, is acceptable

Users (table)
Email
Username
User since date
Tagline
Tweets
Pointers to tweets in tweet DB (foreign key)
Favourited tweets
Pointers to tweets in tweet DB
Following
Users they follow
Followers
Users that follow them
Number of followers
Tweets (table)
Tweet ID (primary key)
Text
Image/video pointers to object store
Date created (sort key)
Location of tweet
Posted from device
Retweet count
Favourited count
Whether it's a retweet or the original tweet
User who created tweet (foreign key)
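The two record shapes listed above can be written down as Python dataclasses to make the nesting explicit. Field names are illustrative. One deliberate deviation: the follower count is derived from the followers set here, whereas the notes list it as a stored attribute.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Set, Tuple


@dataclass
class Tweet:
    tweet_id: str                     # primary key
    author: str                       # username of the creator
    text: str
    created_at: datetime              # sort key
    media_keys: List[str] = field(default_factory=list)  # object-store keys
    location: Optional[Tuple[float, float]] = None
    device: Optional[str] = None
    retweet_count: int = 0
    favourite_count: int = 0
    retweet_of: Optional[str] = None  # None means an original tweet


@dataclass
class User:
    username: str
    email: str
    user_since: datetime
    tagline: str = ""
    tweet_ids: List[str] = field(default_factory=list)
    favourited_tweet_ids: Set[str] = field(default_factory=set)
    following: Set[str] = field(default_factory=set)
    followers: Set[str] = field(default_factory=set)

    @property
    def follower_count(self) -> int:
        # Derived here; a real table would likely store a counter instead.
        return len(self.followers)
```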

Common queries:

Timeline: get the 50 most recent tweets from people the user follows
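When the feed cache is cold, the timeline query can be answered at read time by merging each followed user's tweets. The sketch below assumes every per-user list is already sorted newest first, as a date sort key would provide; the function name and tuple layout are hypothetical.

```python
import heapq
from itertools import islice


def timeline(following, tweets_by_user, limit=50):
    """Merge per-author tweet lists (each newest first) into one feed.

    tweets_by_user maps username -> list of (created_at, tweet_id),
    sorted by created_at descending.
    """
    streams = [tweets_by_user.get(user, []) for user in following]
    # heapq.merge is lazy, so only about `limit` items are ever pulled.
    merged = heapq.merge(*streams, key=lambda row: row[0], reverse=True)
    return [tweet_id for _, tweet_id in islice(merged, limit)]
```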

Indexing:

User -> get all tweets by a given user in date range X
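The index above amounts to a username partition with the creation date as the sort key. A small in-memory analogue using `bisect` illustrates the range query; this is an illustration of the access pattern, not DynamoDB code, and it assumes integer epoch-second timestamps and non-empty tweet IDs.

```python
import bisect
from collections import defaultdict


class UserTweetIndex:
    """username -> list of (created_at, tweet_id), kept sorted by time."""

    def __init__(self):
        self._by_user = defaultdict(list)

    def add(self, username: str, created_at: int, tweet_id: str) -> None:
        bisect.insort(self._by_user[username], (created_at, tweet_id))

    def query(self, username: str, start: int, end: int) -> list:
        """Tweet IDs with start <= created_at <= end, oldest first."""
        rows = self._by_user.get(username, [])
        # ("" sorts before any real tweet ID, so these are lower bounds.)
        lo = bisect.bisect_left(rows, (start, ""))
        hi = bisect.bisect_left(rows, (end + 1, ""))
        return [tweet_id for _, tweet_id in rows[lo:hi]]
```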

Potential areas/issues:

Tweet gets deleted -> dangling reference in favourited tweets
User gets deleted -> do we keep all their tweets?
Follow/unfollow -> both sides need updating (one user's following list, the other user's followers list and follower count)

High-level design

You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design. If you are unfamiliar with the tool, you can simply describe your design to the chat bot and ask it to generate a starter diagram for you to modify...

Request flows

Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...

Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...

Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...

Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.

Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?