Design Twitter - System Design

System requirements

We assume an existing login system already handles authentication and authorization for us.

Functional:

users should be able to make a post, optionally with an attached photo, GIF, or video
users should be able to re-post other people's tweets
users should be able to respond to other people's tweets
users should be able to follow and unfollow users

Non-Functional:

a user's news feed should combine recency and engagement: recent tweets, prioritized by engagement
prioritize loading the news feed quickly over ordering tweets perfectly; load 15 tweets at a time
posts users make should show up the next time their followers refresh their news feed
prioritize speed and scalability over strict accuracy
rate limit users trying to post more than 10 posts per minute
system needs to be scalable as we expect to continue growing in number of daily users and posts
system needs to be available and fast

Capacity estimation

expect 100k daily active users, with an average of 3 posts per user -> 300k posts per day
40% of tweets will have photos -> 120k tweets with photos
20% of tweets will have GIFs -> 60k tweets with GIFs
10% of tweets will have videos -> 30k tweets with videos
allocate the full 280 bytes of text per tweet -> 300k * 280 bytes -> 84 MB of text per day
photo tweets -> 5 MB per photo * 120k photos -> 600k MB -> 600 GB per day
GIF tweets -> 15 MB per GIF * 60k GIFs -> 900k MB -> 900 GB per day
video tweets -> 512 MB per video * 30k videos -> 15.36M MB -> ~15.4 TB per day
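
The arithmetic above can be checked with a short script. It is a sketch that simply re-runs the document's own assumptions (100k DAU, 3 posts each, the media shares and per-item sizes) in decimal units and also sums them into a daily total.

```python
# Back-of-the-envelope daily storage estimate using the figures above.
# Decimal units are used, as in the text: 1 GB = 1,000 MB, 1 TB = 1,000,000 MB.
DAU = 100_000
POSTS_PER_USER = 3
TEXT_BYTES = 280  # bytes reserved per tweet for text

MEDIA = {
    # media type: (share of tweets, size per item in MB)
    "photo": (0.40, 5),
    "gif": (0.20, 15),
    "video": (0.10, 512),
}


def daily_storage_mb() -> dict[str, float]:
    """Return the estimated storage needed per day, in MB, per content type."""
    posts = DAU * POSTS_PER_USER
    usage = {"text": posts * TEXT_BYTES / 1_000_000}
    for kind, (share, size_mb) in MEDIA.items():
        usage[kind] = round(posts * share) * size_mb
    return usage


if __name__ == "__main__":
    usage = daily_storage_mb()
    for kind, mb in usage.items():
        print(f"{kind:>6}: {mb:,.0f} MB")
    print(f" total: {sum(usage.values()) / 1_000_000:.2f} TB per day")
```

With these assumptions the total comes to roughly 16.86 TB per day, of which video accounts for about 15.36 TB, so video is by far the dominant storage cost.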

to reduce GIF storage, we can rely on a shared bank of GIFs: each GIF is stored once, and a tweet stores a link to the GIF it uses instead of its own copy
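
One way to build the GIF bank is content addressing: use a hash of the file's bytes as its ID, so the same GIF uploaded twice maps to one stored copy and tweets keep only the ID. This is a minimal in-memory sketch; `GifBank` is a hypothetical name, and hashing only catches byte-identical files, not the same animation re-encoded.

```python
import hashlib


class GifBank:
    """Hypothetical content-addressed GIF store: identical bytes are stored once."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}  # gif_id -> raw bytes

    def put(self, data: bytes) -> str:
        # The SHA-256 digest of the content doubles as the GIF's ID, so
        # re-uploading the same GIF returns the existing ID instead of a new copy.
        gif_id = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(gif_id, data)
        return gif_id

    def get(self, gif_id: str) -> bytes:
        return self._blobs[gif_id]

    def __len__(self) -> int:
        return len(self._blobs)


bank = GifBank()
tweets: dict[str, str] = {}  # tweet_id -> gif_id (the "link" a tweet stores)
tweets["t1"] = bank.put(b"fake-gif-bytes: dancing cat")
tweets["t2"] = bank.put(b"fake-gif-bytes: dancing cat")  # same GIF reused
tweets["t3"] = bank.put(b"fake-gif-bytes: thumbs up")
```

Three tweets end up referencing only two stored GIFs, because `t1` and `t2` share one ID.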

we can also cache the most popular media and most popular tweets for faster retrieval
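
Because cache space is limited, cached tweets and media need an eviction policy. A minimal sketch of least-recently-used (LRU) eviction, built on Python's `OrderedDict`, is shown below; the class name and capacity are illustrative assumptions.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._data:
            return None  # cache miss: caller falls back to the database
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
```

For example, with capacity 2, reading `tweet:1` and then inserting a third key evicts `tweet:2`. Redis can apply a similar policy itself through its `maxmemory-policy` setting (for example `allkeys-lru`), although Redis approximates LRU by sampling keys rather than tracking exact order.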

API design

Assume request headers identify which user is making the request.

POST /tweet/ - allows users to post a tweet, optionally with a media attachment

returns 200 upon successful write of tweet to DB

returns error if write is not successful

failed writes are retried automatically with exponential backoff; after 3 retries the request is declared a total failure
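
The retry policy used by these write endpoints can be sketched as follows, assuming the database write is wrapped in a zero-argument callable. The function name, the 0.1 s base delay, and the random jitter are illustrative assumptions, not part of the original design.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    op: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op(); on failure retry up to max_retries times, doubling the base
    wait each time (base_delay, 2*base_delay, 4*base_delay, ...) plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return op()
        except Exception:  # in practice, retry only transient errors (timeouts etc.)
            if attempt == max_retries:
                raise  # total failure: surface the last error to the caller
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))  # jitter spreads out retries
    raise AssertionError("unreachable")
```

Here "3 retries" is read as one initial attempt plus three retries. Jitter keeps many clients that failed at the same moment from retrying in lockstep and hitting the database together again.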

POST /retweet/ - allows users to retweet another user's post

returns 200 upon successfully writing a link to the original tweet into the retweeting user's tweet list in the DB

returns an error if the write is not successful; uses exponential backoff for up to 3 retries before declaring total failure

GET /newsfeed/ - returns a user's newsfeed, fetching tweets made in the last 3 days by the people they follow and running them through a prioritization algorithm that sorts them by engagement

returns 200 and the newsfeed

returns a 5xx error on server failure (404 would wrongly signal that the feed does not exist); failed requests are retried with exponential backoff
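
The newsfeed steps above (filter to followed authors and the last 3 days, rank by engagement, return a page) can be sketched as follows. The scoring formula and its weights are hypothetical placeholders, chosen only so that engagement raises a tweet's score and age lowers it; the page size of 15 comes from the non-functional requirements.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

PAGE_SIZE = 15
WINDOW = timedelta(days=3)


@dataclass
class Tweet:
    tweet_id: str
    author_id: str
    created_at: datetime
    likes: int = 0
    retweets: int = 0
    replies: int = 0


def score(tweet: Tweet, now: datetime) -> float:
    # Hypothetical ranking: weighted engagement, decayed by age in hours.
    engagement = tweet.likes + 2 * tweet.retweets + 3 * tweet.replies
    age_hours = (now - tweet.created_at).total_seconds() / 3600
    return (engagement + 1) / (age_hours + 2) ** 1.5


def build_feed(
    follows: set[str], tweets: list[Tweet], now: datetime
) -> tuple[list[Tweet], list[Tweet]]:
    """Return (first page, remaining ranked tweets to cache for scrolling)."""
    recent = [
        t for t in tweets
        if t.author_id in follows and now - t.created_at <= WINDOW
    ]
    ranked = sorted(recent, key=lambda t: score(t, now), reverse=True)
    return ranked[:PAGE_SIZE], ranked[PAGE_SIZE:]
```

The second return value is what the Tweet service could place in the cache so that later pages are served without re-ranking.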

POST /response/ - allows users to respond to a tweet

DELETE /deleteTweet/ - allows users to delete their own tweets

returns 200 if the user is authorized and the delete succeeds

returns an error otherwise

Database design

User DB

userID - primary key

account creation date - datetime

tweets - list of tweetIDs associated with an account

follows - list of userIDs user follows

followers - list of userIDs following user

Tweet DB

tweetID - primary key

text of tweet

mediaID - foreign key of media contained in tweet

author - foreign key of userID

creation timestamp

Media DB

mediaID - primary key

media

tweetIDs - list of tweets associated with media
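
The tables above store follows, followers, and tweet lists as list-valued fields. If they live in a SQL database, as the trade-offs below suggest, one alternative is to turn those lists into foreign keys and a join table, so a single `follows` table answers both "who does X follow" and "who follows X" without keeping two lists in sync. The sketch below uses Python's built-in `sqlite3` purely for illustration; table and column names such as `url` and `body` are hypothetical.

```python
import sqlite3

# Hypothetical relational version of the User, Tweet, and Media tables.
SCHEMA = """
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY,
    created_at TEXT NOT NULL               -- account creation date
);
CREATE TABLE media (
    media_id   INTEGER PRIMARY KEY,
    url        TEXT NOT NULL               -- hypothetical: link to the stored file
);
CREATE TABLE tweets (
    tweet_id   INTEGER PRIMARY KEY,
    author_id  INTEGER NOT NULL REFERENCES users(user_id),
    body       TEXT NOT NULL CHECK (length(body) <= 280),
    media_id   INTEGER REFERENCES media(media_id),  -- NULL when there is no media
    created_at TEXT NOT NULL
);
CREATE INDEX tweets_by_author ON tweets (author_id);
-- One row per (follower, followee) pair answers both "follows" and "followers".
CREATE TABLE follows (
    follower_id INTEGER NOT NULL REFERENCES users(user_id),
    followee_id INTEGER NOT NULL REFERENCES users(user_id),
    PRIMARY KEY (follower_id, followee_id)
);
CREATE INDEX follows_by_followee ON follows (followee_id);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "2024-01-01"), (2, "2024-01-02")])
conn.execute("INSERT INTO follows VALUES (1, 2)")  # user 1 follows user 2
conn.execute("INSERT INTO tweets VALUES (10, 2, 'hello', NULL, '2024-01-03')")

# "Followers of user 2" and "tweets by user 2" become indexed queries.
followers_of_2 = [row[0] for row in conn.execute(
    "SELECT follower_id FROM follows WHERE followee_id = ?", (2,))]
tweets_by_2 = [row[0] for row in conn.execute(
    "SELECT tweet_id FROM tweets WHERE author_id = ?", (2,))]
```

The `CHECK` constraint also enforces the 280-character limit at the database level.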

High-level design

a user accesses the application via a web UI or an app, logs in, gets authenticated, and is issued a token with a TTL so they can keep interacting with the application without logging in again

all user requests will pass through an API gateway, which provides built-in rate limiting and authorization services
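
The requirement to reject users who post more than 10 times per minute can be enforced with a sliding-window log, sketched below. This is a single-process, in-memory illustration with hypothetical names; a gateway running on many nodes would need the counters in a shared store, and a managed API gateway may use a different algorithm, such as a token bucket.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowRateLimiter:
    """Allow at most `limit` requests per user in any `window`-second span."""

    def __init__(self, limit: int = 10, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        hits = self._hits[user_id]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # the gateway would answer 429 Too Many Requests
        hits.append(now)  # only accepted requests count toward the limit
        return True
```

With the defaults, a user's 11th post inside one minute is rejected, other users are unaffected, and the same user can post again once the oldest post falls outside the 60-second window.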

all requests will hit a load balancer that will route the request to the right service and ensure an even level of load across all hosts

the service will pull the needed information from the database and return it, or return an error if the database read was unsuccessful

the service will write to the database if the request is a POST or DELETE

Service break down:

User service will manage:

creation and deletion of users
follower management

Tweet service will manage:

creation and deletion of tweets
associating a tweet with a userID
fetching tweets for newsfeed and storing them in the cache

Media service will manage:

uploading of media
fetching media
association of media with tweets

all databases will have backup, read-only replicas that are kept up to date using a gossip protocol

Request flows

user logs in -> is authenticated through the API gateway and receives an auth token so they don't have to re-authenticate on every request -> a newsfeed request is generated automatically, passes through the load balancer, and returns the top 15 tweets; the rest are placed in a cache for faster access as the user scrolls

user posts a tweet -> tweet is written to tweet database

user retweets a tweet -> a link to the original tweet is added to the user's list of tweet IDs in the DB

user follows another user -> the follower's ID is added to the followed user's followers list, and the followed user's ID is added to the follower's follows list in the DB
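
The follow, unfollow, and retweet flows above can be sketched against an in-memory stand-in for the User DB. `UserStore` and its methods are hypothetical names; the point is that a follow updates both sides of the relationship and that a retweet stores only the original tweetID.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    user_id: str
    tweets: list = field(default_factory=list)   # tweetIDs, including retweet links
    follows: set = field(default_factory=set)    # userIDs this user follows
    followers: set = field(default_factory=set)  # userIDs following this user


class UserStore:
    """Hypothetical in-memory stand-in for the User DB."""

    def __init__(self) -> None:
        self.users: dict[str, User] = {}

    def add(self, user_id: str) -> User:
        return self.users.setdefault(user_id, User(user_id))

    def follow(self, follower_id: str, followee_id: str) -> None:
        if follower_id == followee_id:
            raise ValueError("users cannot follow themselves")
        # Update both lists so "follows" and "followers" stay consistent.
        self.users[follower_id].follows.add(followee_id)
        self.users[followee_id].followers.add(follower_id)

    def unfollow(self, follower_id: str, followee_id: str) -> None:
        self.users[follower_id].follows.discard(followee_id)
        self.users[followee_id].followers.discard(follower_id)

    def retweet(self, user_id: str, original_tweet_id: str) -> None:
        # A retweet stores a link (the original tweetID), not a copy of the tweet.
        self.users[user_id].tweets.append(original_tweet_id)
```

Using sets makes a repeated follow harmless, which matters once the database is partitioned: the two users may live on different shards, so the two writes are not atomic and may need to be retried safely.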

Detailed component design

each service will sit behind a load balancer, so hosts can be added or removed as traffic spikes and falls
databases will be partitioned by primary key, and services will use a hash function to determine which partition to access (the specific hashing scheme will depend on the service and database being accessed), allowing the databases to scale horizontally
we will use a Redis cache for the newsfeeds of active users, as well as for currently popular tweets and their associated media
when the cache runs out of space, we will evict entries using a least recently used (LRU) policy
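
The hashing scheme is left open above. One common option, not specified in the original design, is consistent hashing: each shard is placed at many points on a hash ring, and each key goes to the next shard point clockwise. Unlike `hash(key) % N`, which remaps most keys when N changes, adding a shard moves only the keys that fall into its new ranges. The sketch uses MD5 purely for even key spreading, not for security, and the shard names are hypothetical.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash(key: str) -> int:
    # Stable across processes, unlike Python's built-in hash() for str.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Map keys (e.g. a userID or tweetID) to database shards."""

    def __init__(self, shards: Iterable[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes  # virtual nodes per shard smooth out the key spread
        self._ring: List[Tuple[int, str]] = []  # sorted (hash, shard) points
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{shard}#{i}"), shard))

    def shard_for(self, key: str) -> str:
        # First ring point at or after the key's hash, wrapping around the end.
        idx = bisect.bisect(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

When a shard is added, every key either stays where it was or moves to the new shard; no keys are shuffled between existing shards.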

Trade offs/Tech choices

SQL databases can provide fast indexed reads, but managing migrations can be painful when the schema needs to change
using a gossip protocol means newsfeeds may not be 100% accurate, since gossip provides eventual consistency rather than immediate consistency; for social media, however, it's not a big deal if a user briefly misses a tweet that someone they follow just made
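
The eventual-consistency trade-off can be illustrated with a small push-pull gossip simulation. This is a sketch under assumptions not stated in the original: each replica holds `key -> (version, value)`, the higher version wins on merge, and in every round each replica exchanges its data with one random peer. Right after a write, other replicas can still serve stale reads; after a few rounds they converge.

```python
import random


class Replica:
    """A database copy holding key -> (version, value); the higher version wins."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict = {}

    def write(self, key, value, version: int) -> None:
        self._merge({key: (version, value)})

    def _merge(self, incoming: dict) -> None:
        for key, (version, value) in incoming.items():
            if key not in self.data or version > self.data[key][0]:
                self.data[key] = (version, value)

    def gossip_with(self, peer: "Replica") -> None:
        # Push-pull exchange: both sides keep the newer version of each entry.
        mine, theirs = dict(self.data), dict(peer.data)
        self._merge(theirs)
        peer._merge(mine)


def gossip_round(replicas: list, rng: random.Random) -> None:
    for replica in replicas:
        peer = rng.choice([r for r in replicas if r is not replica])
        replica.gossip_with(peer)


def converged(replicas: list) -> bool:
    return all(r.data == replicas[0].data for r in replicas)
```

Because each exchange can double the number of replicas that have seen an update, convergence typically takes a number of rounds that grows roughly logarithmically with the number of replicas.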

Failure scenarios/bottlenecks

a sudden traffic spike with many writes can overload the database: latency rises as requests wait for locks to be released before they can write, and some requests time out
if the API gateway fails, either because the gateway service we use is down or because it gets disconnected from the services, the whole application will be down

Future improvements

allow users to block other users
build a system to delete accounts that have been inactive for years to free up storage space
use a content distribution network (CDN) to deliver the media components of tweets faster if we run into consistent latency issues
to handle an overwhelming volume of database writes, implement a queue: tweets that need to be written are added to the queue, the tweet POST request returns 200 right away, and each tweet is eventually written to the database as the queue is processed
add a failover API gateway, or a backup authentication and authorization system that can take over if the gateway fails, to mitigate the gateway failure scenario
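
The write queue described above can be sketched with Python's thread-safe `queue.Queue` and a background worker. `TweetWriteQueue` and the dict standing in for the Tweet DB are hypothetical; an in-process queue loses pending tweets if the process crashes, so a real deployment would use a durable message broker.

```python
import queue
import threading


class TweetWriteQueue:
    """Hypothetical write-behind buffer: accept tweets fast, persist them later."""

    def __init__(self, db: dict) -> None:
        self.db = db  # stand-in for the Tweet DB
        self._q: queue.Queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def post_tweet(self, tweet_id: str, text: str) -> int:
        # Enqueue and return immediately, as in the design; 202 Accepted is the
        # HTTP status intended for "accepted but not yet processed".
        self._q.put((tweet_id, text))
        return 200

    def _drain(self) -> None:
        while True:
            tweet_id, text = self._q.get()
            try:
                self.db[tweet_id] = text  # the slow database write
            finally:
                self._q.task_done()  # a failed write could be re-enqueued here

    def flush(self) -> None:
        """Block until every queued tweet has been written."""
        self._q.join()
```

Until the worker drains the queue, a newly posted tweet is not yet visible to readers, which fits the requirement that posts appear on followers' next refresh rather than instantly.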