Design Twitter - System Design

System requirements

Functional:

List functional requirements for the system (Ask the chat bot for hints if stuck.)...

post a tweet
tweet may contain media like photos and videos
view others' tweets
modify/delete a tweet
receive others' updates by newsfeed
favorite/like a specific tweet
follow/unfollow a user

Non-Functional:

List non-functional requirements for the system...

Let's assume we have 1 million daily active users. On average, each user posts 1 tweet and views 100 tweets per day.

So read TPS ≈ 1M * 100 / 3600 / 24 ≈ 1,200 and write TPS ≈ 1M * 1 / 3600 / 24 ≈ 12.
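The throughput arithmetic can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the constant names are made up for illustration, and the inputs are the assumptions stated above.

```python
# Back-of-the-envelope throughput, using the assumptions stated in the text.
DAILY_ACTIVE_USERS = 1_000_000
TWEETS_POSTED_PER_USER = 1      # per day, on average
TWEETS_VIEWED_PER_USER = 100    # per day, on average
SECONDS_PER_DAY = 24 * 3600     # 86,400

read_tps = DAILY_ACTIVE_USERS * TWEETS_VIEWED_PER_USER / SECONDS_PER_DAY
write_tps = DAILY_ACTIVE_USERS * TWEETS_POSTED_PER_USER / SECONDS_PER_DAY

print(f"read TPS  ~ {read_tps:.0f}")   # 1157, rounded up to 1,200 in the text
print(f"write TPS ~ {write_tps:.1f}")  # 11.6, rounded up to 12 in the text
```

These are daily averages; peak traffic would be higher, so the rounded-up figures are a floor rather than a provisioning target.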

This will be a read-heavy system.

Capacity estimation

Estimate the scale of the system you are going to design...

For storage capacity, let's assume each tweet's text is at most 140 characters and 100 bytes on average (including multi-language support). Metadata should also be counted, let's say 32 bytes per tweet. Text and metadata therefore need 1M * (100 + 32) bytes = 132 MB per day. In addition, about 1 in 10 tweets carries media, and media is 10 MB on average, so media needs 100k * 10 MB = 1 TB of disk storage per day. Media takes up the vast majority of the capacity, so the system will consume roughly 1 TB of storage every day.
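The same storage estimate, written out as a small Python sketch. The constant names are illustrative, decimal units are assumed (1 MB = 10^6 bytes), and the yearly figure is simple extrapolation that ignores replication and growth.

```python
# Daily storage estimate, using the assumptions stated in the text.
TWEETS_PER_DAY = 1_000_000
AVG_TEXT_BYTES = 100
METADATA_BYTES = 32
TWEETS_PER_MEDIA_TWEET = 10          # one tweet in ten carries media
AVG_MEDIA_BYTES = 10 * 10**6         # 10 MB, decimal units assumed

text_bytes_per_day = TWEETS_PER_DAY * (AVG_TEXT_BYTES + METADATA_BYTES)
media_bytes_per_day = (TWEETS_PER_DAY // TWEETS_PER_MEDIA_TWEET) * AVG_MEDIA_BYTES
total_bytes_per_day = text_bytes_per_day + media_bytes_per_day

print(f"text + metadata: {text_bytes_per_day / 10**6:.0f} MB/day")   # 132 MB/day
print(f"media:           {media_bytes_per_day / 10**12:.0f} TB/day")  # 1 TB/day
print(f"per year:        {365 * total_bytes_per_day / 10**12:.0f} TB (no replication)")
```

Text and metadata account for about 0.01% of the daily total, which is why the estimate rounds to the media figure.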

API design

Define what APIs are expected from the system...

POST /api/tweet

parameters:

api_token: this is used for authorization
tweet_text
media_urls: media are pre-uploaded and referenced by this api via urls
hashtags

returns:

status code: 200 on success. Any other code should be accompanied by an error message

DELETE /api/tweet

parameters:

api_token: this is used for authorization
tweet_id: primary key to locate the target tweet

returns: success or error code

GET /api/tweet

parameters:

api_token
tweet_id

returns: a JSON object containing the tweet, or an error code

PATCH /api/tweet

parameters:

api_token
tweet_id
tweet_text
media_urls
hashtags

returns: success or error code

POST /api/media/

parameters:

api_token
media: binary file

returns either:

success + url
error code + error message

GET /api/feed

parameters:

api_token
max_id: for pagination
min_id: for refreshing
page_size

returns: a JSON object containing a list of tweets as the newsfeed.
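One possible reading of the max_id / min_id parameters is sketched below in Python. The source does not spell out their semantics, so the following are assumptions: tweet ids increase over time, both bounds are exclusive, and the feed is held as a list of ids sorted newest first. The function name get_feed_page is hypothetical.

```python
from typing import List, Optional


def get_feed_page(feed_ids: List[int],
                  max_id: Optional[int] = None,
                  min_id: Optional[int] = None,
                  page_size: int = 20) -> List[int]:
    """Return up to page_size tweet ids from a feed sorted newest first.

    max_id (assumed exclusive): only tweets older than this id, for paging down.
    min_id (assumed exclusive): only tweets newer than this id, for refreshing.
    """
    selected = [
        tweet_id for tweet_id in feed_ids
        if (max_id is None or tweet_id < max_id)
        and (min_id is None or tweet_id > min_id)
    ]
    return selected[:page_size]


feed = list(range(110, 100, -1))                      # ids 110 down to 101
first_page = get_feed_page(feed, page_size=3)         # [110, 109, 108]
next_page = get_feed_page(feed, max_id=first_page[-1], page_size=3)  # [107, 106, 105]

refreshed_feed = [112, 111] + feed                    # two new tweets arrived
new_tweets = get_feed_page(refreshed_feed, min_id=first_page[0], page_size=3)  # [112, 111]
```

Paging by id rather than by offset keeps pages stable when new tweets arrive between requests.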

POST /api/tweet/like

parameters:

api_token
tweet_id

returns: success or error code

POST /api/user/follow

parameters:

api_token
user_id

returns: success or error code

DELETE /api/user/unfollow

parameters:

api_token
user_id

returns: success or error code

Database design

Defining the system data model early on will clarify how data will flow among different components of the system. Also you could draw an ER diagram using the diagramming tool to enhance your design...

There are three main parts. The first is user-specific data. This can be stored in a relational database, since it may involve joins and subqueries, and relational databases handle indexes and joins better. User data is also small compared to tweet data. Tables include User, Account, Relationship.

Tweet data and media metadata, on the other hand, should be stored in a non-relational database such as Cassandra or DynamoDB, as the data size is extremely large and these NoSQL stores are distributed by design, which makes the storage easier to scale. Tables include Tweet, Photo, Video, UserLike, NewsFeed.

Media data can be stored in object storage such as Amazon S3 for robustness, and access can later be sped up via a CDN. A file system is also an option, but I prefer not to use one because we would have to maintain the servers ourselves, whereas S3 is serverless and scales automatically.

High-level design

You should identify enough components that are needed to solve the actual problem from end to end. Also remember to draw a block diagram using the diagramming tool to augment your design. If you are unfamiliar with the tool, you can simply describe your design to the chat bot and ask it to generate a starter diagram for you to modify...

Please see the high level diagram

Request flows

Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...

Please see the sequence diagram

Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...

For the newsfeed service, there are two models: the push model and the pull model.

Push model pros:

speeds up the newsfeed query process so that users can see it immediately

Push model cons:

consumes extra space, especially for a celebrity's tweets, which are copied into every follower's feed
takes time to generate
adds latency before a tweet shows up in feeds, so it is less timely than the pull model

Pull model pros:

does not need extra space
does not spend time generating feeds beforehand
reflects updates in a more timely manner

Pull model cons:

takes time to query and aggregate

The push model suits tweets from non-celebrity users, while the pull model suits tweets from celebrities. We can use a hybrid approach that applies the push model to normal users and the pull model to celebrities. For querying and aggregation, we can also add a cache layer on top of the tweet database and store celebrity tweets there to speed up queries. We could use LRU as the eviction strategy and a read-through cache to handle cache misses.
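The hybrid approach and the cache layer can be sketched together in Python. This is a minimal in-memory illustration, not a definitive implementation: every class, method, and constant name is hypothetical, dictionaries stand in for the tweet database and the NewsFeed table, the celebrity threshold is tiny so the demo stays short, and tweet ids are assumed to increase over time.

```python
from collections import OrderedDict, defaultdict

CELEBRITY_THRESHOLD = 2  # followers; deliberately tiny for the demo (assumption)


class ReadThroughLRU:
    """LRU cache that loads missing keys from the backing store itself."""

    def __init__(self, capacity, loader):
        self.capacity, self.loader = capacity, loader
        self.entries = OrderedDict()

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)       # mark as most recently used
            return self.entries[key]
        value = self.loader(key)                # cache miss: read through
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict least recently used
        return value

    def invalidate(self, key):
        self.entries.pop(key, None)


class FeedService:
    def __init__(self):
        self.followers = defaultdict(set)       # author -> followers
        self.following = defaultdict(set)       # user -> authors
        self.tweet_db = defaultdict(list)       # author -> tweet ids
        self.newsfeed = defaultdict(list)       # user -> pushed tweet ids
        self.celebrity_cache = ReadThroughLRU(
            capacity=1000, loader=lambda author: list(self.tweet_db[author]))

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def post(self, author, tweet_id):
        self.tweet_db[author].append(tweet_id)
        if self.is_celebrity(author):
            self.celebrity_cache.invalidate(author)  # pulled at read time
        else:
            for user in self.followers[author]:      # push: fan out on write
                self.newsfeed[user].append(tweet_id)

    def get_feed(self, user):
        pulled = [t for a in self.following[user] if self.is_celebrity(a)
                  for t in self.celebrity_cache.get(a)]
        # Merge pushed and pulled tweets, newest (largest id) first.
        return sorted(set(self.newsfeed[user]) | set(pulled), reverse=True)


svc = FeedService()
svc.follow("alice", "bob")      # bob has 1 follower: normal user
svc.follow("alice", "star")
svc.follow("carol", "star")     # star has 2 followers: celebrity
svc.post("bob", 1)              # pushed into alice's feed
svc.post("star", 2)             # not pushed; pulled when a follower reads
print(svc.get_feed("alice"))    # [2, 1]
print(svc.get_feed("carol"))    # [2]
```

Invalidating the cached entry when a celebrity posts is one simple way to keep the pull path fresh; updating the entry in place would be an alternative.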

Also, if we choose the push model, it takes time to pre-generate newsfeed entries and store them in the NewsFeed table. We can use a message queue to decouple producers (which assign the generation tasks) from consumers (which generate the feeds) in the newsfeed service. The same applies to media uploads while creating a post: we can send a signal to the frontend to indicate that the upload is complete (show a spinner while uploading and stop it once the success signal arrives).
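The producer/consumer decoupling can be illustrated with Python's standard library, using queue.Queue as a stand-in for a real message queue and a dictionary as a stand-in for the NewsFeed table. The function names and the follower list are hypothetical; a production system would use a durable broker rather than an in-process queue.

```python
import queue
import threading

task_queue = queue.Queue()      # stand-in for a real message queue
newsfeed_table = {}             # stand-in for NewsFeed: user_id -> tweet ids
table_lock = threading.Lock()


def enqueue_fanout(tweet_id, follower_ids):
    """Producer: assign one feed-generation task per follower."""
    for follower_id in follower_ids:
        task_queue.put((follower_id, tweet_id))


def feed_worker():
    """Consumer: take tasks off the queue and write feed entries."""
    while True:
        task = task_queue.get()
        if task is None:                    # sentinel: shut down this worker
            task_queue.task_done()
            break
        follower_id, tweet_id = task
        with table_lock:
            newsfeed_table.setdefault(follower_id, []).append(tweet_id)
        task_queue.task_done()


workers = [threading.Thread(target=feed_worker) for _ in range(2)]
for worker in workers:
    worker.start()

enqueue_fanout(42, ["alice", "bob", "carol"])   # returns without waiting for workers
task_queue.join()                               # wait until all tasks are processed

for _ in workers:
    task_queue.put(None)
for worker in workers:
    worker.join()

print(sorted(newsfeed_table.items()))
# [('alice', [42]), ('bob', [42]), ('carol', [42])]
```

The producer returns as soon as the tasks are enqueued, so posting a tweet does not have to wait for feed generation to finish.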

Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...

Please see the discussion of the push and pull models above.

Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.

Databases may fail or become bottlenecks.

Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?

Data partitioning:

based on hash(tweet_id): fragile under scaling, since adding or removing hosts changes where most keys map

consistent hashing: more robust to scaling, and replication fits in naturally.
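The difference between the two partitioning schemes can be measured with a small Python sketch. It assumes the plain scheme is hash(tweet_id) modulo the number of hosts, uses MD5 only as a convenient stable hash, and places 100 virtual nodes per host on the ring; the host names, key names, and class name are all made up for illustration.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic hash (Python's built-in hash() is salted per process)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


def modulo_owner(key, hosts):
    return hosts[stable_hash(key) % len(hosts)]


class ConsistentHashRing:
    def __init__(self, hosts, vnodes=100):
        # Each host owns several points on the ring to even out the load.
        self.ring = sorted((stable_hash(f"{host}#{i}"), host)
                           for host in hosts for i in range(vnodes))
        self.points = [point for point, _ in self.ring]

    def owner(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        index = bisect.bisect_right(self.points, stable_hash(key)) % len(self.ring)
        return self.ring[index][1]


def moved_fraction(owner_before, owner_after, keys):
    return sum(owner_before(k) != owner_after(k) for k in keys) / len(keys)


keys = [f"tweet-{i}" for i in range(10_000)]
old_hosts = ["db0", "db1", "db2", "db3"]
new_hosts = old_hosts + ["db4"]                 # scale out by one host

modulo_moved = moved_fraction(lambda k: modulo_owner(k, old_hosts),
                              lambda k: modulo_owner(k, new_hosts), keys)
old_ring, new_ring = ConsistentHashRing(old_hosts), ConsistentHashRing(new_hosts)
ring_moved = moved_fraction(old_ring.owner, new_ring.owner, keys)

print(f"modulo hashing:     {modulo_moved:.0%} of keys moved")
print(f"consistent hashing: {ring_moved:.0%} of keys moved")
```

Going from 4 to 5 hosts, one would expect roughly four fifths of the keys to move under modulo hashing and roughly one fifth under consistent hashing, with every moved key landing on the new host.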