Design Twitter - System Design

System requirements

Functional:

Follow and unfollow users.
Post tweets.
Search tweets.
Tweets can contain text, images, and video.
View tweets in the user's timeline.
Like a tweet.
Comment on a tweet.

Non-Functional:

Availability
Scalability
Latency
Reliability
Consistency

Capacity estimation

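Assume Twitter has 1B daily active users (DAU).

Taking a day as roughly 100k seconds (86,400 rounded up) and counting one read per user per day, read QPS is 1B / 100k = 10k. Peak read QPS is 2 * 10k = 20k.

Assume 1% of users write 10 tweets per day.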

Write QPS is 1B / 100k * 1% * 10 = 1k. Peak write QPS is 2 * 1k = 2k.
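
The QPS arithmetic above can be reproduced with a short script. This is only a sketch of the estimate: the constant and function names are illustrative, the 2x peak factor and the 100k seconds-per-day rounding are taken from the figures above, and the read estimate counts one read per user per day because that is what the 1B / 100k calculation implies.

```python
# Back-of-the-envelope QPS estimate using the figures stated above.
DAILY_ACTIVE_USERS = 1_000_000_000
SECONDS_PER_DAY = 100_000      # 86,400 rounded up to keep the arithmetic simple
PEAK_FACTOR = 2                # peak traffic is assumed to be twice the average
WRITER_PERCENT = 1             # 1% of users post tweets
TWEETS_PER_WRITER = 10         # each posting user writes 10 tweets per day


def average_qps(requests_per_day: int) -> int:
    """Spread a daily request count evenly over the day."""
    return requests_per_day // SECONDS_PER_DAY


# Reads: one read per active user per day (the implicit assumption above).
read_qps = average_qps(DAILY_ACTIVE_USERS)

# Writes: 1% of users, 10 tweets each.
tweets_per_day = DAILY_ACTIVE_USERS * WRITER_PERCENT // 100 * TWEETS_PER_WRITER
write_qps = average_qps(tweets_per_day)

peak_read_qps = PEAK_FACTOR * read_qps
peak_write_qps = PEAK_FACTOR * write_qps

if __name__ == "__main__":
    print(read_qps, peak_read_qps, write_qps, peak_write_qps)
```

Changing any single constant (for example, several reads per user per day) rescales the result linearly, which makes it easy to test how sensitive the design is to each assumption.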

Data storage estimation:

Assume 10% of tweets contain images or videos and the average tweet uses 10 KB of storage. The total daily storage usage is:

1B * 1% * 10 * 10 KB = 1 TB. Each piece of data needs two additional replicas, so the total daily storage usage is 1 TB * 3 = 3 TB.
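
The storage estimate can be checked the same way. The sketch below uses decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes) to match the round numbers above; the names are illustrative only.

```python
# Daily storage estimate from the figures above, in decimal units.
KB = 1_000
TB = 10**12

TWEETS_PER_DAY = 1_000_000_000 // 100 * 10   # 1% of 1B users, 10 tweets each
AVG_TWEET_BYTES = 10 * KB                    # average size, media included
REPLICATION_FACTOR = 3                       # one primary copy plus two replicas


def daily_storage_bytes(replicated: bool) -> int:
    """Bytes written per day, with or without replica copies."""
    raw = TWEETS_PER_DAY * AVG_TWEET_BYTES
    return raw * REPLICATION_FACTOR if replicated else raw


if __name__ == "__main__":
    print(daily_storage_bytes(False) / TB, "TB per day before replication")
    print(daily_storage_bytes(True) / TB, "TB per day with replication")
```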

API design

GET getNewsFeed(

authToken,

userId,

lastSeenKey, (for pagination)

count

) => Tweet[] / Error

GET getComments(

authToken,

userId,

parentId, (either a tweet or a comment)

lastSeenKey (for pagination)

) => Comment[] / Error

POST postTweet(

authToken,

userId,

content: Content

) => Success / Error

POST postComments(

authToken,

userId,

parentId, (either a tweet or a comment)

content: Content

) => Success / Error

POST likeTweet(

authToken,

userId,

tweetId

) => Success / Error

GET searchTweet(

authToken,

userId,

searchText,

lastSeenKey (for pagination)

) => Tweet[] / Error

PUT editTweet

POST followUser

POST unfollowUser
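
The lastSeenKey and count parameters in getNewsFeed suggest cursor-based (keyset) pagination. The sketch below shows one way that could work over an in-memory list. It is an illustration under assumptions, not the real endpoint: authToken and userId are left out, and the cursor is assumed to be a (created_at, tweet_id) pair, which the API above does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Cursor = Tuple[int, int]  # (created_at, tweet_id); an assumed cursor format


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    created_at: int  # Unix seconds
    text: str


def get_news_feed(
    tweets: List[Tweet], last_seen_key: Optional[Cursor], count: int
) -> Tuple[List[Tweet], Optional[Cursor]]:
    """Return up to `count` tweets older than the cursor, newest first.

    The second return value is the cursor for the next page, or None
    when there is nothing left to fetch.
    """
    ordered = sorted(
        tweets, key=lambda t: (t.created_at, t.tweet_id), reverse=True
    )
    if last_seen_key is not None:
        # Keep only tweets strictly older than the last one the client saw.
        ordered = [
            t for t in ordered if (t.created_at, t.tweet_id) < last_seen_key
        ]
    page = ordered[:count]
    has_more = len(ordered) > count
    next_key = (page[-1].created_at, page[-1].tweet_id) if has_more else None
    return page, next_key
```

Compared with offset-based paging, a cursor stays stable when new tweets arrive at the head of the feed, so the client does not see repeated items while scrolling.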

For error handling, the API returns an HTTP status code and a detailed error message for each kind of error. Some example HTTP error status codes are:

400 Bad Request

401 Unauthorized

403 Forbidden

404 Not Found

503 Service Unavailable
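
One way to keep the status codes above consistent across endpoints is to map each error kind to a status in a single place. The sketch below uses Python's standard http.HTTPStatus; the exception class names and the response shape are hypothetical and only illustrate the idea.

```python
from http import HTTPStatus


class ApiError(Exception):
    """Base class; each subclass declares the HTTP status it maps to."""

    status = HTTPStatus.BAD_REQUEST


class UnauthorizedError(ApiError):
    status = HTTPStatus.UNAUTHORIZED


class ForbiddenError(ApiError):
    status = HTTPStatus.FORBIDDEN


class NotFoundError(ApiError):
    status = HTTPStatus.NOT_FOUND


class ServiceUnavailableError(ApiError):
    status = HTTPStatus.SERVICE_UNAVAILABLE


def to_error_response(error: ApiError) -> dict:
    """Build the error body: numeric code, standard phrase, detailed message."""
    return {
        "status": int(error.status),
        "error": error.status.phrase,
        "message": str(error),
    }
```

For example, to_error_response(NotFoundError("tweet 42 does not exist")) produces a body with status 404 and the detailed message.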

For authentication, each API request carries an authentication token. The token could be a session cookie, an OAuth 2.0 token, or a JWT, and the server uses it to validate the user's identity.
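
To make the JWT option concrete, the sketch below issues and verifies an HS256-signed token using only the Python standard library. It is a minimal illustration under assumptions: the function names, the shared-secret setup, and the claims used (sub, exp) are choices made for this example, and a production service would normally rely on a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    padding = "=" * (-len(text) % 4)  # restore the stripped padding
    return base64.urlsafe_b64decode(text + padding)


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(user_id: str, secret: bytes, ttl_seconds: int, now: int) -> str:
    """Create header.payload.signature with an expiry claim."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "exp": now + ttl_seconds}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return signing_input + "." + _b64url_encode(_sign(signing_input, secret))


def verify_token(token: str, secret: bytes, now: int) -> Optional[str]:
    """Return the user id if the token is authentic and unexpired, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        expected = _sign(f"{header_b64}.{payload_b64}", secret)
        # Constant-time comparison avoids leaking signature bytes via timing.
        if not hmac.compare_digest(_b64url_decode(signature_b64), expected):
            return None
        header = json.loads(_b64url_decode(header_b64))
        payload = json.loads(_b64url_decode(payload_b64))
    except ValueError:
        return None  # malformed token: wrong part count, bad base64 or JSON
    if header.get("alg") != "HS256" or payload.get("exp", 0) <= now:
        return None
    return payload.get("sub")
```

The signature is checked before the payload is parsed, so claims from a tampered token are never trusted.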

The objects used in the API above are:

Tweet {

tweetId,

creators,

createdAt,

likeCount,

content: Content,

topComments: Comment[]

}

Content {

contentId,

tweetId,

text,

media: Media[]

}

Media {

mediaType,

mediaResolution,

mediaURL

}
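
The objects above can be written as typed records. The sketch below uses Python dataclasses with snake_case field names. Several details are assumptions: Comment is not defined above, so its fields are borrowed loosely from the comment data and are hypothetical; creators is treated as a list of user ids; and media_resolution assumes the second Media field describes the resolution of the stored file.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Media:
    media_type: str        # "image" or "video"
    media_resolution: str  # e.g. "720p" (assumed meaning of this field)
    media_url: str         # location of the file in blob storage


@dataclass
class Content:
    content_id: str
    tweet_id: str
    text: str
    media: List[Media] = field(default_factory=list)


@dataclass
class Comment:
    # Hypothetical shape; the API section does not define this object.
    comment_id: str
    parent_id: str
    user_id: str
    text: str


@dataclass
class Tweet:
    tweet_id: str
    creators: List[str]    # assumed to hold user ids
    created_at: int        # Unix seconds
    like_count: int
    content: Content
    top_comments: List[Comment] = field(default_factory=list)


def to_json_ready(tweet: Tweet) -> dict:
    """Convert a Tweet and its nested objects into plain dicts and lists."""
    return asdict(tweet)
```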

Database design

High-level design

I will use a SQL database to store user, tweet, and comment information. This data is relational in nature, so it is easier to join and query in a relational store.

I will use a graph database to store the follow relationships between users.

I will use a NoSQL database to store media metadata.

I will use blob storage to store the actual images and videos attached to tweets.

SQL database Schema:

User Table: userId, userInfo...

Tweet Table: tweetId, userId, content, mediaId, likeCount, commentIds, createdAt, updatedAt

Comment Table: commentId, parentId, userId, content, createdAt, updatedAt
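
The tables above could be declared as follows. This sketch uses SQLite through Python's sqlite3 module purely for illustration; the column types, the indexes, and the plural table names are assumptions. The commentIds column is left out here on the assumption that a tweet's comments can be found through parentId, and ids are assumed to be unique across tweets and comments so that parentId can point at either.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE users (
    userId   INTEGER PRIMARY KEY,
    userInfo TEXT
);
CREATE TABLE tweets (
    tweetId   INTEGER PRIMARY KEY,
    userId    INTEGER NOT NULL REFERENCES users(userId),
    content   TEXT NOT NULL,
    mediaId   TEXT,
    likeCount INTEGER NOT NULL DEFAULT 0,
    createdAt INTEGER NOT NULL,
    updatedAt INTEGER NOT NULL
);
CREATE TABLE comments (
    commentId INTEGER PRIMARY KEY,
    parentId  INTEGER NOT NULL,  -- a tweetId or another commentId
    userId    INTEGER NOT NULL REFERENCES users(userId),
    content   TEXT NOT NULL,
    createdAt INTEGER NOT NULL,
    updatedAt INTEGER NOT NULL
);
-- Assumed indexes for the two most common lookups.
CREATE INDEX idx_tweets_user_time ON tweets (userId, createdAt DESC);
CREATE INDEX idx_comments_parent ON comments (parentId, createdAt);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the schema applied."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def tweets_by_user(
    conn: sqlite3.Connection, user_id: int, limit: int
) -> List[Tuple[int, str]]:
    """Newest tweets first for one author."""
    cursor = conn.execute(
        "SELECT tweetId, content FROM tweets "
        "WHERE userId = ? ORDER BY createdAt DESC LIMIT ?",
        (user_id, limit),
    )
    return cursor.fetchall()
```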

Graph database schema: Each node is a user, identified by userId. Each directed edge records a follow relationship (following or followed by) between two users.
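
The follow graph can be pictured as a directed graph with an edge from follower to followee. The sketch below keeps it in memory with two adjacency maps so that both directions can be read quickly; it stands in for a graph database only as an illustration, and the class and method names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class FollowGraph:
    """Directed graph: an edge u -> v means user u follows user v."""

    def __init__(self) -> None:
        self._following: DefaultDict[str, Set[str]] = defaultdict(set)
        self._followers: DefaultDict[str, Set[str]] = defaultdict(set)

    def follow(self, follower: str, followee: str) -> None:
        if follower == followee:
            raise ValueError("a user cannot follow themselves")
        self._following[follower].add(followee)
        self._followers[followee].add(follower)

    def unfollow(self, follower: str, followee: str) -> None:
        # discard() makes unfollowing a non-existent edge a harmless no-op.
        self._following[follower].discard(followee)
        self._followers[followee].discard(follower)

    def following(self, user_id: str) -> Set[str]:
        """Users that `user_id` follows."""
        return set(self._following.get(user_id, ()))

    def followers(self, user_id: str) -> Set[str]:
        """Users that follow `user_id`."""
        return set(self._followers.get(user_id, ()))
```

Storing both directions doubles the writes but lets the timeline read path (who do I follow?) and the fan-out path (who follows me?) each use a single lookup.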

NoSQL database schema for media metadata: The key is the media ID. The value contains the media type (image or video) and the URLs of the different resolutions of the media stored in blob storage.
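
A media metadata record in the key-value store might look like the sketch below. A Python dict stands in for the NoSQL database, values are stored as JSON strings, and the resolution labels and example.com URLs are placeholders rather than real endpoints.

```python
import json
from typing import Dict, Optional


class MediaMetadataStore:
    """Key: media id. Value: JSON with the media type and per-resolution URLs."""

    def __init__(self) -> None:
        self._records: Dict[str, str] = {}  # stand-in for the NoSQL database

    def put(self, media_id: str, media_type: str, urls: Dict[str, str]) -> None:
        if media_type not in ("image", "video"):
            raise ValueError("media_type must be 'image' or 'video'")
        self._records[media_id] = json.dumps({"type": media_type, "urls": urls})

    def get_url(self, media_id: str, resolution: str) -> Optional[str]:
        """Blob-storage URL for one resolution, or None if unknown."""
        raw = self._records.get(media_id)
        if raw is None:
            return None
        return json.loads(raw)["urls"].get(resolution)


if __name__ == "__main__":
    store = MediaMetadataStore()
    store.put(
        "m1",
        "video",
        {
            "480p": "https://example.com/blob/m1-480p.mp4",
            "1080p": "https://example.com/blob/m1-1080p.mp4",
        },
    )
    print(store.get_url("m1", "480p"))
```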

Request flows

Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...

Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...

Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...

Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.

Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?