System requirements
Functional:
- Follow and unfollow users.
- Post tweets containing text, images, and videos.
- Search tweets.
- View tweets in a user's timeline (news feed).
- Like tweets.
- Comment on tweets.
Non-Functional:
- Availability
- Scalability
- Latency
- Reliability
- Consistency
Capacity estimation
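Assume Twitter has 1B daily active users (DAU).
Assuming each user loads their timeline about once per day, and rounding the 86,400 seconds in a day up to 100k, read QPS is 1B / 100k = 10k. Peak read QPS is 2 * 10k = 20k.
Assume 1% of users write 10 tweets per day.
Write QPS is 1B * 1% * 10 / 100k = 1k. Peak write QPS is 2 * 1k = 2k.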
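The QPS arithmetic above can be reproduced in a few lines of Python. The constants restate the stated assumptions (1B DAU, a 100k-second day, a 2x peak factor, 1% of users posting 10 tweets each); the one-read-per-user-per-day figure is what the 1B / 100k read formula implies rather than something stated explicitly.

```python
# Back-of-the-envelope QPS estimate using the assumptions stated above.
DAU = 1_000_000_000          # daily active users
SECONDS_PER_DAY = 100_000    # 86,400 rounded up for easy arithmetic
PEAK_FACTOR = 2              # peak traffic assumed to be 2x the average

READS_PER_USER_PER_DAY = 1   # implied by 1B / 100k: one timeline load per user
WRITER_PERCENT = 1           # 1% of users post tweets
TWEETS_PER_WRITER = 10       # each posting user writes 10 tweets per day

writers = DAU * WRITER_PERCENT // 100
read_qps = DAU * READS_PER_USER_PER_DAY / SECONDS_PER_DAY
write_qps = writers * TWEETS_PER_WRITER / SECONDS_PER_DAY
peak_read_qps = read_qps * PEAK_FACTOR
peak_write_qps = write_qps * PEAK_FACTOR

print(f"read QPS:  avg {read_qps:,.0f}, peak {peak_read_qps:,.0f}")
print(f"write QPS: avg {write_qps:,.0f}, peak {peak_write_qps:,.0f}")
```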
Data storage estimation:
Assume 10% of tweets contain images or videos, and that the average storage per tweet is 10 KB. The total daily storage usage is:
1B * 1% * 10 * 10 KB = 1 TB. Each piece of data is stored with two additional replicas, so the total daily storage usage is 1 TB * 3 = 3 TB.
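The storage estimate can be checked the same way. The 10 KB average per tweet and the replication factor of 3 come from the estimate above; the constant names are illustrative.

```python
# Daily storage estimate using the assumptions stated above.
DAU = 1_000_000_000
WRITER_PERCENT = 1            # 1% of users post tweets
TWEETS_PER_WRITER = 10        # tweets per posting user per day
AVG_TWEET_BYTES = 10_000      # ~10 KB average per tweet
REPLICATION_FACTOR = 3        # primary copy plus two replicas

tweets_per_day = DAU * WRITER_PERCENT // 100 * TWEETS_PER_WRITER
raw_bytes_per_day = tweets_per_day * AVG_TWEET_BYTES
replicated_bytes_per_day = raw_bytes_per_day * REPLICATION_FACTOR

TB = 10**12
print(f"tweets/day: {tweets_per_day:,}")
print(f"raw: {raw_bytes_per_day / TB:.0f} TB/day, "
      f"replicated: {replicated_bytes_per_day / TB:.0f} TB/day")
```

Over a year, 3 TB/day adds up to roughly 3 TB * 365 ≈ 1.1 PB.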
API design
GET getNewsFeed(
authToken,
userId,
lastSeenKey, (for pagination)
count
) => Tweet[] / Error
GET getComments(
authToken,
userId,
parentId, (either a tweet or a comment)
lastSeenKey (for pagination)
) => Comment[] / Error
POST postTweet(
authToken,
userId,
content: Content
) => Succeed / Error
POST postComments(
authToken,
userId,
parentId, (either a tweet or a comment)
content
) => Succeed / Error
POST likeTweet(
authToken,
userId,
tweetId
) => Succeed / Error
GET searchTweet(
authToken,
userId,
searchText,
lastSeenKey (for pagination)
) => Tweet[] / Error
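PUT editTweet(
authToken,
userId,
tweetId,
content: Content
) => Succeed / Error
POST followUser(
authToken,
userId,
followeeId
) => Succeed / Error
POST unfollowUser(
authToken,
userId,
followeeId
) => Succeed / Error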
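The lastSeenKey parameter used by getNewsFeed, getComments, and searchTweet suggests keyset (cursor) pagination: the client sends back the sort key of the last item it received, and the server returns the next items strictly older than that key. The sketch below assumes the key is a (createdAt, tweetId) pair, with tweetId breaking ties between tweets posted in the same second; the in-memory list stands in for the real feed store, and FeedItem and get_news_feed_page are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

FeedKey = Tuple[int, int]  # (created_at, tweet_id)


@dataclass(frozen=True)
class FeedItem:
    tweet_id: int
    created_at: int  # epoch seconds


def get_news_feed_page(
    feed: List[FeedItem], count: int, last_seen_key: Optional[FeedKey] = None
) -> Tuple[List[FeedItem], Optional[FeedKey]]:
    """Return up to `count` items strictly older than last_seen_key, newest
    first, plus the key for the next page (None when nothing is left)."""
    if count <= 0:
        raise ValueError("count must be positive")
    # (created_at, tweet_id) is unique, so equal timestamps sort deterministically.
    ordered = sorted(feed, key=lambda t: (t.created_at, t.tweet_id), reverse=True)
    if last_seen_key is not None:
        ordered = [t for t in ordered if (t.created_at, t.tweet_id) < last_seen_key]
    page = ordered[:count]
    has_more = len(ordered) > count
    next_key = (page[-1].created_at, page[-1].tweet_id) if has_more else None
    return page, next_key


feed = [FeedItem(tweet_id=i, created_at=1000 + i // 2) for i in range(5)]
key = None
while True:
    page, key = get_news_feed_page(feed, 2, key)
    print([t.tweet_id for t in page])  # [4, 3], then [2, 1], then [0]
    if key is None:
        break
```

In SQL the filter typically becomes WHERE (created_at, tweet_id) < (?, ?) ORDER BY created_at DESC, tweet_id DESC LIMIT ?, on databases that support row-value comparisons. Unlike OFFSET pagination, this stays cheap on deep pages and does not skip or repeat tweets when new ones arrive at the top of the feed.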
For error handling, the API returns an HTTP status code that identifies the kind of error, along with a detailed error message. Some example status codes are:
- 400 Bad Request
- 401 Unauthorized
- 403 Forbidden
- 404 Not Found
- 503 Service Unavailable
For authentication, every API request carries an authentication token that is used to verify the user's identity. The token can be a session cookie, an OAuth 2.0 access token, or a JWT.
Here are the objects used in the APIs above:
Tweet {
tweetId,
creatorId,
createdAt,
likeCount,
content: Content,
topComments: Comment[]
}
Content {
contentId,
tweetId,
text,
media: Media[]
}
Media {
mediaType,
mediaResolution,
mediaURL
}
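The objects above could be modeled as Python dataclasses roughly as follows. This is a sketch: field names are converted to snake_case, the author field is modeled as a single creator_id (matching the single userId column in the Tweet table), the Comment type is inferred from the Comment table since the API objects do not define it, and the resolution string format is made up for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class MediaType(Enum):
    IMAGE = "image"
    VIDEO = "video"


@dataclass
class Media:
    media_type: MediaType
    media_resolution: str  # e.g. "1280x720"; the format is an assumption
    media_url: str         # location of this rendition in blob storage


@dataclass
class Content:
    content_id: int
    tweet_id: int
    text: str
    media: List[Media] = field(default_factory=list)  # empty for text-only tweets


@dataclass
class Comment:
    # Inferred from the Comment table; not defined among the API objects.
    comment_id: int
    parent_id: int  # either a tweet ID or a comment ID
    user_id: int
    content: str
    created_at: int


@dataclass
class Tweet:
    tweet_id: int
    creator_id: int  # single author, matching userId in the Tweet table
    created_at: int
    like_count: int
    content: Content
    top_comments: List[Comment] = field(default_factory=list)


tweet = Tweet(
    tweet_id=1,
    creator_id=7,
    created_at=1_700_000_000,
    like_count=0,
    content=Content(
        content_id=10,
        tweet_id=1,
        text="hello",
        media=[Media(MediaType.IMAGE, "1280x720", "https://blob.example.com/m/1.jpg")],
    ),
)
```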
Database design
High-level design
I will use a SQL database to store users, tweets, and comments, because this data is relational in nature, which makes it easier to join and query.
I will use a graph database to store the follow relationships between users; unfollowing a user removes the corresponding edge.
I will use a NoSQL database to store media metadata.
I will use blob storage to store the actual images and videos attached to tweets.
SQL database schema:
User Table: userId, userInfo...
Tweet Table: tweetId, userId, content, mediaId, likeCount, commentIds, createdAt, updatedAt
Comment Table: commentId, parentId, userId, content, createdAt, updatedAt
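To make the SQL schema concrete, the sketch below creates the three tables in an in-memory SQLite database with Python's built-in sqlite3 module and runs the kind of query getComments needs. Column types, index names, the hypothetical get_comments helper, and the user_name column (standing in for userInfo...) are assumptions. It also assumes tweets and comments share one globally unique ID space, so parent_id is unambiguous, and it finds a tweet's comments through an index on parent_id instead of storing a commentIds column on the tweet row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY,
    user_name  TEXT NOT NULL              -- stands in for userInfo...
);
CREATE TABLE tweets (
    tweet_id   INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    content    TEXT,
    media_id   TEXT,                      -- key into the NoSQL media-metadata store
    like_count INTEGER NOT NULL DEFAULT 0,
    created_at INTEGER NOT NULL,          -- epoch seconds
    updated_at INTEGER NOT NULL
);
CREATE TABLE comments (
    comment_id INTEGER PRIMARY KEY,
    parent_id  INTEGER NOT NULL,          -- a tweet_id or another comment_id
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    content    TEXT NOT NULL,
    created_at INTEGER NOT NULL,
    updated_at INTEGER NOT NULL
);
-- Serve "tweets by a user, by time" and "comments under a parent, by time".
CREATE INDEX idx_tweets_user_created ON tweets(user_id, created_at);
CREATE INDEX idx_comments_parent_created ON comments(parent_id, created_at);
""")

conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.execute(
    "INSERT INTO tweets (tweet_id, user_id, content, created_at, updated_at) "
    "VALUES (100, 1, 'hello', 1, 1)"
)
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?, ?, ?, ?)",
    [(200, 100, 1, "first!", 2, 2), (201, 200, 1, "reply", 3, 3)],
)


def get_comments(parent_id: int) -> list:
    """Direct children of a tweet or comment, oldest first."""
    return conn.execute(
        "SELECT comment_id, content FROM comments "
        "WHERE parent_id = ? ORDER BY created_at",
        (parent_id,),
    ).fetchall()


print(get_comments(100))  # [(200, 'first!')]
print(get_comments(200))  # [(201, 'reply')]
```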
Graph database schema: each node is a user, identified by userId, and each directed edge represents a "follows" relationship from the follower to the followee.
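The sketch below models the follow graph with two in-memory adjacency maps, one per edge direction, as a stand-in for a real graph database; FollowGraph and its method names are hypothetical. Keeping both directions makes "whom does this user follow" and "who follows this user" each a single lookup, which is the kind of traversal a graph database handles natively.

```python
from collections import defaultdict


class FollowGraph:
    """In-memory stand-in for the graph store. Edges point follower -> followee."""

    def __init__(self) -> None:
        self._following = defaultdict(set)  # userId -> userIds they follow
        self._followers = defaultdict(set)  # userId -> userIds following them

    def follow(self, follower: int, followee: int) -> None:
        # Adding to a set makes repeated follow requests idempotent.
        self._following[follower].add(followee)
        self._followers[followee].add(follower)

    def unfollow(self, follower: int, followee: int) -> None:
        # discard() ignores missing edges, so unfollow is idempotent too.
        self._following[follower].discard(followee)
        self._followers[followee].discard(follower)

    def following(self, user: int) -> set:
        return set(self._following.get(user, ()))

    def followers(self, user: int) -> set:
        return set(self._followers.get(user, ()))


graph = FollowGraph()
graph.follow(1, 2)
graph.follow(3, 2)
graph.unfollow(1, 2)
print(graph.followers(2))  # {3}
```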
NoSQL database for media metadata: the key is the media ID, and the value contains the media type (image or video) plus the URLs of each resolution of the media stored in blob storage.
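A media-metadata record and a lookup against it might look like the sketch below. The record layout (a type plus a map from vertical resolution in pixels to blob-storage URL), the example.com URLs, and pick_media_url are all hypothetical; the function picks the largest rendition that fits the client's screen and falls back to the smallest one.

```python
# Hypothetical media-metadata record: key = media ID, value = type plus one
# blob-storage URL per rendition, keyed by vertical resolution in pixels.
media_metadata = {
    "media-42": {
        "type": "video",
        "urls": {
            1080: "https://blob.example.com/media-42/1080.mp4",
            720: "https://blob.example.com/media-42/720.mp4",
            360: "https://blob.example.com/media-42/360.mp4",
        },
    },
}


def pick_media_url(store: dict, media_id: str, max_height: int) -> str:
    """Return the largest rendition no taller than max_height,
    or the smallest rendition if none fits."""
    urls = store[media_id]["urls"]
    fitting = [height for height in urls if height <= max_height]
    return urls[max(fitting)] if fitting else urls[min(urls)]


print(pick_media_url(media_metadata, "media-42", 800))
# https://blob.example.com/media-42/720.mp4
```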