System requirements


Functional:

  1. Follow and unfollow users.
  2. Post tweets.
  3. Search tweets.
  4. Tweets can contain text, images, and video.
  5. View tweets in the user's timeline.
  6. Like a tweet.
  7. Comment on a tweet.


Non-Functional:

  1. Availability
  2. Scalability
  3. Latency
  4. Reliability
  5. Consistency



Capacity estimation

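Assume Twitter has 1B daily active users (DAU).

A day has 86,400 seconds, rounded up to 100k for estimation. Assuming about one read request per user per day, read QPS is 1B / 100k = 10k. Peak read QPS is assumed to be twice the average: 2 * 10k = 20k.

Assume 1% of users write 10 tweets per day.

Write QPS is 1B * 1% * 10 / 100k = 1k. Peak write QPS is 2 * 1k = 2k.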
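
As a quick check of the arithmetic above, here is a minimal sketch that recomputes the QPS figures. The constant names are illustrative, and READS_PER_USER = 1 is an assumption implied by the "1B / 100k" step rather than something stated explicitly.

```python
# Back-of-envelope QPS estimate using the figures from this section.
DAU = 1_000_000_000        # daily active users
SECONDS_PER_DAY = 100_000  # 86,400 rounded up for easy arithmetic
READS_PER_USER = 1         # assumption implied by "1B / 100k"
WRITER_PERCENT = 1         # 1% of users write tweets
TWEETS_PER_WRITER = 10     # tweets per writing user per day
PEAK_FACTOR = 2            # peak traffic is assumed to be 2x the average

# Integer arithmetic keeps the results exact.
read_qps = DAU * READS_PER_USER // SECONDS_PER_DAY
tweets_per_day = DAU * WRITER_PERCENT // 100 * TWEETS_PER_WRITER
write_qps = tweets_per_day // SECONDS_PER_DAY

peak_read_qps = PEAK_FACTOR * read_qps
peak_write_qps = PEAK_FACTOR * write_qps

print(read_qps, peak_read_qps, write_qps, peak_write_qps)
```

Changing any single assumption (for example, more reads per user) scales the corresponding result linearly.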


Data storage estimation:

Assume 10% of tweets contain images or videos, and the average tweet takes 10 KB of storage. The total daily storage usage is:

1B * 1% * 10 * 10 KB = 1 TB. Each piece of data needs two additional replicas, so the total daily storage usage is 1 TB * 3 = 3 TB.
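
The storage estimate can be checked the same way. This is a sketch under the stated assumptions; it uses decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes), and the constant names are illustrative.

```python
# Daily storage estimate using the figures from this section.
DAU = 1_000_000_000
WRITER_PERCENT = 1          # 1% of users write tweets
TWEETS_PER_WRITER = 10
AVG_TWEET_BYTES = 10_000    # 10 KB average, media included
REPLICATION_FACTOR = 3      # one primary copy plus two replicas
TB = 10**12                 # decimal terabyte

tweets_per_day = DAU * WRITER_PERCENT // 100 * TWEETS_PER_WRITER
raw_bytes_per_day = tweets_per_day * AVG_TWEET_BYTES
replicated_bytes_per_day = raw_bytes_per_day * REPLICATION_FACTOR

print(raw_bytes_per_day // TB, "TB raw,",
      replicated_bytes_per_day // TB, "TB with replicas")
```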




API design

GET getNewsFeed(
  authToken,
  userId,
  lastSeenKey (for pagination),
  count
) => Tweet[] / Error
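
The lastSeenKey parameter suggests cursor-based pagination. A minimal in-memory sketch follows, assuming the key is the tweetId of the last tweet on the previous page and that tweetIds increase with creation time; FeedItem and get_news_feed_page are hypothetical names, and a real service would query a store instead of filtering a list.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FeedItem:
    tweet_id: int  # assumed to increase with creation time
    text: str


def get_news_feed_page(
    feed: List[FeedItem], last_seen_key: Optional[int], count: int
) -> Tuple[List[FeedItem], Optional[int]]:
    """Return one page of a newest-first feed plus the key for the next page.

    `feed` must be sorted by tweet_id in descending order.
    The returned key is None when there are no more pages.
    """
    if last_seen_key is None:
        candidates = feed  # first page: start from the newest tweet
    else:
        # Only tweets strictly older than the last one the client saw.
        candidates = [t for t in feed if t.tweet_id < last_seen_key]
    page = candidates[:count]
    has_more = len(candidates) > count
    next_key = page[-1].tweet_id if has_more else None
    return page, next_key


feed = [FeedItem(i, f"tweet {i}") for i in range(5, 0, -1)]
page, key = get_news_feed_page(feed, None, 2)
print([t.tweet_id for t in page], key)
```

Compared with offset-based paging, a cursor stays stable when new tweets arrive at the top of the feed between requests.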


GET getComments(
  authToken,
  userId,
  parentId (either a tweet or a comment),
  lastSeenKey (for pagination)
) => Comments / Error
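
Because parentId can point at either a tweet or another comment, comments form a tree under each tweet. The sketch below shows one way to rebuild that tree from flat rows. It assumes tweet and comment IDs share one namespace (here distinguished by "t" and "c" prefixes), which the API above does not specify; build_thread is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def build_thread(comments: List[Dict[str, str]], root_id: str) -> List[dict]:
    """Nest flat comment rows under `root_id` using their parentId links."""
    children = defaultdict(list)
    for comment in comments:
        children[comment["parentId"]].append(comment)

    def attach(parent_id: str) -> List[dict]:
        # Recursively collect the replies of each direct child.
        return [
            {"commentId": c["commentId"], "replies": attach(c["commentId"])}
            for c in children[parent_id]
        ]

    return attach(root_id)


rows = [
    {"commentId": "c1", "parentId": "t1"},  # comment on the tweet
    {"commentId": "c2", "parentId": "c1"},  # reply to comment c1
    {"commentId": "c3", "parentId": "t1"},  # another comment on the tweet
]
print(build_thread(rows, "t1"))
```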


POST postTweet(
  authToken,
  userId,
  content: Content
) => Succeed / Error


POST postComments(
  authToken,
  userId,
  parentId (either a tweet or a comment),
  content: Content
) => Succeed / Error


POST likeTweet(
  authToken,
  userId,
  tweetId
) => Succeed / Error


GET searchTweet(
  authToken,
  userId,
  searchText,
  lastSeenKey (for pagination)
) => Tweet[] / Error


PUT editTweet

POST followUser

POST unfollowUser


For error handling, the API returns an HTTP status code with a detailed error message for each kind of error. Some example HTTP error status codes are:

400 Bad Request

401 Unauthorized

403 Forbidden

404 Not Found

503 Service Unavailable
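
To illustrate how these status codes might be produced, here is a minimal sketch of a likeTweet handler. ApiError, handle, and the in-memory sessions and like_counts dictionaries are hypothetical stand-ins for real auth and storage layers; only http.HTTPStatus comes from the Python standard library.

```python
from http import HTTPStatus


class ApiError(Exception):
    """Carries the HTTP status and a detailed message for the client."""

    def __init__(self, status: HTTPStatus, message: str):
        super().__init__(message)
        self.status = status
        self.message = message


def like_tweet(auth_token, user_id, tweet_id, sessions, like_counts):
    if not tweet_id:
        raise ApiError(HTTPStatus.BAD_REQUEST, "tweetId is required")
    owner = sessions.get(auth_token)
    if owner is None:
        raise ApiError(HTTPStatus.UNAUTHORIZED, "missing or invalid authToken")
    if owner != user_id:
        raise ApiError(HTTPStatus.FORBIDDEN, "authToken does not belong to userId")
    if tweet_id not in like_counts:
        raise ApiError(HTTPStatus.NOT_FOUND, f"tweet {tweet_id} does not exist")
    like_counts[tweet_id] += 1
    return "Succeed"


def handle(fn, *args):
    """Run a handler and turn known failures into an error response."""
    try:
        return {"status": HTTPStatus.OK.value, "body": fn(*args)}
    except ApiError as err:
        return {"status": err.status.value, "error": err.status.phrase,
                "message": err.message}
    except ConnectionError:
        # A backing service (database, cache) is unreachable.
        status = HTTPStatus.SERVICE_UNAVAILABLE
        return {"status": status.value, "error": status.phrase,
                "message": "please retry later"}


sessions = {"tok-1": "u1"}
like_counts = {"t1": 0}
print(handle(like_tweet, "tok-1", "u1", "t9", sessions, like_counts))
```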


For authentication, each API request carries an authentication token. The token could be a session cookie, an OAuth 2.0 token, or a JWT, and is used to validate the user's identity.
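
The idea behind a signed token can be sketched with the standard library alone. This is a simplified HMAC-signed token, not a real JWT or OAuth 2.0 implementation; SECRET, issue_token, and verify_token are hypothetical names, and a production system would use a vetted library and proper key management.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"demo-secret"  # hypothetical key; never hard-code one in production


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(user_id: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Return 'userId:expiry:signature' valid for ttl_seconds."""
    current = time.time() if now is None else now
    payload = f"{user_id}:{int(current + ttl_seconds)}"
    return f"{payload}:{_sign(payload)}"


def verify_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the userId if the token is authentic and unexpired, else None."""
    try:
        user_id, expires, signature = token.rsplit(":", 2)
    except ValueError:
        return None  # malformed token
    expected = _sign(f"{user_id}:{expires}")
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    current = time.time() if now is None else now
    if not expires.isdigit() or int(expires) < current:
        return None
    return user_id


token = issue_token("u1", ttl_seconds=60)
print(verify_token(token))
```

Because the server can verify the signature without a lookup, this style of token avoids a session-store round trip on every request, at the cost of making revocation harder.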


Here are the objects used in the API above:

Tweet {
  tweetId,
  creators,
  createdAt,
  likeCount,
  content: Content,
  topComments: Comment[]
}


Content {
  contentId,
  tweetId,
  text,
  media: Media[]
}


Media {
  mediaType,
  mediaSolution,
  mediaURL
}
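
These objects could be modeled as typed records. The sketch below keeps the field names from the definitions above; the field types are assumptions, and the Comment class is inferred from the Comment Table columns because the API section does not define it.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Media:
    mediaType: str       # e.g. "image" or "video" (assumed values)
    mediaSolution: str   # name kept from the source; presumably the resolution
    mediaURL: str        # location of the file in blob storage


@dataclass
class Content:
    contentId: str
    tweetId: str
    text: str
    media: List[Media] = field(default_factory=list)


@dataclass
class Comment:
    # Assumed shape, based on the Comment Table columns.
    commentId: str
    parentId: str
    userId: str
    content: str


@dataclass
class Tweet:
    tweetId: str
    creators: str
    createdAt: str       # assumed ISO 8601 timestamp
    likeCount: int
    content: Content
    topComments: List[Comment] = field(default_factory=list)


tweet = Tweet(
    tweetId="t1",
    creators="u1",
    createdAt="2024-01-01T00:00:00Z",
    likeCount=0,
    content=Content("c1", "t1", "hello",
                    [Media("image", "1080p", "https://example.com/a.jpg")]),
)
print(asdict(tweet))  # nested dict, ready to serialize as JSON
```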




Database design



High-level design

I will use a SQL database to store user, tweet, and comment information, because this data is relational in nature and is easier to join and query in a relational store.


I will use a graph database to store the follow relationships between users.
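
Conceptually, the follow relationship is a directed graph with an edge from follower to followee. A minimal in-memory sketch of that model is shown here; FollowGraph is a hypothetical class that stands in for the graph database, not an API of any particular product.

```python
from collections import defaultdict
from typing import Dict, Set


class FollowGraph:
    """Directed graph: an edge a -> b means user a follows user b."""

    def __init__(self) -> None:
        self._following: Dict[str, Set[str]] = defaultdict(set)
        self._followers: Dict[str, Set[str]] = defaultdict(set)

    def follow(self, follower: str, followee: str) -> None:
        if follower == followee:
            return  # users cannot follow themselves in this sketch
        self._following[follower].add(followee)
        self._followers[followee].add(follower)

    def unfollow(self, follower: str, followee: str) -> None:
        # discard() makes unfollow idempotent.
        self._following[follower].discard(followee)
        self._followers[followee].discard(follower)

    def following(self, user: str) -> Set[str]:
        return set(self._following.get(user, ()))

    def followers(self, user: str) -> Set[str]:
        return set(self._followers.get(user, ()))


graph = FollowGraph()
graph.follow("alice", "bob")
graph.follow("carol", "bob")
print(sorted(graph.followers("bob")))
```

Keeping both directions indexed makes "who do I follow" (timeline reads) and "who follows me" (fan-out on write) equally cheap.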


I will use a NoSQL database to store media metadata.


I will use blob storage to store the actual images and videos attached to tweets.


SQL database Schema:

User Table: userId, userInfo...

Tweet Table: tweetId, userId, content, mediaId, likeCount, commentIds, createdAt, updatedAt

Comment Table: commentId, parentId, userId, content, createdAt, updatedAt
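
The tables above can be tried out with SQLite from the Python standard library. This is a sketch, not the final schema: column types, the parentType column (to tell tweet parents from comment parents), and the index are assumptions, userInfo is a single placeholder column, and commentIds is left out because comment rows already point back through parentId.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    userId   INTEGER PRIMARY KEY,
    userInfo TEXT
);
CREATE TABLE tweets (
    tweetId   INTEGER PRIMARY KEY,
    userId    INTEGER NOT NULL REFERENCES users(userId),
    content   TEXT NOT NULL,
    mediaId   TEXT,
    likeCount INTEGER NOT NULL DEFAULT 0,
    createdAt TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updatedAt TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE comments (
    commentId  INTEGER PRIMARY KEY,
    parentType TEXT NOT NULL CHECK (parentType IN ('tweet', 'comment')),
    parentId   INTEGER NOT NULL,
    userId     INTEGER NOT NULL REFERENCES users(userId),
    content    TEXT NOT NULL,
    createdAt  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updatedAt  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_comments_parent ON comments (parentType, parentId);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def user_timeline(conn, user_id, limit=20):
    """Newest-first tweets by one user (tweetId assumed to grow over time)."""
    return conn.execute(
        "SELECT tweetId, content, likeCount FROM tweets "
        "WHERE userId = ? ORDER BY tweetId DESC LIMIT ?",
        (user_id, limit),
    ).fetchall()


def comments_for(conn, parent_type, parent_id):
    """Direct comments on a tweet, or direct replies to a comment."""
    return conn.execute(
        "SELECT commentId, content FROM comments "
        "WHERE parentType = ? AND parentId = ? ORDER BY commentId",
        (parent_type, parent_id),
    ).fetchall()


conn = open_db()
conn.execute("INSERT INTO users (userId, userInfo) VALUES (1, 'alice')")
conn.execute("INSERT INTO tweets (tweetId, userId, content) VALUES (1, 1, 'first')")
conn.execute("UPDATE tweets SET likeCount = likeCount + 1 WHERE tweetId = 1")
print(user_timeline(conn, 1))
```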


Graph database Schema:





Request flows

Explain how the request flows from end to end in your high level design. Also you could draw a sequence diagram using the diagramming tool to enhance your explanation...






Detailed component design

Dig deeper into 2-3 components and explain in detail how they work. For example, how well does each component scale? Any relevant algorithm or data structure you like to use for a component? Also you could draw a diagram using the diagramming tool to enhance your design...






Trade offs/Tech choices

Explain any trade offs you have made and why you made certain tech choices...






Failure scenarios/bottlenecks

Try to discuss as many failure scenarios/bottlenecks as possible.






Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?