System Requirements

Functional requirements:

  1. The user can post and share tweets
  2. The user can like/favorite tweets
  3. The user can see their home timeline
  4. The user can see other users' timelines


Non-functional requirements:


  1. Availability: Every request should receive a non-error response, with no guarantee that the data returned is the most recent
  2. Consistency: Eventual consistency is chosen, trading freshness for availability
  3. Partition tolerance: The system should continue to operate even if some messages between nodes are dropped by the network
  4. Low latency: The user should see their timeline within 500 ms








Capacity Estimation

Assumptions:


200 million DAU, each posting 3 tweets per day = 600 million tweets per day.

Each tweet has 140 bytes of content and 30 bytes of metadata; 20% of tweets contain a 20 KB photo, and 10% contain a 2 MB video.

Each user reads their home timeline 5 times and other users' timelines 5 times per day; each timeline read returns 20 tweets.


Data storage:

Average tweet size: 170 bytes + 20% * 20 KB + 10% * 2 MB = 170 + 4,000 + 200,000 bytes, about 204 KB.

Total: 600 million * 204 KB = about 122 TB per day (decimal units).


Bandwidth: 200 million * (5 + 5) * 20 * 204 KB / 86,400 s = about 95 GB/s
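
The estimate can be re-checked with a few lines of Python whenever an assumption changes. This is a minimal sketch: the constant and function names are made up for illustration, sizes use decimal units (1 KB = 1,000 bytes), and the video size is read as 2 megabytes rather than 2 megabits.

```python
# Back-of-the-envelope capacity estimate, using the assumptions listed above.
# Decimal units: 1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes.

DAU = 200_000_000
TWEETS_PER_USER_PER_DAY = 3
TEXT_BYTES = 140 + 30                        # content + metadata
PHOTO_BYTES, PHOTO_SHARE = 20_000, 0.20      # 20% of tweets carry a 20 KB photo
VIDEO_BYTES, VIDEO_SHARE = 2_000_000, 0.10   # assumption: video is 2 megabytes
TIMELINE_READS_PER_USER = 5 + 5              # home + other users' timelines
TWEETS_PER_TIMELINE = 20
SECONDS_PER_DAY = 86_400


def avg_tweet_bytes() -> float:
    """Expected size of one tweet, weighting media by how often it appears."""
    return TEXT_BYTES + PHOTO_SHARE * PHOTO_BYTES + VIDEO_SHARE * VIDEO_BYTES


def daily_storage_bytes() -> float:
    """New data written per day."""
    return DAU * TWEETS_PER_USER_PER_DAY * avg_tweet_bytes()


def read_bandwidth_bytes_per_sec() -> float:
    """Average outbound read traffic, assuming reads are spread evenly over the day."""
    tweets_read_per_day = DAU * TIMELINE_READS_PER_USER * TWEETS_PER_TIMELINE
    return tweets_read_per_day * avg_tweet_bytes() / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"average tweet size: {avg_tweet_bytes() / 1e3:.1f} KB")
    print(f"storage per day:    {daily_storage_bytes() / 1e12:.1f} TB")
    print(f"read bandwidth:     {read_bandwidth_bytes_per_sec() / 1e9:.1f} GB/s")
```

Note that the video term dominates both figures, so the result is most sensitive to the video size and the share of tweets that carry video. The bandwidth figure is a daily average; peak traffic would be higher.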



API Design


  1. createTweet(userToken, String tweetContent) -> response status code
  2. homeTimeline(userToken, int pageSize, optional int pageOffset: the current page position) -> list of tweets
  3. userTimeline(userToken, int userId, int pageSize, optional int pageOffset) -> list of tweets
  4. likeOrUnlikeTweet(userToken, int tweetId, boolean like: true to like, false to unlike) -> response status code
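
To make the four calls concrete, here is a toy in-memory sketch in Python. It is not the real service. Several things are assumptions made only for illustration: the class name `InMemoryTweetService`, the `follow` helper, treating the user token as the user id, counting the 140 limit in characters, interpreting the page offset as a zero-based page index, returning tweets newest first, and the specific status codes.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Tweet:
    tweet_id: int
    user_id: int
    content: str


class InMemoryTweetService:
    """Toy stand-in for the API above; user_token is treated as the user id."""

    MAX_LEN = 140  # assumption: limit counted in characters

    def __init__(self) -> None:
        self.tweets: List[Tweet] = []             # append order = time order
        self.following: Dict[int, Set[int]] = {}  # follower -> followees
        self.likes: Dict[int, Set[int]] = {}      # tweet_id -> users who liked it

    def follow(self, follower: int, followee: int) -> None:
        # Hypothetical helper, only needed to populate the home timeline.
        self.following.setdefault(follower, set()).add(followee)

    def create_tweet(self, user_token: int, content: str) -> int:
        if not content or len(content) > self.MAX_LEN:
            return 400
        self.tweets.append(Tweet(len(self.tweets) + 1, user_token, content))
        return 201

    @staticmethod
    def _page(items: List[Tweet], page_size: int, page_offset: int) -> List[Tweet]:
        # Newest first; page_offset is a zero-based page index.
        newest_first = items[::-1]
        start = page_offset * page_size
        return newest_first[start:start + page_size]

    def home_timeline(self, user_token: int, page_size: int,
                      page_offset: int = 0) -> List[Tweet]:
        followees = self.following.get(user_token, set())
        matching = [t for t in self.tweets if t.user_id in followees]
        return self._page(matching, page_size, page_offset)

    def user_timeline(self, user_token: int, user_id: int, page_size: int,
                      page_offset: int = 0) -> List[Tweet]:
        matching = [t for t in self.tweets if t.user_id == user_id]
        return self._page(matching, page_size, page_offset)

    def like_or_unlike_tweet(self, user_token: int, tweet_id: int, like: bool) -> int:
        if not 1 <= tweet_id <= len(self.tweets):
            return 404
        fans = self.likes.setdefault(tweet_id, set())
        if like:
            fans.add(user_token)
        else:
            fans.discard(user_token)
        return 200
```

Storing likes as a set per tweet makes the like call idempotent: liking twice has the same effect as liking once, which is convenient when clients retry requests.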






Database design


I'd choose MongoDB as our database, because:

  1. The tweet data volume is large: well over 100 TB per day by the estimate above.
  2. Low latency is a requirement.
  3. We need horizontal scalability.

Schema:


Tweet:


tweetId: Integer, primary key

content: Varchar(140)

metadata: Varchar(30)

....


User:

userId: Integer, primary key

email: varchar(30)

isHotUser: Boolean


Follower:

followerUserId: Integer

followeeUserId: Integer

followingDate: Timestamp
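
The three record types can be sketched as Python dataclasses whose fields mirror the schema above, normalized to camelCase so that `asdict` yields document-shaped dictionaries. This is only an illustration under stated assumptions: the length checks and the `followees_of` helper are not part of the original design, and the Tweet fields elided above are left out rather than guessed.

```python
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import List


@dataclass
class Tweet:
    tweetId: int    # primary key
    content: str    # at most 140 characters
    metadata: str   # at most 30 characters
    # Further fields are elided in the schema above and are not guessed here.

    def __post_init__(self) -> None:
        if len(self.content) > 140:
            raise ValueError("content exceeds 140 characters")
        if len(self.metadata) > 30:
            raise ValueError("metadata exceeds 30 characters")


@dataclass
class User:
    userId: int     # primary key
    email: str
    isHotUser: bool


@dataclass
class Follower:
    followerUserId: int   # the user who follows
    followeeUserId: int   # the user being followed
    followingDate: datetime


def followees_of(user_id: int, edges: List[Follower]) -> List[int]:
    """Hypothetical helper: ids of the users that user_id follows."""
    return [e.followeeUserId for e in edges if e.followerUserId == user_id]


if __name__ == "__main__":
    # asdict() gives a plain dictionary in the shape of a stored document.
    print(asdict(User(userId=1, email="a@example.com", isHotUser=False)))
```

A lookup like `followees_of` is what a home timeline read needs first: the list of followed users, whose tweets are then fetched and merged.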







High-level design







Request Flow

The request flow can be seen in the high-level diagram.