System requirements


Functional:

  • Post a tweet
  • View tweets
  • Follow other users
  • Add tweets to favorites


Non-Functional:

  • Scalable: the system should handle heavy traffic, especially during traffic peaks (for instance when a famous user posts a tweet)
  • Latency must remain acceptable (requests should be answered in < 1s)



Capacity estimation

  • 500M daily active users
  • Users post between 1 and 10 tweets each day (~5 on average, so ~2.5B tweets posted daily)
  • Users view on average 20 tweets per day (10B tweets viewed daily)
  • 12.5B requests each day in total (2.5B writes + 10B reads)
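
The arithmetic behind these figures can be checked with a short script. This is only a back-of-the-envelope sketch: the average of 5 tweets per user is derived from the 2.5B total above, and the per-second and read/write numbers are averages that ignore traffic peaks.

```python
# Back-of-the-envelope traffic estimate using the figures from the list above.
DAILY_ACTIVE_USERS = 500_000_000
AVG_TWEETS_POSTED_PER_USER = 5    # implied by ~2.5B tweets / 500M users
AVG_TWEETS_VIEWED_PER_USER = 20
SECONDS_PER_DAY = 24 * 60 * 60

writes_per_day = DAILY_ACTIVE_USERS * AVG_TWEETS_POSTED_PER_USER
reads_per_day = DAILY_ACTIVE_USERS * AVG_TWEETS_VIEWED_PER_USER
requests_per_day = writes_per_day + reads_per_day

# Average rate only; peak traffic will be higher.
avg_requests_per_second = requests_per_day / SECONDS_PER_DAY
read_write_ratio = reads_per_day / writes_per_day

print(f"writes/day:   {writes_per_day:,}")
print(f"reads/day:    {reads_per_day:,}")
print(f"requests/day: {requests_per_day:,}")
print(f"avg req/s:    {avg_requests_per_second:,.0f}")
print(f"read:write:   {read_write_ratio:.0f}:1")
```

On average this is roughly 145K requests per second with a 4:1 read/write ratio, which suggests the read path deserves most of the optimization effort.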


API design

GET /tweet/{id}

POST /tweet

POST /tweet/{id}/favorite


GET /feed


GET /users/{id}

GET /users/{id}/tweets

POST /users/{id}/follow

POST /users/{id}/unfollow



Database design

Tweet

  • [PK] ID INT
  • [FK] UserID INT
  • TextContent VARCHAR
  • CreatedAt DATETIME
  • UpdatedAt DATETIME


Users

  • [PK] ID INT
  • Login VARCHAR
  • Email VARCHAR
  • Password VARCHAR


Feed

  • [FK] UserID INT
  • [FK] TweetID INT


Followers

  • [FK] UserID INT
  • [FK] FollowerUserID INT
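
The four tables above can be sketched as a schema and exercised with Python's built-in sqlite3 module. This is a minimal illustration, not the production schema: the NOT NULL and UNIQUE constraints, the composite primary keys on Feed and Followers, and the sample rows are assumptions added for the example.

```python
import sqlite3

# Table and column names follow the design above; constraints are assumptions.
SCHEMA = """
CREATE TABLE Users (
    ID INTEGER PRIMARY KEY,
    Login VARCHAR NOT NULL UNIQUE,
    Email VARCHAR NOT NULL UNIQUE,
    Password VARCHAR NOT NULL
);
CREATE TABLE Tweet (
    ID INTEGER PRIMARY KEY,
    UserID INTEGER NOT NULL REFERENCES Users(ID),
    TextContent VARCHAR NOT NULL,
    CreatedAt DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UpdatedAt DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE Feed (
    UserID INTEGER NOT NULL REFERENCES Users(ID),
    TweetID INTEGER NOT NULL REFERENCES Tweet(ID),
    PRIMARY KEY (UserID, TweetID)
);
CREATE TABLE Followers (
    UserID INTEGER NOT NULL REFERENCES Users(ID),
    FollowerUserID INTEGER NOT NULL REFERENCES Users(ID),
    PRIMARY KEY (UserID, FollowerUserID)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Sample data (hypothetical): bob (2) follows alice (1), alice posts a tweet.
conn.executemany(
    "INSERT INTO Users (ID, Login, Email, Password) VALUES (?, ?, ?, ?)",
    [(1, "alice", "alice@example.com", "<hash>"),
     (2, "bob", "bob@example.com", "<hash>")],
)
conn.execute("INSERT INTO Followers (UserID, FollowerUserID) VALUES (1, 2)")
conn.execute("INSERT INTO Tweet (ID, UserID, TextContent) VALUES (10, 1, 'hello')")
conn.execute("INSERT INTO Feed (UserID, TweetID) VALUES (2, 10)")

# Reading bob's feed is a join from Feed to Tweet.
feed = conn.execute(
    "SELECT t.TextContent FROM Feed f JOIN Tweet t ON t.ID = f.TweetID "
    "WHERE f.UserID = ?",
    (2,),
).fetchall()
print(feed)
```

Because Feed stores one row per (user, tweet) pair, reading a feed is a cheap lookup, at the cost of writing one row per follower whenever a tweet is posted.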


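We should use a partitioning strategy that avoids hot keys: if data is partitioned by user ID alone, famous users can make partitions unbalanced.

Adding a salt to the user ID in the partition key helps prevent this, because it spreads a single user's data across several partitions.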
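
A salted partition key could look roughly like the sketch below. The partition count, the number of salt buckets, and all function names are hypothetical and chosen only for illustration; a real datastore would usually apply its own hash to the key.

```python
import hashlib
import random

NUM_PARTITIONS = 64  # hypothetical cluster size
SALT_BUCKETS = 8     # hypothetical number of salts per user


def partition_for(key: str) -> int:
    """Map a key to a partition using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def salted_write_key(user_id: int) -> str:
    """Pick a random salt so writes for one user spread over several keys."""
    salt = random.randrange(SALT_BUCKETS)
    return f"{user_id}#{salt}"


def salted_read_keys(user_id: int) -> list[str]:
    """A read must check every salted key that a write could have used."""
    return [f"{user_id}#{salt}" for salt in range(SALT_BUCKETS)]


write_key = salted_write_key(42)
print(write_key, "-> partition", partition_for(write_key))
print([partition_for(k) for k in salted_read_keys(42)])
```

The trade-off is that writes for a hot user are spread out, but every read for that user has to query all SALT_BUCKETS keys and merge the results.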


High-level design





Request flows






Detailed component design



Trade-offs/Tech choices

  • We can accept eventual consistency (for example, a new tweet may take a moment to appear in followers' feeds) in exchange for better scalability
  • Follower relationships can be stored in a graph database, which makes it easier to query relationships between users who are not directly connected
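
To illustrate the kind of query a graph store makes easy, here is a minimal in-memory sketch that finds how many "follows" hops separate two users. The adjacency dictionary and the function name are hypothetical stand-ins for a graph database; an actual graph database would express this as a traversal query instead.

```python
from collections import deque

# Hypothetical follow graph: user -> set of users they follow.
follows = {
    "alice": {"bob"},
    "bob": {"carol"},
    "carol": set(),
    "dave": {"alice"},
}


def follow_distance(graph, source, target):
    """Return the number of follow hops from source to target, or None."""
    if source == target:
        return 0
    seen = {source}
    frontier = deque([(source, 0)])
    # Breadth-first search visits users in order of increasing distance.
    while frontier:
        user, dist = frontier.popleft()
        for followed in graph.get(user, ()):
            if followed == target:
                return dist + 1
            if followed not in seen:
                seen.add(followed)
                frontier.append((followed, dist + 1))
    return None


print(follow_distance(follows, "alice", "carol"))
print(follow_distance(follows, "carol", "alice"))
```

Here alice does not follow carol directly but reaches her through bob, which is the sort of indirect relationship the bullet above refers to.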





Failure scenarios/bottlenecks

  • A famous user (with a lot of followers) posting a tweet can be a bottleneck because there are numerous feeds to update. In our design the feeds are updated asynchronously to handle this case
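
The asynchronous feed update can be sketched with an in-process queue and a background worker. This is a simplified illustration under stated assumptions: the dictionaries stand in for the Followers and Feed tables, the names are hypothetical, and a real deployment would more likely use a durable message queue with several workers.

```python
import queue
import threading
from collections import defaultdict

followers = defaultdict(set)  # author ID -> follower IDs (stand-in for Followers)
feeds = defaultdict(list)     # user ID -> tweet IDs, newest first (stand-in for Feed)
fanout_queue = queue.Queue()


def post_tweet(author_id, tweet_id):
    """Request path: enqueue the fan-out job and return immediately."""
    fanout_queue.put((author_id, tweet_id))


def fanout_worker():
    """Background path: copy each new tweet into every follower's feed."""
    while True:
        author_id, tweet_id = fanout_queue.get()
        for follower_id in followers[author_id]:
            feeds[follower_id].insert(0, tweet_id)
        fanout_queue.task_done()


threading.Thread(target=fanout_worker, daemon=True).start()

followers[1] = {2, 3}   # users 2 and 3 follow user 1
post_tweet(1, 100)
fanout_queue.join()     # only for the demo: wait until the fan-out has finished
print(dict(feeds))
```

The poster's request only pays for the enqueue, so its latency does not depend on the follower count; the cost is that followers see the tweet slightly later.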





Future improvements