System requirements


Functional:

  • Post a tweet
  • View tweets
  • Follow other users
  • Add tweets to favorites


Non-Functional:

  • Scalable: the system should handle heavy traffic, especially during traffic peaks (for instance when a famous user posts a tweet)
  • Latency must remain acceptable (requests should be answered in < 1s)
  • High availability, at least 99.9%
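
To make the availability target concrete, the short sketch below converts 99.9% into a downtime budget. The helper name `downtime_budget_hours` is hypothetical and only used for illustration.

```python
def downtime_budget_hours(availability: float, period_hours: float) -> float:
    """Maximum downtime (in hours) allowed over a period for a given availability."""
    return (1.0 - availability) * period_hours


HOURS_PER_YEAR = 365 * 24  # ignoring leap years for simplicity

yearly_budget = downtime_budget_hours(0.999, HOURS_PER_YEAR)
print(f"99.9% availability allows about {yearly_budget:.2f} hours of downtime per year")
```

In other words, the 99.9% target leaves a little under 9 hours of total downtime per year.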



Capacity estimation

  • 500M daily active users
  • Users post between 1 and 10 tweets each day, about 5 on average (~2.5B tweets posted daily)
  • Users view on average 20 tweets per day (10B tweets viewed daily)
  • 12.5B requests each day in total (2.5B writes + 10B reads)
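
The figures above can be checked with a few lines of arithmetic. This is a rough sketch: the average of 5 tweets posted per user is the value implied by the 2.5B figure, and the per-second rate is a plain daily average that does not model peaks.

```python
DAILY_ACTIVE_USERS = 500_000_000
AVG_TWEETS_POSTED_PER_USER = 5    # assumption: average implied by 2.5B / 500M
AVG_TWEETS_VIEWED_PER_USER = 20
SECONDS_PER_DAY = 86_400

writes_per_day = DAILY_ACTIVE_USERS * AVG_TWEETS_POSTED_PER_USER
reads_per_day = DAILY_ACTIVE_USERS * AVG_TWEETS_VIEWED_PER_USER
total_per_day = writes_per_day + reads_per_day

# Average load; real peaks will be noticeably higher than this.
avg_requests_per_second = total_per_day / SECONDS_PER_DAY
read_write_ratio = reads_per_day / writes_per_day

print(f"writes/day: {writes_per_day:,}")
print(f"reads/day:  {reads_per_day:,}")
print(f"total/day:  {total_per_day:,}")
print(f"average requests/second: {avg_requests_per_second:,.0f}")
print(f"read/write ratio: {read_write_ratio:.0f}:1")
```

The workload is read-heavy (4 reads for every write), which is one reason to precompute and cache feeds rather than build them on every read.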


API design

GET /tweet/{id}

POST /tweet

POST /tweet/{id}/favorite


GET /feed


GET /users/{id}

GET /users/{id}/tweets

POST /users/{id}/follow

POST /users/{id}/unfollow



Database design

Tweet

  • [PK] ID INT
  • [FK] UserID INT
  • TextContent VARCHAR
  • CreatedAt DATETIME
  • UpdatedAt DATETIME


Users

  • [PK] ID INT
  • Login VARCHAR
  • Email VARCHAR
  • Password VARCHAR (stores a salted hash, never the plaintext password)


Feed

  • [FK] UserID INT
  • [FK] TweetID INT


Followers

  • [FK] UserID INT
  • [FK] FollowerUserID INT
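
As an illustration, the four tables above could be declared as follows. This is only a sketch: SQLite is used as an in-memory stand-in for the real database, and the composite primary keys and NOT NULL constraints are assumptions that the outline does not specify.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Users (
    ID       INTEGER PRIMARY KEY,
    Login    TEXT NOT NULL,
    Email    TEXT NOT NULL,
    Password TEXT NOT NULL
);
CREATE TABLE Tweet (
    ID          INTEGER PRIMARY KEY,
    UserID      INTEGER NOT NULL REFERENCES Users(ID),
    TextContent TEXT NOT NULL,
    CreatedAt   TEXT NOT NULL,
    UpdatedAt   TEXT NOT NULL
);
CREATE TABLE Feed (
    UserID  INTEGER NOT NULL REFERENCES Users(ID),
    TweetID INTEGER NOT NULL REFERENCES Tweet(ID),
    PRIMARY KEY (UserID, TweetID)          -- assumption: one row per (user, tweet)
);
CREATE TABLE Followers (
    UserID         INTEGER NOT NULL REFERENCES Users(ID),
    FollowerUserID INTEGER NOT NULL REFERENCES Users(ID),
    PRIMARY KEY (UserID, FollowerUserID)   -- assumption: a user follows another at most once
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Three sample users; users 2 and 3 follow user 1.
conn.executemany(
    "INSERT INTO Users (ID, Login, Email, Password) VALUES (?, ?, ?, ?)",
    [(1, "alice", "alice@example.com", "<hash>"),
     (2, "bob", "bob@example.com", "<hash>"),
     (3, "carol", "carol@example.com", "<hash>")],
)
conn.executemany(
    "INSERT INTO Followers (UserID, FollowerUserID) VALUES (?, ?)",
    [(1, 2), (1, 3)],
)

# The lookup used when fanning a tweet out to the author's followers.
followers_of_alice = [
    row[0]
    for row in conn.execute(
        "SELECT FollowerUserID FROM Followers WHERE UserID = ? ORDER BY FollowerUserID",
        (1,),
    )
]
print(followers_of_alice)
```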


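We should use a partitioning strategy that avoids hot keys: if data is partitioned by user ID alone, famous users can make partitions unbalanced, because all of their data and traffic land on a single partition

Adding a salt to the user ID in the partition key can help prevent this: it spreads a hot user's rows across several partitions, at the cost of reads having to query every salted key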
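
A minimal sketch of key salting, under stated assumptions: the partition count, the number of salt buckets, the `user_id#salt` key layout, and deriving the salt from the tweet ID are all illustrative choices, not part of the design above.

```python
import hashlib

NUM_PARTITIONS = 64  # assumption: total number of partitions
SALT_BUCKETS = 8     # assumption: how many salted keys a single user is spread over


def _stable_hash(key: str) -> int:
    """Deterministic hash (unlike Python's built-in hash(), which varies per process)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def partition_for_write(user_id: int, tweet_id: int) -> int:
    """Pick the partition for a new row, salting the user ID with a bucket number."""
    salt = tweet_id % SALT_BUCKETS
    return _stable_hash(f"{user_id}#{salt}") % NUM_PARTITIONS


def partitions_for_read(user_id: int) -> set[int]:
    """All partitions that may hold rows for this user; a read must query each of them."""
    return {
        _stable_hash(f"{user_id}#{salt}") % NUM_PARTITIONS
        for salt in range(SALT_BUCKETS)
    }


print(partition_for_write(user_id=42, tweet_id=1001))
print(sorted(partitions_for_read(user_id=42)))
```

Writes for a hot user are spread over up to `SALT_BUCKETS` partitions instead of one. The trade-off is that reading all of a user's rows fans out to every salted key, so the number of buckets should stay small.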


High-level design

Requests are handled by different services so we can scale them independently

  • Tweet service
  • Feed service
  • Users service


Feeds are generated asynchronously when tweets are posted. They are then cached as a single JSON document


Frequently viewed tweets are also cached
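
One possible way to cache frequently viewed tweets is a small read-through cache in front of the tweet store. The sketch below assumes an LRU eviction policy and a hypothetical `load_tweet` callback standing in for the database read; neither is prescribed by the design.

```python
from collections import OrderedDict
from typing import Callable


class TweetCache:
    """Read-through cache that evicts the least recently used tweet when full."""

    def __init__(self, capacity: int, load_tweet: Callable[[int], dict]):
        self.capacity = capacity
        self.load_tweet = load_tweet  # hypothetical loader, e.g. a database lookup
        self._items: "OrderedDict[int, dict]" = OrderedDict()
        self.misses = 0

    def get(self, tweet_id: int) -> dict:
        if tweet_id in self._items:
            # Cache hit: mark the tweet as most recently used.
            self._items.move_to_end(tweet_id)
            return self._items[tweet_id]
        # Cache miss: load from the backing store and remember the result.
        self.misses += 1
        tweet = self.load_tweet(tweet_id)
        self._items[tweet_id] = tweet
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry
        return tweet


cache = TweetCache(capacity=2, load_tweet=lambda tweet_id: {"id": tweet_id})
cache.get(1)
cache.get(2)
cache.get(1)  # hit
cache.get(3)  # miss, evicts tweet 2
print(cache.misses)
```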



Request flows






Detailed component design

Feed processing system:

  • Each newly posted tweet creates an item in the tweet queue
  • Feed processing nodes dequeue tweets
  • Followers of the tweet author are fetched from the followers database
  • Each follower's feed is updated in the database and cached as a single gzipped JSON document (including the tweets' content, so a feed read needs no extra lookups)
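
The four steps above can be sketched with in-memory stand-ins. Everything here is an assumption made for illustration: the dictionaries replace the real databases and cache, a `deque` replaces the tweet queue, and the function names are hypothetical.

```python
import gzip
import json
from collections import defaultdict, deque

tweets_db = {}               # tweet ID -> tweet (stand-in for the Tweet table)
followers_db = {1: [2, 3]}   # author ID -> follower IDs (stand-in for Followers)
feed_db = defaultdict(list)  # user ID -> tweet IDs, newest first (stand-in for Feed)
feed_cache = {}              # user ID -> gzipped JSON bytes
tweet_queue = deque()        # stand-in for the tweet queue


def post_tweet(tweet: dict) -> None:
    """Store the tweet and enqueue it for asynchronous feed processing."""
    tweets_db[tweet["id"]] = tweet
    tweet_queue.append(tweet["id"])


def process_next_tweet() -> bool:
    """Dequeue one tweet and fan it out to the feeds of the author's followers."""
    if not tweet_queue:
        return False
    tweet = tweets_db[tweet_queue.popleft()]
    for follower_id in followers_db.get(tweet["user_id"], []):
        feed_db[follower_id].insert(0, tweet["id"])
        # Cache the whole feed, with tweet content, as one gzipped JSON document.
        feed = [tweets_db[tweet_id] for tweet_id in feed_db[follower_id]]
        feed_cache[follower_id] = gzip.compress(json.dumps(feed).encode("utf-8"))
    return True


def read_feed(user_id: int) -> list:
    """Serve a feed straight from the cache; an uncached feed is treated as empty here."""
    cached = feed_cache.get(user_id)
    return json.loads(gzip.decompress(cached)) if cached is not None else []


post_tweet({"id": 10, "user_id": 1, "text": "hello"})
while process_next_tweet():
    pass
print(read_feed(2))
```

Because the cached document already contains the tweet content, serving a feed is a single cache read followed by decompression. The cost is paid at write time, once per follower.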


Trade-offs/Tech choices

  • We can accept eventual consistency (a new tweet may appear in followers' feeds shortly after it is posted rather than immediately) to improve scalability
  • Follower relationships can be stored in a graph database, which also lets us explore relationships between users who are not directly connected


Failure scenarios/bottlenecks

  • A famous user (with many followers) posting a tweet can be a bottleneck because there are numerous feeds to update. In our design the feeds are updated asynchronously to handle this case



Future improvements

  • Enhance tweets with support for media attachments
  • Add monitoring and analytics systems