System requirements


Functional:

  1. users should be able to tweet
  2. users should be able to follow other users to track their tweets/activity
  3. users should be able to favorite other users' tweets
  4. users should be able to delete tweets
  5. tweets should be limited to 140 characters?


Non-Functional:

  1. Availability
  2. Eventual consistency
  3. Low latency and high performance




Capacity estimation

  1. 500 million DAU
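  2. 250 million users (~50% of DAU) post around 2 new tweets per day, i.e. 500 million tweets per day
  3. 500 million tweets per day * 140 characters = 70 billion characters per day; at an assumed 200 bytes per character = 14 TB per day for tweets
  4. With metadata and replicas (3 copies): about 50 TB of storage per day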
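
The arithmetic above can be checked with a short script. This is only a sketch: the constants are copied from the list, the 200 bytes per character is the allowance used in the estimate, and the UTF-8 comparison (at most 4 bytes per code point) is added here for scale rather than taken from the original notes.

```python
# Back-of-envelope storage estimate using the figures listed above.
DAU = 500_000_000
ACTIVE_FRACTION = 0.5           # ~50% of DAU post
TWEETS_PER_ACTIVE_USER = 2
MAX_CHARS = 140
BYTES_PER_CHAR = 200            # allowance used in the estimate above
UTF8_MAX_BYTES_PER_CHAR = 4     # comparison only: UTF-8 worst case per code point
REPLICATION_FACTOR = 3
TB = 10**12

tweets_per_day = int(DAU * ACTIVE_FRACTION * TWEETS_PER_ACTIVE_USER)
chars_per_day = tweets_per_day * MAX_CHARS
text_bytes_per_day = chars_per_day * BYTES_PER_CHAR
replicated_bytes_per_day = text_bytes_per_day * REPLICATION_FACTOR
utf8_text_bytes_per_day = chars_per_day * UTF8_MAX_BYTES_PER_CHAR

print(f"tweets/day:            {tweets_per_day:,}")
print(f"characters/day:        {chars_per_day:,}")
print(f"tweet text/day:        {text_bytes_per_day / TB:.1f} TB")
print(f"with 3 replicas/day:   {replicated_bytes_per_day / TB:.1f} TB")
print(f"raw UTF-8 text/day:    {utf8_text_bytes_per_day / TB:.2f} TB")
```

Three replicas of 14 TB come to 42 TB, so the 50 TB figure leaves roughly 8 TB per day for metadata. The raw UTF-8 line shows how much headroom the 200-byte allowance contains.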






API design

  1. Create/Delete/List tweets
    1. create (POST) needs the tweet text and user id
    2. delete needs the tweetid
    3. list needs the user id, plus any filters we want to support
    4. returns 201 (created), 200 (OK) or 500 (server error)
  2. Follow/unfollow User
    1. userid being followed - returns 200 or 404 or 500
    2. userid of user who is subscribing/unsubscribing - returns 200, 404 or 500
  3. Favorite/unfavorite Tweet
    1. tweetid, user favoriting - returns 200, 404 or 500
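
The endpoints above could be sketched as an in-memory handler class that returns the listed status codes. Everything here is illustrative: `TweetApi`, its method names, and the 400 status for oversized tweets are assumptions, not part of the design above, and 500 is not modeled because nothing in this toy version can fail server-side.

```python
from itertools import count

MAX_TWEET_LENGTH = 140  # from the functional requirements


class TweetApi:
    """Hypothetical in-memory stand-in for the tweet, follow and favorite endpoints."""

    def __init__(self):
        self.users = set()       # known userids
        self.tweets = {}         # tweetid -> (userid, text)
        self.follows = set()     # (follower_id, followee_id)
        self.favorites = set()   # (userid, tweetid)
        self._next_id = count(1)

    def create_tweet(self, userid, text):
        if userid not in self.users:
            return 404, None
        if len(text) > MAX_TWEET_LENGTH:
            return 400, None     # assumption: status for oversized tweets
        tweetid = next(self._next_id)
        self.tweets[tweetid] = (userid, text)
        return 201, tweetid

    def delete_tweet(self, tweetid):
        return 200 if self.tweets.pop(tweetid, None) is not None else 404

    def list_tweets(self, userid):
        found = [(tid, text) for tid, (uid, text) in self.tweets.items() if uid == userid]
        return 200, found

    def follow(self, follower_id, followee_id):
        if follower_id not in self.users or followee_id not in self.users:
            return 404
        self.follows.add((follower_id, followee_id))
        return 200

    def unfollow(self, follower_id, followee_id):
        if (follower_id, followee_id) not in self.follows:
            return 404
        self.follows.remove((follower_id, followee_id))
        return 200

    def favorite(self, userid, tweetid):
        if userid not in self.users or tweetid not in self.tweets:
            return 404
        self.favorites.add((userid, tweetid))
        return 200


api = TweetApi()
api.users.update({"alice", "bob"})
status, tweetid = api.create_tweet("alice", "hello world")
print(status, api.follow("bob", "alice"), api.favorite("bob", tweetid))
```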





Database design

Entities:

  1. tweets
    1. userid
    2. topicid
    3. tweetid
    4. tweet_text
    5. favorite_count (calculated column from favorites table)
  2. users
    1. userid
    2. user_name
    3. email
    4. num_followers (calculated column from user-subscribers)
  3. favorites
    1. tweetid
    2. userid
  4. user-subscribers
    1. topicid
    2. userid
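
One way to make the entities above concrete is a small schema. The sketch below uses Python's built-in `sqlite3` as a stand-in for Postgres so it runs without a server; the column types, the composite primary keys, and the choice to compute the two "calculated" values with `COUNT(*)` queries instead of stored columns are all assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    userid    INTEGER PRIMARY KEY,
    user_name TEXT NOT NULL,
    email     TEXT NOT NULL
);
CREATE TABLE tweets (
    tweetid    INTEGER PRIMARY KEY,
    userid     INTEGER NOT NULL REFERENCES users(userid),
    topicid    INTEGER NOT NULL,
    tweet_text TEXT NOT NULL CHECK (length(tweet_text) <= 140)
);
CREATE TABLE favorites (
    tweetid INTEGER NOT NULL REFERENCES tweets(tweetid),
    userid  INTEGER NOT NULL REFERENCES users(userid),
    PRIMARY KEY (tweetid, userid)       -- one favorite per user per tweet
);
CREATE TABLE user_subscribers (
    topicid INTEGER NOT NULL,
    userid  INTEGER NOT NULL REFERENCES users(userid),
    PRIMARY KEY (topicid, userid)       -- one subscription per user per topic
);
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def favorite_count(conn, tweetid):
    """favorite_count derived from the favorites table."""
    row = conn.execute(
        "SELECT COUNT(*) FROM favorites WHERE tweetid = ?", (tweetid,)
    ).fetchone()
    return row[0]


def num_followers(conn, topicid):
    """num_followers derived from the user_subscribers table."""
    row = conn.execute(
        "SELECT COUNT(*) FROM user_subscribers WHERE topicid = ?", (topicid,)
    ).fetchone()
    return row[0]
```

Deriving the counts on read keeps them correct but costs a query each time; storing them as columns, as the entity list suggests, trades that cost for an extra write on every favorite or follow.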





High-level design

  1. Microservice architecture:
    1. User Management
    2. Tweet Management
    3. Subscribe service
    4. Favorite service
  2. Postgres DB because the data is relational.
  3. Load balancers with API gateway, Rate limiters
  4. Publish-subscribe pattern for user following feature
  5. Write-through caching layer storing frequently read tweets and popular influencers' tweets. Memcached is a good fit because the data can be stored as simple key-value pairs.
  6. The workload is read-heavy, so use multiple read replicas of the database and one master write database.
  7. Given this is a multi-tenant service, userid will be the partition key to separate data in the same table.
  8. Userid-based sharding to distribute data among multiple nodes. Consistent hashing can be used to assign users to shards.
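
The userid-based sharding in the last item can be sketched with a small consistent-hash ring. This is a minimal illustration, not a production implementation: the class name, the shard names, the use of SHA-256, and the 100 virtual nodes per shard are all assumptions.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps userids to shard names; adding a shard moves only a fraction of keys."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add_node(node, vnodes)

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_node(self, node, vnodes=100):
        # Each node gets several points so load spreads more evenly.
        for i in range(vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, userid):
        if not self._ring:
            raise LookupError("ring is empty")
        h = self._hash(str(userid))
        # First ring point at or after the key's hash, wrapping around at the end.
        idx = bisect.bisect_left(self._ring, (h,)) % len(self._ring)
        return self._ring[idx][1]


ring = ConsistentHashRing(["db-0", "db-1", "db-2"])
print(ring.node_for(12345))
```

When a fourth shard is added, the only users whose shard changes are those that now map to the new shard, which is the property that makes rebalancing cheaper than with `hash(userid) % shard_count`.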





Request flows

  1. User creation will
    1. create a user profile
    2. create a topic that can be subscribed to
  2. Users can follow other users, which subscribes them to the followed user's topic. This will
    1. create a mapping in the user-subscribers table
    2. increment the follower count in the users table
  3. Users can post or delete a tweet, which will
    1. create (or remove) the tweet record
    2. push an event on the user's topic for all subscribers
  4. Users can favorite a tweet, which creates a record in the favorites table and increments the favorite count on the tweet.
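
The flows above can be traced end to end with an in-memory model. This is a sketch under stated assumptions: the class `RequestFlows`, the `topic-<userid>` naming, the per-user inbox, and the event tuples are hypothetical stand-ins for the real services and message topics.

```python
from collections import defaultdict


class RequestFlows:
    """Hypothetical in-memory walk-through of the four request flows."""

    def __init__(self):
        self.users = {}                      # userid -> {"topicid", "num_followers"}
        self.subscribers = defaultdict(set)  # topicid -> subscriber userids
        self.tweets = {}                     # tweetid -> {"userid", "text", "favorite_count"}
        self.favorites = set()               # (userid, tweetid)
        self.inboxes = defaultdict(list)     # userid -> events delivered to that user

    def create_user(self, userid):
        # Flow 1: create the profile and a topic others can subscribe to.
        self.users[userid] = {"topicid": f"topic-{userid}", "num_followers": 0}

    def follow(self, follower_id, followee_id):
        # Flow 2: add the subscriber mapping and bump the follower count once.
        topicid = self.users[followee_id]["topicid"]
        if follower_id not in self.subscribers[topicid]:
            self.subscribers[topicid].add(follower_id)
            self.users[followee_id]["num_followers"] += 1

    def _publish(self, userid, event):
        for subscriber in self.subscribers[self.users[userid]["topicid"]]:
            self.inboxes[subscriber].append(event)

    def post_tweet(self, userid, tweetid, text):
        # Flow 3a: create the record, then notify subscribers.
        self.tweets[tweetid] = {"userid": userid, "text": text, "favorite_count": 0}
        self._publish(userid, ("tweet_created", tweetid))

    def delete_tweet(self, tweetid):
        # Flow 3b: remove the record, then notify subscribers.
        tweet = self.tweets.pop(tweetid)
        self._publish(tweet["userid"], ("tweet_deleted", tweetid))

    def favorite(self, userid, tweetid):
        # Flow 4: record the favorite and increment the count once per user.
        if (userid, tweetid) not in self.favorites:
            self.favorites.add((userid, tweetid))
            self.tweets[tweetid]["favorite_count"] += 1


flows = RequestFlows()
flows.create_user("alice")
flows.create_user("bob")
flows.follow("bob", "alice")
flows.post_tweet("alice", 1, "hello")
flows.favorite("bob", 1)
print(flows.inboxes["bob"], flows.tweets[1]["favorite_count"])
```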




Detailed component design

Subscribe/Follow service:

  1. Each User will get a topic and any activity by this user is posted as an event on this topic.
  2. A topic can be backed by a service bus, TCP relay, event grid, etc. Service buses and TCP relays can be pooled and shared across multiple users, and new namespaces can be added for scale. Event grids have built-in scaling when configured for demand.
  3. Users who want to follow a specific user will call the Subscribe API and will be subscribed to that user's topic.
  4. We can cache tweets from users with a large number of followers.


Favorite Service:

  1. This service will be responsible for tracking the favorites on each tweet and which users favorited it.
  2. Tweets with a large number of favorites can be cached.


The system overall will be highly available but with eventual consistency.

Caching and decoupled scaling will enable low latency and high performance.

Multiple read replicas will let users across the world get their tweets served from the replica closest to them, and will also provide resiliency.
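
The availability and eventual-consistency behavior described here can be illustrated with a toy store that acknowledges writes on the master and copies them to regional replicas later. This is a sketch only: `ReplicatedStore`, the region names, and the explicit `replicate()` step (standing in for asynchronous replication) are assumptions.

```python
class ReplicatedStore:
    """Toy master plus read replicas with deferred replication."""

    def __init__(self, regions):
        self.master = {}
        self.replicas = {region: {} for region in regions}
        self._pending = []  # writes acknowledged but not yet replicated

    def write(self, key, value):
        # Commit to the master and report success without waiting for replicas.
        self.master[key] = value
        self._pending.append((key, value))
        return "ok"

    def replicate(self):
        # Stand-in for asynchronous replication catching up.
        for key, value in self._pending:
            for replica in self.replicas.values():
                replica[key] = value
        self._pending.clear()

    def read(self, key, region):
        # Serve from the caller's regional replica; unknown regions use the master.
        source = self.replicas.get(region, self.master)
        return source.get(key)


store = ReplicatedStore(["us", "eu"])
store.write("tweet:1", "hello")
print(store.read("tweet:1", "eu"))   # replica has not caught up yet
store.replicate()
print(store.read("tweet:1", "eu"))   # replica now returns the tweet
```

The window between `write` and `replicate` is the period in which a reader in another region may not see a new tweet.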




Trade offs/Tech choices

  1. Consistency will be eventual, because we first commit the tweet to the master DB and report success without waiting for replication to the read replicas.
  2. The cache can sometimes be stale if the right TTL is not chosen or a good write-through is not implemented.
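
The staleness trade-off in the second item can be seen in a minimal write-through cache with a TTL. This is an illustrative sketch: the class name, the dict used as a stand-in for the database, and the injectable `clock` parameter are assumptions made so the behavior can be shown without waiting in real time.

```python
import time


class WriteThroughCache:
    """Writes go to the backing store and the cache together; reads honor a TTL."""

    def __init__(self, store, ttl_seconds=60.0, clock=time.monotonic):
        self._store = store      # dict standing in for the database
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}       # key -> (value, expires_at)

    def put(self, key, value):
        self._store[key] = value                                  # write to the DB
        self._entries[key] = (value, self._clock() + self._ttl)   # and to the cache

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]                                       # fresh cache hit
        value = self._store.get(key)                              # miss or expired
        if value is None:
            self._entries.pop(key, None)
        else:
            self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def delete(self, key):
        self._store.pop(key, None)
        self._entries.pop(key, None)


now = [0.0]
db = {}
cache = WriteThroughCache(db, ttl_seconds=10, clock=lambda: now[0])
cache.put("tweet:1", "hello")
db["tweet:1"] = "edited elsewhere"   # a write that bypasses the cache
print(cache.get("tweet:1"))          # still the cached value until the TTL expires
now[0] = 11.0
print(cache.get("tweet:1"))          # refreshed from the store after expiry
```

A shorter TTL narrows the stale window at the cost of more database reads; routing every write through `put` avoids the stale read entirely.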




Failure scenarios/bottlenecks

Topics can sometimes drop messages, causing some users to miss notifications of new tweets.





Future improvements

What are some future improvements you would make? How would you mitigate the failure scenario(s) you described above?