Requirements


Functional Requirements:


  • Allow users to tweet messages up to 140 characters.
  • Enable users to follow other users.
  • Allow users to like tweets from other users.
  • Display tweets from followed users in the home feed.
  • Show top K popular tweets in the home feed based on likes and followers.



Non-Functional Requirements:


  • Scalability - should be able to support millions of concurrent users
  • Latency - fast initial load, seamless infinite scroll experience
  • Availability - Fault tolerance for server failures
  • Consistency - Eventual consistency for tweets/likes/follows


API Design

Authentication routes are omitted from the API below, even though the application obviously needs them; I am assuming they are not the focal point of this question.

API endpoints will use an authentication strategy such as JWT or session tokens to authenticate and authorize requests, so that the endpoints are protected from unauthorized access and usage.


POST /tweet/ body: {} response: {}

GET /tweet/ body: {} response: {} // assuming K is a server-side constant that may change, but not at the request of the API consumer

PUT /tweet/:tweet-id/like body: {liked: boolean} response: {}

POST /user/follow/:username

DELETE /user/follow/:username




High-Level Design

Light API gateway + load balancer

Database - Users table, Tweets table, Follow table, Like Table.
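As a rough illustration of the four tables, here is a minimal sketch using Python's built-in `sqlite3` module. The column names, the `CHECK` constraint for the 140-character limit, and the `home_feed` helper are assumptions made for this example, not part of the design above; a production system would use a different database and likely a richer schema.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE users (
    username TEXT PRIMARY KEY
);
CREATE TABLE tweets (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    author     TEXT NOT NULL REFERENCES users(username),
    body       TEXT NOT NULL CHECK (length(body) <= 140),
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE follows (
    follower TEXT NOT NULL REFERENCES users(username),
    followee TEXT NOT NULL REFERENCES users(username),
    PRIMARY KEY (follower, followee)
);
CREATE TABLE likes (
    username TEXT NOT NULL REFERENCES users(username),
    tweet_id INTEGER NOT NULL REFERENCES tweets(id),
    PRIMARY KEY (username, tweet_id)
);
"""


def create_db() -> sqlite3.Connection:
    """Create an in-memory database with the four tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def home_feed(conn: sqlite3.Connection, username: str, limit: int = 20) -> list:
    """Return the newest tweets from accounts that `username` follows."""
    return conn.execute(
        """
        SELECT t.id, t.author, t.body
        FROM tweets AS t
        JOIN follows AS f ON f.followee = t.author
        WHERE f.follower = ?
        ORDER BY t.id DESC
        LIMIT ?
        """,
        (username, limit),
    ).fetchall()
```

The composite primary keys on `follows` and `likes` make following and liking naturally idempotent: a repeated follow or like of the same target cannot create a second row.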

Redis cache - Popular tweets (ranked by likes and by the follower count of the user posting them), and the next X tweets for each user (keyed by username)

Authentication Service - for login, signup and gatekeeping the other api routes

Tweet Service - Posting, liking, unliking and fetching tweets. Uses the Database for personalized tweets, the cache for popular tweets.

Users Service - Following and unfollowing

Message Queue - For quick acks when performing potentially compute-heavy operations (posting a tweet means updating the DB table).


The Tweet and Users services are stateless, which makes them easy to scale horizontally with replicas behind the load balancer.





Detailed Component Design



# Tweet Service


## Real Time Tweet Request Handling

Implements the tweet endpoints and manages the Tweets and Likes tables.

Uses the Message Queue to send a quick ack to the user when posting and liking, which shortens the response time of a potentially compute-heavy process (the insert, plus cache invalidation for the feeds of the people who follow the user). This also adds some fault tolerance: if the server crashes after sending an ack, the message persists in the queue and can be handled by another server instance.
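The ack-then-process idea can be sketched as follows. This is a minimal illustration, assuming an in-process `queue.Queue` as a stand-in for the real message broker; the `post_tweet` function, the job fields, and the response shape are hypothetical. Note that an in-process queue does not survive a crash, so the fault tolerance described above depends on a durable broker.

```python
import queue
import uuid

MAX_TWEET_LENGTH = 140

# Stand-in for a durable message broker (assumption for this sketch).
job_queue: "queue.Queue[dict]" = queue.Queue()


def post_tweet(username: str, body: str) -> dict:
    """Validate cheaply, enqueue the heavy work, and ack immediately."""
    if not body or len(body) > MAX_TWEET_LENGTH:
        return {"status": "rejected", "reason": "tweet must be 1-140 characters"}

    job = {
        "type": "post_tweet",
        "job_id": str(uuid.uuid4()),  # lets a consumer detect redelivery
        "username": username,
        "body": body,
    }
    job_queue.put(job)

    # The DB insert and cache invalidation happen later, in a consumer.
    return {"status": "accepted", "job_id": job["job_id"]}
```

Only validation runs on the request path, so the response time stays roughly constant regardless of how many followers' feeds need to be invalidated.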


## Offline Message Queue Handling

Consumes messages from the Message Queue and handles their jobs asynchronously: database updates, user feed cache invalidation, and so on.
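A consumer along these lines might look like the sketch below. The in-memory `tweets`, `likes`, `followers`, and `feed_cache` structures are stand-ins for the database and Redis, and the job shapes (`post_tweet`, `set_like`) are assumptions made for this example. The like handler is written to be idempotent, since a queue may deliver the same message more than once.

```python
from collections import defaultdict

# In-memory stand-ins for the database tables and the Redis feed cache.
tweets: list[dict] = []
likes: set[tuple[str, int]] = set()
followers: dict[str, set[str]] = defaultdict(set)  # followee -> followers
feed_cache: dict[str, list[dict]] = {}             # username -> cached feed


def handle_job(job: dict) -> None:
    """Apply one queued job: write the data, then invalidate stale feeds."""
    if job["type"] == "post_tweet":
        tweets.append(
            {"id": len(tweets) + 1, "author": job["username"], "body": job["body"]}
        )
        # Every follower's cached feed is now missing this tweet.
        for follower in followers[job["username"]]:
            feed_cache.pop(follower, None)
    elif job["type"] == "set_like":
        key = (job["username"], job["tweet_id"])
        # Set semantics make a redelivered like/unlike harmless.
        if job["liked"]:
            likes.add(key)
        else:
            likes.discard(key)
    else:
        raise ValueError(f"unknown job type: {job['type']}")
```

Invalidating rather than rewriting each follower's feed keeps the consumer cheap; the feed is rebuilt lazily the next time that follower reads it.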


## Cache Handling

Uses the cache in two main ways:

A single key for general popular tweets, sorted by a linear combination of the tweet's likes and the posting user's follower count, and refreshed every minute.
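The ranking step can be sketched as below. The weights `LIKE_WEIGHT` and `FOLLOWER_WEIGHT` are made-up values for illustration (the design does not specify them), and `TweetStats` is a hypothetical record type. `heapq.nlargest` selects the top K without sorting the full candidate set.

```python
import heapq
from dataclasses import dataclass

# Illustrative weights only; real values would need tuning.
LIKE_WEIGHT = 1.0
FOLLOWER_WEIGHT = 0.01


@dataclass(frozen=True)
class TweetStats:
    tweet_id: int
    likes: int
    author_followers: int


def popularity(tweet: TweetStats) -> float:
    """Linear combination of tweet likes and the author's follower count."""
    return LIKE_WEIGHT * tweet.likes + FOLLOWER_WEIGHT * tweet.author_followers


def top_k(candidates: list[TweetStats], k: int) -> list[TweetStats]:
    """Return the k highest-scoring tweets, best first."""
    return heapq.nlargest(k, candidates, key=popularity)
```

A periodic job could run `top_k` over recent tweets once a minute and write the result to the single popular-tweets key.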

A second key, one per user, for the personalized feed; it is populated upon user login so that tweets from followed accounts can be shown quickly.
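A per-user feed cache of this kind could be sketched as follows. The `FeedCache` class, the `FEED_SIZE` value standing in for "X", and the `load_feed` callback (which would query the database) are all assumptions for this example; a dict stands in for Redis.

```python
from typing import Callable

FEED_SIZE = 50  # the "next X tweets" per user; value is an assumption

FeedLoader = Callable[[str], list]


class FeedCache:
    """Per-user feed cache keyed by username (a dict stands in for Redis)."""

    def __init__(self) -> None:
        self._store: dict[str, list] = {}

    def populate(self, username: str, load_feed: FeedLoader) -> None:
        """Called on login: load the feed once and keep the first FEED_SIZE."""
        self._store[username] = load_feed(username)[:FEED_SIZE]

    def get(self, username: str, load_feed: FeedLoader) -> list:
        """Serve from cache, falling back to the loader on a miss."""
        if username not in self._store:
            self.populate(username, load_feed)
        return self._store[username]

    def invalidate(self, username: str) -> None:
        """Called when a followed account posts a new tweet."""
        self._store.pop(username, None)
```

After `populate` runs at login, subsequent `get` calls for that user do not touch the database until the entry is invalidated.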

Should support sharding for



# Edge Cases Handling

  1. Heavy System load - The cache absorbs part of the read load, the message queue buffers database writes, and the stateless services allow for dynamic autoscaling. If traffic spikes and some servers go down, the asynchronous message handling preserves consistency, because unprocessed messages remain in the queue.
  2. Viral Tweet - A fast-growing tweet will be added to the popular tweets cache at most a minute after passing the threshold. In practice, because likes are applied through the message queue, a tweet's count may grow more slowly than users expect.

# Tradeoffs

  1. Message queues and caches are expensive, especially at scale.
  2. Performance could be improved, at some cost to consistency, by storing the like count on the Tweets table (incrementing on like, decrementing on unlike).
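The denormalized counter from the second tradeoff might look like the sketch below, again using `sqlite3` with hypothetical table and column names. On a single database a transaction keeps the counter and the likes rows in step; the consistency cost appears once the counter is updated asynchronously or lives on a different shard than the likes, where it can drift from the true count.

```python
import sqlite3


def create_db() -> sqlite3.Connection:
    """Tweets carry a denormalized like_count next to the likes table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tweets (
            id         INTEGER PRIMARY KEY,
            like_count INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE likes (
            username TEXT NOT NULL,
            tweet_id INTEGER NOT NULL,
            PRIMARY KEY (username, tweet_id)
        );
        """
    )
    return conn


def set_like(conn: sqlite3.Connection, username: str, tweet_id: int, liked: bool) -> None:
    """Insert or delete the like row and adjust the counter together."""
    with conn:  # commits on success, rolls back on error
        exists = conn.execute(
            "SELECT 1 FROM likes WHERE username = ? AND tweet_id = ?",
            (username, tweet_id),
        ).fetchone() is not None
        if liked == exists:
            return  # already in the requested state; counter stays put
        if liked:
            conn.execute("INSERT INTO likes VALUES (?, ?)", (username, tweet_id))
        else:
            conn.execute(
                "DELETE FROM likes WHERE username = ? AND tweet_id = ?",
                (username, tweet_id),
            )
        conn.execute(
            "UPDATE tweets SET like_count = like_count + ? WHERE id = ?",
            (1 if liked else -1, tweet_id),
        )


def like_count(conn: sqlite3.Connection, tweet_id: int) -> int:
    """Read the counter directly instead of running COUNT(*) over likes."""
    return conn.execute(
        "SELECT like_count FROM tweets WHERE id = ?", (tweet_id,)
    ).fetchone()[0]
```

Reading `like_count` is a single-row lookup, which is what makes the popular-tweets ranking cheap compared with counting rows in the likes table on every refresh.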