Requirements
Functional Requirements:
- Allow users to tweet messages up to 140 characters.
- Enable users to follow other users.
- Allow users to like tweets from other users.
- Display tweets from followed users in the home feed.
- Show top K popular tweets in the home feed based on likes and followers.
Non-Functional Requirements:
- High availability - 99.99 avail - users expect to always be able to read/write tweets
- Low latency - home feed reads should be <200ms p99. Tweet posting should feel instant (<500ms)
- Eventually consistent: Feed delivery can tolerate slight delays (seconds). Likes/follower counts can be eventually consistent.
- Partition tolerant: Given CAP - we choose AP (avail + partition tolerant) over CP (consistency) for feed reads
- Durability: Tweets, once posted, must never be lost.
API Design
Write path:
POST /v1/tweets
Body: {"user_id": string, "content": string (max 140 chars), "media_ids": [string]}
Response: {"tweet_id": string, "created_at": timestamp}
POST /v1/users/{user_id}/follow
Body: {"target_user_id": string}
Response: 204 No Content
Post /v1/tweets/{tweet_id}/like
Body: {"user_id": string}
Response: 204 No Content
Read path:
GET /v1/feed?user_id={id}&cursor={cursor}&limit=20
Response: {"tweets": [...], "next_cursor": string}
GET /v1/feed/trending?limit=K
Response: {"tweets": [...]}
High-Level Design
The system splits into two core paths:
- Write path (Tweet Ingestion): User posts a tweet -> stored in Tweets DB -> fan-out service pushes to followers' feed caches
- Read path (Feed Generation): User requests feed -> read from precomputed feed cache (for normal users) or pull-based merge (for celebrity followers).
Key components:
- Tweet service: Handles CRUD for tweets
- Fan-out Service: Pushes tweets to follower timelines (hybrid push/pull model)
- Feed service: serves the home timeline from pre-computed caches
- Social graph service: managed follow relationships
- Trending service: Computes top-k popular tweets using a streaming count algorithm
Detailed Component Design
1. Fan-out Service (Hybrid push/pull)
Problem: A celebrity with 50m followers posting a tweet cannot fan out to 50m feed caches synchronously
Solution: hybrid aproach
- Push (fan-out on write): For users with <10k followers, when they tweet, asynchronously write the tweet_id in each follower's feed cache (Redis sorted set, scored by timestamp)
- Pull (fan-out on read): For celebrities (>10k followers), do NOT fan-out. Instead, at read time, the Feed Service merges the user's precomputed feed with recent tweets from followed celebrities.
Data structure: Each user's feed is a Redis sorted set: feed: {user_id} -> {tweet_id: timestamp}. Capped at 800 entries.
Scaling: Fan-out workers are horizontally scaled behind a message queue (Kafka). Partitioned by target user_id for ordering guarantees.
2. Trending/Top-K Service
Problem: Compute the top K tweets globally based on a scoring function: score = likes * wl + retweets * w2 + recency_decay.
Solution:
- Use a streaming approximate count approach (Count-Min sketch -> Min Heap of size K)
- Every like/retweet event is published to Kafka -> consumed by the Trending Service.
- The service maintains a sliding window (e.g. last 24h) and a minheap of K tweets by score
- Results are cached in Redis wit ha TTL of 60s and served directly.
Tradeoff: Approximate counts are acceptable - trending doesn't need exact precision.
3. Social Graph Service
Storage: Follow relationships stored in a graph-optimized store (adj list in Cassandra or dedicated graph store like TAO)
Schema (Cassandra):
- Following: partition key = user_id, clustering key = followed_user_id
- Followers: Partition key = user_id, clustering key = follower_user_id
both tables maintained in a dual-write pattern for O(1) lookups in either direction. The followers table is critical for the fan-out service to know who to push to.
dual write - use a transactional outbox or event sourcing to ensure durability
Key Tradeoffs summary:
Decision: Hybrid fan-out: complexity vs handling celebrity problem
Redis sorted sets for feeds: memory cost vs read latency
Approximate trending (Count-Min Sketch) precision vs throughput
Eventually consistent likes/counts: Consistency vs Availability
Dual-write follower/following tables: Storage cost vs O(1) lookups in both direcitons