My Solution for Design Twitter with Score: 8/10
by iridescent_luminous693
System requirements
Functional Requirements
- User Management
- Users can sign up for an account using their email, phone number, or social media credentials.
- Users can log in and log out securely.
- Users can edit their profile, including profile picture, bio, and other personal information.
- Users can deactivate or delete their accounts.
- Tweet Management
- Users can compose and post tweets with a character limit (e.g., 280 characters).
- Users can attach media such as images, videos, or GIFs to their tweets.
- Tweets can include hashtags and mentions of other users.
- Users can edit or delete their tweets within a certain time frame after posting.
- Feed and Timeline
- Users can view a personalized feed containing tweets from the accounts they follow.
- Tweets are displayed in reverse chronological order or based on relevancy (algorithmic ranking).
- Users can view trending topics and hashtags globally or regionally.
- Follow System
- Users can follow and unfollow other users.
- Users can view the list of their followers and the accounts they follow.
- Users can block or mute other users to control their interactions.
- Engagement Features
- Users can like (favorite) tweets to show appreciation.
- Users can retweet with or without comments to share content with their followers.
- Users can reply to tweets to start or participate in conversations.
- Users can bookmark tweets for later reference.
- Notification System
- Users receive notifications for likes, retweets, replies, and new followers.
- Users can customize notification preferences (e.g., email, push notifications).
- Search and Discovery
- Users can search for tweets, hashtags, or accounts.
- Users can explore trending topics, suggested accounts, and curated content.
- Privacy and Security
- Users can choose to make their profiles public or private.
- Private accounts require approval for new followers.
- Sensitive content can be flagged, and users can report inappropriate tweets or accounts.
- Analytics and Insights
- Users can view analytics for their tweets, such as impressions, engagements, and profile visits.
- Users can monitor follower growth over time.
- Third-Party Integrations
- Integration with third-party apps for cross-posting (e.g., sharing tweets on Facebook or Instagram).
- API access for developers to build applications on top of the platform.
Non-Functional Requirements
- Scalability
- The system should handle millions of concurrent users, tweets, and engagements without performance degradation.
- Performance
- The platform should ensure low latency for posting tweets, loading feeds, and performing searches.
- The system should provide real-time updates for notifications and feeds.
- Availability
- The platform should be available 99.9% of the time, with minimal downtime for maintenance.
- Security
- User data must be encrypted both in transit and at rest.
- The system must include measures to prevent unauthorized access, data breaches, and DDoS attacks.
- Data Consistency
- Tweets, likes, and follows must be consistently updated across all users' feeds and profiles.
- Reliability
- The system must ensure reliable delivery of notifications and updates, even under high loads.
- Usability
- The platform should be user-friendly, with intuitive navigation and accessible interfaces for users with disabilities.
- Maintainability
- The codebase should be modular and maintainable, allowing for easy updates and feature additions.
- Compliance
- The platform must comply with data protection regulations such as GDPR and CCPA.
- Content moderation policies should adhere to regional and global legal requirements.
- Localization
- The system should support multiple languages and regional settings for global accessibility.
- Disaster Recovery
- The platform should have mechanisms for data backups and disaster recovery to ensure business continuity during unexpected failures.
Capacity estimation
1. Assumptions
Before performing capacity estimation, define the assumptions for system usage:
- User Base: 100 million active users.
- Daily Active Users (DAU): 10% of total users (10 million).
- Average Tweets per Day per User: 5 tweets.
- Average Tweet Size: 280 characters (≈300 bytes, including metadata).
- Average Likes per Tweet: 10 likes.
- Average Retweets per Tweet: 2 retweets.
- Feed Size: 200 tweets displayed per user.
- Storage Retention Period: Tweets retained indefinitely.
2. Traffic Estimation
a. Tweet Traffic
- Tweets per Day: 10M × 5 = 50M tweets/day.
- Tweet Traffic per Second: 50M / 86,400 ≈ 578 tweets/second.
b. Like Traffic
- Likes per Day: 50M × 10 = 500M likes/day.
- Like Traffic per Second: 500M / 86,400 ≈ 5,787 likes/second.
c. Retweet Traffic
- Retweets per Day: 50M × 2 = 100M retweets/day.
- Retweet Traffic per Second: 100M / 86,400 ≈ 1,157 retweets/second.
d. Feed Updates
- Feeds Updated per Tweet: Assume each tweet appears in the feeds of 100 followers (on average): 50M × 100 = 5B feed updates/day.
- Feed Updates per Second: 5B / 86,400 ≈ 57,870 updates/second.
3. Storage Requirements
a. Tweet Storage
- Daily Storage for Tweets: 50M × 300 bytes ≈ 15 GB/day.
- Yearly Storage for Tweets: 15 GB/day × 365 ≈ 5.5 TB/year.
b. Metadata Storage
Metadata includes likes, retweets, and user references:
- Average Metadata per Tweet: 1 KB (including likes and retweets).
- Daily Metadata Storage: 50M × 1 KB = 50 GB/day.
- Yearly Metadata Storage: 50 GB/day × 365 ≈ 18 TB/year.
c. Total Storage Requirement
- Yearly Storage (tweets + metadata): 5.5 TB + 18 TB = 23.5 TB/year.
4. Database Throughput
- Write Operations per Second:
- Tweets: 578/sec.
- Likes: 5787/sec.
- Retweets: 1157/sec.
- Total: 578 + 5,787 + 1,157 ≈ 7,522 writes/second.
- Read Operations per Second:
- Feed Updates: 57,870/sec.
- Assume each DAU refreshes their feed 10 times/day, fetching 200 tweets per refresh: 10M × 10 × 200 = 20B tweets fetched/day, i.e. 20B / 86,400 ≈ 231,481 reads/second.
5. Bandwidth Estimation
a. Incoming Traffic
- Tweet Upload: 50M × 300 bytes = 15 GB/day.
- Likes and Retweets: Assume 50 bytes/operation: (500M + 100M) × 50 = 30 GB/day.
- Total Incoming Traffic: 15 GB + 30 GB = 45 GB/day.
b. Outgoing Traffic
- Feed Delivery: Assuming each tweet is delivered to 100 users and compressed to 1 KB: 5B × 1 KB = 5 TB/day.
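The estimates above are simple arithmetic over the stated assumptions; a short script reproduces them end to end:

```python
# Back-of-the-envelope capacity estimates using the assumptions from section 1.
DAU = 10_000_000            # daily active users
TWEETS_PER_USER = 5
TWEET_BYTES = 300           # content plus basic metadata
LIKES_PER_TWEET = 10
FOLLOWERS_PER_TWEET = 100   # average fan-out per tweet
SECONDS_PER_DAY = 86_400

tweets_per_day = DAU * TWEETS_PER_USER                                # 50M
tweets_per_sec = tweets_per_day / SECONDS_PER_DAY                     # ~578
likes_per_sec = tweets_per_day * LIKES_PER_TWEET / SECONDS_PER_DAY    # ~5,787
feed_updates_per_sec = (tweets_per_day * FOLLOWERS_PER_TWEET
                        / SECONDS_PER_DAY)                            # ~57,870

tweet_storage_gb_per_day = tweets_per_day * TWEET_BYTES / 1e9         # ~15 GB
yearly_storage_tb = tweet_storage_gb_per_day * 365 / 1000             # ~5.5 TB

print(f"{tweets_per_sec:.0f} tweets/s, {likes_per_sec:.0f} likes/s, "
      f"{feed_updates_per_sec:.0f} feed updates/s, "
      f"{yearly_storage_tb:.2f} TB/year of tweet text")
```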
API design
1. User Management APIs
- POST /signup
- Description: Register a new user account.
- Parameters: username, email, password, phoneNumber.
- Response: Success or error message.
- POST /login
- Description: Authenticate and log in a user.
- Parameters: username/email, password.
- Response: Authentication token.
- POST /logout
- Description: Log out the user.
- Parameters: authToken.
- Response: Success or error message.
- PUT /profile
- Description: Update user profile information.
- Parameters: authToken, bio, profilePicture, location.
- Response: Updated profile details.
- GET /profile/{username}
- Description: Fetch user profile details.
- Parameters: username.
- Response: User profile data.
- DELETE /account
- Description: Deactivate or delete a user account.
- Parameters: authToken.
- Response: Success or error message.
2. Tweet Management APIs
- POST /tweet
- Description: Post a new tweet.
- Parameters: authToken, content, media[].
- Response: Tweet ID and status.
- GET /tweet/{tweetId}
- Description: Fetch a specific tweet.
- Parameters: tweetId.
- Response: Tweet content and metadata.
- DELETE /tweet/{tweetId}
- Description: Delete a tweet.
- Parameters: authToken, tweetId.
- Response: Success or error message.
- PUT /tweet/{tweetId}
- Description: Edit an existing tweet (if allowed).
- Parameters: authToken, content.
- Response: Updated tweet details.
3. Engagement APIs
- POST /like
- Description: Like a tweet.
- Parameters: authToken, tweetId.
- Response: Success or error message.
- DELETE /like
- Description: Unlike a tweet.
- Parameters: authToken, tweetId.
- Response: Success or error message.
- POST /retweet
- Description: Retweet a tweet.
- Parameters: authToken, tweetId, comment (optional).
- Response: Success or error message.
- POST /reply
- Description: Reply to a tweet.
- Parameters: authToken, tweetId, content.
- Response: Success or error message.
- POST /bookmark
- Description: Bookmark a tweet for later reference.
- Parameters: authToken, tweetId.
- Response: Success or error message.
4. Feed APIs
- GET /feed
- Description: Fetch the user's feed.
- Parameters: authToken, paginationToken.
- Response: List of tweets.
- GET /trending
- Description: Fetch trending hashtags and topics.
- Parameters: region (optional).
- Response: List of trending topics.
- GET /user/{username}/tweets
- Description: Fetch a specific user's tweets.
- Parameters: username, paginationToken.
- Response: List of tweets.
5. Follow System APIs
- POST /follow
- Description: Follow a user.
- Parameters: authToken, username.
- Response: Success or error message.
- DELETE /follow
- Description: Unfollow a user.
- Parameters: authToken, username.
- Response: Success or error message.
- GET /followers/{username}
- Description: Fetch a user's followers.
- Parameters: username, paginationToken.
- Response: List of followers.
- GET /following/{username}
- Description: Fetch accounts a user is following.
- Parameters: username, paginationToken.
- Response: List of accounts.
6. Notification APIs
- GET /notifications
- Description: Fetch user notifications.
- Parameters: authToken, paginationToken.
- Response: List of notifications.
- PUT /notifications/settings
- Description: Update notification preferences.
- Parameters: authToken, emailNotifications, pushNotifications.
- Response: Success or error message.
7. Search and Discovery APIs
- GET /search
- Description: Search for tweets, hashtags, or accounts.
- Parameters: query, type (tweets/users/hashtags), paginationToken.
- Response: Search results.
- GET /explore
- Description: Fetch recommended topics and accounts.
- Parameters: authToken.
- Response: List of recommendations.
8. Admin and Moderation APIs
- POST /report
- Description: Report a tweet or user.
- Parameters: authToken, targetId, reason.
- Response: Success or error message.
- GET /moderation/reports
- Description: Fetch reported content for review (admin use).
- Parameters: authToken, paginationToken.
- Response: List of reports.
- DELETE /moderation/tweet/{tweetId}
- Description: Remove a reported tweet (admin use).
- Parameters: authToken.
- Response: Success or error message.
9. Analytics APIs
- GET /analytics/tweet/{tweetId}
- Description: Fetch analytics for a specific tweet.
- Parameters: authToken, tweetId.
- Response: Impressions, likes, retweets, etc.
- GET /analytics/profile
- Description: Fetch profile analytics.
- Parameters: authToken.
- Response: Profile views, follower growth, etc.
Database design
1. User Database
- Tech Choice: Relational Database (e.g., PostgreSQL or MySQL)
- Attributes:
- User ID (Primary Key)
- Username (Unique)
- Email (Unique)
- Password Hash
- Profile Picture URL
- Bio
- Location
- Created At (Timestamp)
- Updated At (Timestamp)
- Reason:
- Consistency: Ensures that data integrity is maintained (e.g., unique usernames and emails).
- Structured Data: User information is structured and fits well in a relational model.
- Transactions: Supports atomic operations, ensuring robust account creation and updates.
- Wide Adoption: Mature, well-documented, and widely used in the industry.
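The uniqueness guarantees called out above map directly onto relational constraints. A sketch using SQLite as a stand-in for PostgreSQL/MySQL (column names here mirror the attribute list, but are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        user_id    INTEGER PRIMARY KEY,
        username   TEXT NOT NULL UNIQUE,
        email      TEXT NOT NULL UNIQUE,
        pw_hash    TEXT NOT NULL,
        bio        TEXT,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
conn.execute("INSERT INTO users (username, email, pw_hash) VALUES (?, ?, ?)",
             ("alice", "alice@example.com", "hash1"))

# A duplicate username is rejected atomically by the UNIQUE constraint,
# so the application never has to race against concurrent signups.
try:
    conn.execute("INSERT INTO users (username, email, pw_hash) VALUES (?, ?, ?)",
                 ("alice", "other@example.com", "hash2"))
except sqlite3.IntegrityError:
    print("duplicate username rejected by the database")
```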
2. Tweet Database
- Tech Choice: NoSQL Database (e.g., DynamoDB or MongoDB)
- Attributes:
- Tweet ID (Primary Key)
- User ID (Foreign Key to User Database)
- Content (Text with a character limit)
- Media URL (Optional, for attachments like images or videos)
- Created At (Timestamp)
- Updated At (Timestamp)
- Retweet Count
- Like Count
- Metadata Attributes:
- Tweet ID (Primary Key)
- Hashtags (List/Array)
- Mentions (List/Array)
- Is Retweet (Boolean)
- Reason:
- Write Scalability: NoSQL databases handle high write volumes efficiently.
- Flexible Schema: Metadata like hashtags and mentions can vary in structure, which is better handled by NoSQL.
- Horizontal Scaling: Can easily scale to handle millions of tweets per day.
- Low Latency: Ensures quick responses for tweet creation and updates.
3. Feed Database
- Tech Choice: In-Memory Store (e.g., Redis or Memcached)
- Attributes:
- User ID (Primary Key)
- List of Tweet IDs (List/Array of tweet references for the user's feed)
- Last Updated (Timestamp)
- Reason:
- Low Latency: In-memory stores are extremely fast and ideal for caching frequently accessed data like user feeds.
- Real-Time Updates: Supports real-time delivery of feeds with minimal delay.
- Efficient Retrieval: Enables quick retrieval of precomputed feeds without heavy database queries.
- Tradeoff: Requires backup mechanisms to prevent data loss due to memory volatility.
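In Redis, a per-user feed is typically a list of tweet IDs trimmed to a fixed length (LPUSH followed by LTRIM). A minimal in-memory sketch of that pattern, with the 200-tweet feed size from the capacity assumptions:

```python
from collections import defaultdict

FEED_CAP = 200  # tweets kept per user, matching the feed-size assumption

class FeedCache:
    """In-memory stand-in for a Redis-backed feed store."""

    def __init__(self) -> None:
        self._feeds: dict[str, list[str]] = defaultdict(list)

    def push(self, user_id: str, tweet_id: str) -> None:
        # Newest first, then trim to the cap (Redis equivalent: LPUSH + LTRIM).
        feed = self._feeds[user_id]
        feed.insert(0, tweet_id)
        del feed[FEED_CAP:]

    def get(self, user_id: str) -> list[str]:
        return list(self._feeds[user_id])

cache = FeedCache()
for i in range(250):
    cache.push("user42", f"tweet-{i}")
feed = cache.get("user42")
print(len(feed), feed[0])  # 200 tweet-249
```

Capping each feed bounds memory per user, which is what makes precomputed feeds affordable at 10M DAU.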
4. Engagement Database
- Tech Choice: Relational Database (e.g., PostgreSQL or MySQL) or NoSQL Database (e.g., Cassandra or DynamoDB)
- Attributes for Likes:
- Like ID (Primary Key)
- Tweet ID (Foreign Key to Tweet Database)
- User ID (Foreign Key to User Database)
- Created At (Timestamp)
- Attributes for Retweets:
- Retweet ID (Primary Key)
- Original Tweet ID (Foreign Key to Tweet Database)
- User ID (Foreign Key to User Database)
- Created At (Timestamp)
- Reason:
- Relational Choice:
- Strong relationships between users, tweets, and engagements can be easily managed.
- Simplifies querying for engagement reports and analytics.
- NoSQL Choice:
- Preferred for massive write operations (e.g., likes and retweets at scale).
- Scales horizontally for high-volume engagement data.
5. Search Index Database
- Tech Choice: Search Engine (e.g., Elasticsearch or Apache Solr)
- Attributes:
- Index ID (Primary Key)
- Keywords (Indexed text for search terms)
- Associated Data ID (References tweet IDs, user IDs, or hashtags)
- Type (Tweet, User, or Hashtag)
- Reason:
- Optimized for Search: Specifically designed for full-text search with features like relevance ranking and autocomplete.
- High Scalability: Handles indexing and searching large datasets effectively.
- Custom Query Support: Allows advanced queries like keyword searches, phrase matches, and filtering by type or date.
6. Analytics Database
- Tech Choice: Columnar Database (e.g., Amazon Redshift, Google BigQuery, or Apache Druid)
- Attributes:
- Event ID (Primary Key)
- User ID
- Event Type (e.g., Tweet Posted, Liked, Retweeted)
- Timestamp
- Metadata (JSON for additional details)
- Reason:
- Optimized for Analytics: Columnar databases are designed for analytical queries with fast aggregation and reporting.
- Scalable: Can process large-scale data generated by user interactions.
- Low Query Latency: Ideal for dashboards and real-time analytics.
High-level design
1. API Gateway
- Purpose:
- Acts as a single entry point for all client requests (web, mobile, etc.).
- Handles authentication, rate limiting, and request routing to the appropriate backend services.
- Responsibilities:
- Authenticate requests using tokens.
- Validate incoming requests and route them to microservices.
- Aggregate responses from multiple services if required.
- Tech Choices: AWS API Gateway, Kong Gateway, or Nginx.
2. Authentication and Authorization Service
- Purpose:
- Manages user login, signup, and token-based authentication.
- Ensures secure access to resources using OAuth 2.0 or JWT.
- Responsibilities:
- Generate and validate access/refresh tokens.
- Handle session management.
- Support social login (e.g., Google, Facebook).
- Tech Choices: OAuth Server, Keycloak, Auth0, or custom service with JWT.
3. User Management Service
- Purpose:
- Handles user account creation, profile management, and user-specific settings.
- Responsibilities:
- Store and update user profile information (e.g., bio, profile picture, email).
- Manage account settings (e.g., privacy preferences).
- Provide APIs for user lookups and updates.
- Tech Choices: Relational Database (e.g., PostgreSQL or MySQL).
4. Tweet Management Service
- Purpose:
- Manages the creation, editing, and deletion of tweets, along with associated metadata.
- Responsibilities:
- Store and retrieve tweets.
- Handle tweet-related metadata like hashtags, mentions, and media attachments.
- Provide APIs for users to post, edit, delete, and fetch tweets.
- Tech Choices: NoSQL Database (e.g., MongoDB, DynamoDB).
5. Feed Generation and Delivery Service
- Purpose:
- Generates and delivers personalized feeds for users based on their subscriptions and engagement.
- Responsibilities:
- Fetch tweets from users a person follows.
- Sort and rank tweets based on relevance or chronological order.
- Cache feeds for quick delivery and minimize recomputation.
- Tech Choices:
- In-Memory Databases (e.g., Redis or Memcached) for caching feeds.
- Stream Processing Tools (e.g., Apache Kafka or Amazon Kinesis) for real-time feed updates.
6. Engagement Service
- Purpose:
- Manages interactions with tweets, such as likes, retweets, and replies.
- Responsibilities:
- Store and retrieve engagement data (e.g., likes, retweets, replies).
- Update engagement counters in real time.
- Trigger notifications for user interactions.
- Tech Choices:
- Relational Database (e.g., PostgreSQL or MySQL) for structured engagement data.
- NoSQL Database (e.g., Cassandra) for high-volume writes.
7. Search and Discovery Service
- Purpose:
- Provides full-text search and discovery of tweets, hashtags, and user profiles.
- Responsibilities:
- Index tweets, hashtags, and profiles for fast lookups.
- Handle advanced search queries (e.g., by keywords, hashtags, or users).
- Support trending topics and content discovery.
- Tech Choices: Elasticsearch, Apache Solr, or AWS OpenSearch.
8. Notification Service
- Purpose:
- Delivers real-time or batch notifications for user interactions (e.g., likes, follows, replies).
- Responsibilities:
- Track events like new followers, mentions, or replies.
- Deliver notifications via email, mobile push, web push, or SMS.
- Manage user preferences for notification delivery.
- Tech Choices:
- Message Queues (e.g., RabbitMQ, Kafka) for event-driven notifications.
- Push notification platforms like Firebase Cloud Messaging (FCM).
9. Media Storage Service
- Purpose:
- Manages storage and retrieval of media files (e.g., images, videos, GIFs).
- Responsibilities:
- Handle upload, processing, and retrieval of media content.
- Optimize media for delivery (e.g., thumbnails, compression).
- Ensure secure access with temporary URLs.
- Tech Choices: Cloud Object Storage (e.g., AWS S3, Google Cloud Storage).
10. Analytics and Monitoring Service
- Purpose:
- Tracks user behavior, engagement metrics, and system health for analytics and insights.
- Responsibilities:
- Collect data on tweet impressions, engagement rates, and user activity.
- Generate reports for users and administrators.
- Monitor system performance and detect anomalies.
- Tech Choices:
- Columnar Databases (e.g., Amazon Redshift, Google BigQuery) for analytics.
- Monitoring tools like Prometheus, Grafana, or Datadog.
11. Search Indexing and Trending Service
- Purpose:
- Indexes tweets and tracks trending hashtags/topics.
- Responsibilities:
- Continuously update indexes for new tweets, hashtags, and user profiles.
- Analyze usage patterns to determine trending topics in real time.
- Tech Choices:
- Elasticsearch or Apache Solr for indexing.
- Stream Processing Tools for real-time trend detection.
12. Admin and Moderation Service
- Purpose:
- Provides tools for administrators to manage and moderate platform content.
- Responsibilities:
- Handle content reports, block abusive accounts, and remove inappropriate tweets.
- Generate dashboards for admin insights.
- Support compliance with legal requirements.
- Tech Choices:
- Custom admin dashboards with a relational database backend.
13. CDN (Content Delivery Network)
- Purpose:
- Ensures fast and reliable delivery of static assets like images, videos, and JavaScript files.
- Responsibilities:
- Cache static content closer to the user’s location.
- Reduce load on backend services by offloading media delivery.
- Tech Choices: AWS CloudFront, Akamai, or Cloudflare.
14. Logging and Auditing Service
- Purpose:
- Tracks all system activities for debugging, compliance, and auditing.
- Responsibilities:
- Log API calls, user activities, and system errors.
- Enable forensic investigations for security breaches.
- Tech Choices:
- Logging frameworks like ELK Stack (Elasticsearch, Logstash, Kibana).
- Distributed tracing tools like Jaeger or Zipkin.
15. Infrastructure and Orchestration
- Purpose:
- Manages deployment, scaling, and orchestration of microservices.
- Responsibilities:
- Deploy and manage containers (e.g., Docker).
- Scale services dynamically based on load.
- Orchestrate service communication and failover.
- Tech Choices:
- Kubernetes for container orchestration.
- Terraform or CloudFormation for infrastructure as code (IaC).
16. Backup and Disaster Recovery
- Purpose:
- Ensures data is backed up and recoverable in case of failures.
- Responsibilities:
- Schedule regular backups for user data, tweets, and engagement data.
- Maintain disaster recovery plans for high availability.
- Tech Choices:
- Cloud-native tools like AWS Backup or Azure Backup.
Request flows
1. Posting a Tweet
Step-by-Step Flow:
- Client Request:
- User composes a tweet and clicks "Post."
- The client sends an HTTP POST request to the API Gateway with the tweet content and media (if any).
- API Gateway:
- Validates the request.
- Verifies the authentication token using the Authentication Service.
- Routes the request to the Tweet Management Service.
- Tweet Management Service:
- Validates the content (e.g., character limit, profanity check).
- Stores the tweet and metadata (hashtags, mentions) in the Tweet Database.
- Publishes an event (e.g., "New Tweet Posted") to a message queue (e.g., Kafka).
- Feed Generation Service:
- Consumes the "New Tweet Posted" event.
- Updates the feeds of the user’s followers in the Feed Database (e.g., Redis).
- Search Indexing Service:
- Consumes the "New Tweet Posted" event.
- Indexes the tweet content for search functionality in Search Engine (e.g., Elasticsearch).
- Response:
- The Tweet Management Service responds to the API Gateway with success status.
- The API Gateway forwards the response to the client, confirming the tweet was posted.
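The fan-out-on-write path above can be sketched with an in-process queue standing in for the Kafka topic; the consumer wiring and data here are illustrative, not a prescribed design:

```python
from collections import defaultdict
from queue import Queue

events: Queue = Queue()                     # stand-in for the Kafka topic
followers = {"alice": ["bob", "carol"]}     # follow graph (assumed data)
feeds = defaultdict(list)                   # stand-in for the Redis feed store
search_index = defaultdict(set)             # stand-in for Elasticsearch

def publish_tweet(author: str, tweet_id: str, text: str) -> None:
    # Tweet Management Service: store the tweet, then publish the event.
    events.put({"type": "NewTweetPosted", "author": author,
                "tweet_id": tweet_id, "text": text})

def drain_events() -> None:
    # Feed Generation and Search Indexing services consume the same event.
    while not events.empty():
        ev = events.get()
        for follower in followers.get(ev["author"], []):
            feeds[follower].insert(0, ev["tweet_id"])   # fan-out on write
        for word in ev["text"].lower().split():
            search_index[word].add(ev["tweet_id"])

publish_tweet("alice", "t1", "hello world")
drain_events()
print(feeds["bob"], sorted(search_index["hello"]))
```

Decoupling the write path from fan-out via the queue is what lets the tweet POST return quickly even when the author has many followers.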
2. Viewing the User Feed
Step-by-Step Flow:
- Client Request:
- User opens their feed on the app or web.
- The client sends an HTTP GET request to the API Gateway to retrieve the feed.
- API Gateway:
- Validates the request and verifies the authentication token using the Authentication Service.
- Routes the request to the Feed Generation Service.
- Feed Generation Service:
- Fetches the precomputed feed from the Feed Database (e.g., Redis).
- If the feed is stale or missing, recalculates it by querying the Tweet Database and metadata.
- Media Storage Service:
- If the tweets contain media, generates secure URLs (e.g., pre-signed S3 URLs) for client access.
- Response:
- The Feed Generation Service responds to the API Gateway with the feed data.
- The API Gateway forwards the feed to the client for rendering.
3. Liking a Tweet
Step-by-Step Flow:
- Client Request:
- User clicks the "Like" button on a tweet.
- The client sends an HTTP POST request to the API Gateway with the tweet ID.
- API Gateway:
- Validates the request and verifies the authentication token using the Authentication Service.
- Routes the request to the Engagement Service.
- Engagement Service:
- Validates the tweet ID.
- Updates the Engagement Database to record the like.
- Updates the like counter in the Tweet Database.
- Publishes an event (e.g., "Tweet Liked") to a message queue (e.g., Kafka).
- Notification Service:
- Consumes the "Tweet Liked" event.
- Sends a notification to the tweet’s author (via mobile push, email, or web) using the Notification Service.
- Response:
- The Engagement Service responds to the API Gateway with a success status.
- The API Gateway forwards the response to the client.
4. Searching for Tweets
Step-by-Step Flow:
- Client Request:
- User enters a search term and submits it.
- The client sends an HTTP GET request to the API Gateway with the query string.
- API Gateway:
- Validates the request and forwards it to the Search Service.
- Search Service:
- Queries the Search Index (e.g., Elasticsearch) for matching tweets, users, or hashtags.
- Applies filters (e.g., date range, user relevance) to the results.
- Response:
- The Search Service responds to the API Gateway with the search results.
- The API Gateway forwards the results to the client for rendering.
5. Receiving Notifications
Step-by-Step Flow:
- Trigger:
- An event (e.g., a user likes a tweet) is published to a message queue (e.g., Kafka).
- Notification Service:
- Consumes the event and identifies the recipient(s).
- Generates the notification content (e.g., "User X liked your tweet").
- Stores the notification in the Notification Database.
- Sends the notification via appropriate channels (push, email, or SMS).
- Client Request:
- The client periodically fetches notifications with an HTTP GET request to the API Gateway.
- API Gateway:
- Validates the request and forwards it to the Notification Service.
- Response:
- The Notification Service returns the list of notifications to the API Gateway.
- The API Gateway forwards the notifications to the client.
6. Following a User
Step-by-Step Flow:
- Client Request:
- User clicks "Follow" on another user’s profile.
- The client sends an HTTP POST request to the API Gateway with the target user ID.
- API Gateway:
- Validates the request and verifies the authentication token using the Authentication Service.
- Routes the request to the Follow Service.
- Follow Service:
- Updates the User Database to record the new follower relationship.
- Publishes an event (e.g., "User Followed") to a message queue (e.g., Kafka).
- Feed Generation Service:
- Consumes the "User Followed" event.
- Updates the new follower’s feed to include tweets from the followed user.
- Notification Service:
- Sends a notification to the followed user (e.g., "User X started following you").
- Response:
- The Follow Service responds to the API Gateway with success status.
- The API Gateway forwards the response to the client.
Detailed component design
1. Authentication and Authorization Service
End-to-End Working:
- A user logs in or signs up, sending their credentials (e.g., email and password) or social login token to the API Gateway.
- The Authentication Service verifies credentials or validates the social login token via OAuth/OpenID providers.
- On successful authentication, the service generates a secure token (e.g., JWT or OAuth access token) and a refresh token.
- Tokens are sent back to the client and stored securely in their session.
- The service also validates incoming tokens for every subsequent request.
Data Structures and Algorithms:
- Hash Maps: For session management and mapping user IDs to tokens for fast lookup.
- JWT Token Algorithms: Uses algorithms like HMAC-SHA256 (HS256) or RSA-SHA256 (RS256) for token signing and verification.
- Password Hashing: Uses secure algorithms like bcrypt or Argon2 for password storage.
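A minimal HS256-style signer/verifier using only the standard library illustrates the signing mechanism (a real deployment would use a vetted JWT library such as PyJWT; the secret here is a placeholder):

```python
import base64
import hashlib
import hmac
import json

SECRET = b"server-side-secret"  # assumption: shared signing key, kept server-side

def _b64(data: bytes) -> str:
    # JWT uses URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign(payload: dict) -> str:
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode()
    sig = _b64(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

def verify(token: str) -> bool:
    header, body, sig = token.split(".")
    signing_input = f"{header}.{body}".encode()
    expected = _b64(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    return hmac.compare_digest(sig, expected)  # constant-time comparison

token = sign({"sub": "user42", "exp": 1700000000})
print(verify(token))  # True
```

Because validation only needs the shared secret, any stateless replica behind the load balancer can verify tokens without a session-store lookup.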
Scaling for Peak Traffic:
- Token Validation Caching: Frequently validated tokens are cached in Redis or Memcached to reduce cryptographic overhead.
- Horizontal Scaling: Stateless services allow replication behind a load balancer.
- Offloading Social Logins: Relies on external identity providers (e.g., Google, Facebook) for scaling social login traffic.
Edge Cases:
- Token Replay Attacks: Mitigated by embedding unique session IDs in tokens and validating them against server state.
- Expired Tokens: Refresh tokens handle re-authentication seamlessly without requiring the user to log in again.
- Credential Stuffing: Rate-limiting and CAPTCHA mechanisms prevent automated login attempts.
2. User Management Service
End-to-End Working:
- Handles user account creation, profile updates, and preferences.
- When a request to view or edit a profile is received, the service queries the User Database and returns or updates the data.
- For sensitive updates (e.g., password changes), it verifies the user identity via authentication tokens.
Data Structures and Algorithms:
- Relational Tables: User data is stored in normalized tables with indexes for fast lookup.
- Hash Maps: For caching frequently accessed user profiles to reduce database queries.
Scaling for Peak Traffic:
- Caching: Frequently accessed profiles are cached in memory (e.g., Redis).
- Write Queueing: Updates to user profiles are queued (e.g., Kafka) and processed asynchronously to handle spikes.
- Database Sharding: User data is partitioned by user ID to distribute load across multiple servers.
Edge Cases:
- Duplicate Usernames/Emails: Enforced uniqueness via database constraints and validations.
- Concurrent Profile Updates: Use optimistic locking to prevent overwriting concurrent changes.
- Partial Updates: Validate all fields and use transactions to ensure atomicity.
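The optimistic-locking approach mentioned for concurrent profile updates can be sketched with a version column: a write succeeds only if the version it read is still current (names and storage here are illustrative):

```python
class StaleUpdateError(Exception):
    """Raised when another writer changed the row since it was read."""

profiles = {"user42": {"bio": "hello", "version": 1}}  # stand-in for a DB row

def update_bio(user_id: str, new_bio: str, read_version: int) -> None:
    """Apply an update only if the row is unchanged since we read it."""
    row = profiles[user_id]
    if row["version"] != read_version:
        raise StaleUpdateError("profile changed since it was read; retry")
    row["bio"] = new_bio
    row["version"] += 1  # bump so other concurrent writers detect the change

# Two clients both read version 1; the second write is rejected as stale.
update_bio("user42", "first writer wins", read_version=1)
try:
    update_bio("user42", "second writer loses", read_version=1)
except StaleUpdateError as exc:
    print("rejected:", exc)
```

In SQL this is the familiar `UPDATE ... WHERE version = ?` pattern, with a retry loop in the application when zero rows are affected.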
3. Tweet Management Service
End-to-End Working:
- Users post tweets via the API Gateway, and the service stores the tweet content and metadata (hashtags, mentions) in the Tweet Database.
- For retrieval, the service fetches tweets either by ID, user, or related hashtags.
- It publishes events (e.g., "New Tweet Posted") to a message queue for downstream services like Feed Generation and Search Indexing.
Data Structures and Algorithms:
- NoSQL Key-Value Stores: Used for storing tweets with unique tweet IDs as keys.
- Inverted Index: Maintains metadata like hashtags and mentions for efficient retrieval.
- Event Queues: Kafka or RabbitMQ is used to propagate changes to other services.
Scaling for Peak Traffic:
- Write Optimization: Write-heavy workloads are distributed across NoSQL partitions.
- Asynchronous Updates: Metadata indexing and feed updates are processed asynchronously.
- Shard by User ID: Ensures even distribution of tweets across multiple servers.
Edge Cases:
- Duplicate Tweets: Deduplication checks on the client or via hashing tweet content.
- Failed Media Uploads: Retry logic and pre-signed URLs ensure seamless media uploads.
- Hashtag Explosion: Rate-limit hashtags to prevent spam or abuse.
4. Feed Generation and Delivery Service
End-to-End Working:
- On tweet creation, the service updates the feeds of all followers by appending the new tweet ID to their feed list.
- When a user requests their feed, it retrieves the precomputed feed from the Feed Cache.
- If the feed is missing, it recomputes it using tweets from the user’s follow graph.
Data Structures and Algorithms:
- Priority Queues: To rank tweets by relevance or timestamp.
- Graph Structures: Used to model the follow relationships and fetch tweets efficiently.
- Caching: Redis is used to store precomputed feeds for quick delivery.
Scaling for Peak Traffic:
- Event-Driven Architecture: Uses Kafka for real-time updates to feeds.
- Batch Updates: Processes bulk feed updates during high traffic to reduce contention.
- Distributed Caching: Replicates feed data across regions to reduce latency.
Edge Cases:
- Feed Staleness: Periodically refresh cached feeds to ensure relevance.
- Follower Surges: Queue feed updates to prevent overwhelming the system.
- Broken Follow Links: Validate follow relationships during feed recomputation.
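The fan-out-on-write flow described above can be sketched as a small in-memory service: on each new tweet, append the tweet ID to every follower's precomputed feed, capped at a fixed length. Names are illustrative; in production the per-follower appends would be Redis `LPUSH`/`LTRIM` operations driven by a Kafka consumer, not an inline loop.

```python
from collections import defaultdict, deque

class FeedService:
    """Fan-out-on-write: on each tweet, push its ID onto every follower's feed."""

    def __init__(self, max_feed_len=800):
        self.followers = defaultdict(set)  # author -> set of follower ids
        self.feeds = defaultdict(deque)    # user -> tweet ids, most recent first
        self.max_feed_len = max_feed_len

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def on_new_tweet(self, author, tweet_id):
        # Each iteration maps to a Redis LPUSH + LTRIM in a real deployment.
        for follower in self.followers[author]:
            feed = self.feeds[follower]
            feed.appendleft(tweet_id)
            while len(feed) > self.max_feed_len:
                feed.pop()  # drop the oldest entry beyond the cap

    def get_feed(self, user, limit=20):
        return list(self.feeds[user])[:limit]
```

Capping feed length bounds memory per user; anything older than the cap falls through to the recomputation path described above.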
5. Search and Discovery Service
End-to-End Working:
- Handles search queries from users, fetching tweets, hashtags, or profiles based on keywords.
- Uses a Search Index to retrieve matching results and applies filters for relevance and recency.
- Updates the index in real-time when new tweets or hashtags are created.
Data Structures and Algorithms:
- Inverted Index: Maps keywords to tweet IDs or user profiles for fast retrieval.
- TF-IDF: Calculates relevance scores for search results.
- Pagination Structures: Support efficient paging through search results.
Scaling for Peak Traffic:
- Index Sharding: Distributes the search index across multiple nodes for parallel queries.
- Asynchronous Indexing: Processes updates to the search index in batches to reduce contention.
- Load Balancing: Distributes search traffic across multiple query nodes.
Edge Cases:
- Ambiguous Queries: Uses fuzzy matching and autocomplete to improve user experience.
- Large Results: Implements result capping and pagination to handle large result sets.
- Trending Explosions: Caps results for highly trending hashtags to maintain system performance.
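The inverted index plus TF-IDF scoring described above can be sketched as follows. This is a toy, whitespace-tokenized version under the assumption of a single node; Elasticsearch implements the same idea (postings lists plus a relevance score) with far more sophisticated analysis and sharding.

```python
import math
from collections import defaultdict, Counter

class InvertedIndex:
    """Toy inverted index with TF-IDF scoring over tweet text."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.doc_count = 0

    def add(self, doc_id, text):
        self.doc_count += 1
        for term, tf in Counter(text.lower().split()).items():
            self.postings[term][doc_id] = tf

    def search(self, query, k=10):
        scores = defaultdict(float)
        for term in query.lower().split():
            docs = self.postings.get(term, {})
            if not docs:
                continue
            # Rare terms get a higher inverse document frequency weight.
            idf = math.log(self.doc_count / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        return sorted(scores, key=scores.get, reverse=True)[:k]
```

Because only the postings lists for the query terms are touched, query cost scales with the number of matching documents rather than the size of the whole corpus.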
6. Notification Service
End-to-End Working:
- Listens to events (e.g., likes, retweets) from the message queue.
- Creates personalized notification messages and stores them in the Notification Database.
- Pushes notifications to users via email, SMS, or push notification services.
Data Structures and Algorithms:
- Message Queues: Kafka or RabbitMQ for event-driven notifications.
- Priority Queues: To handle urgent notifications (e.g., mentions).
- TTL Caches: For expiring notifications that are time-sensitive.
Scaling for Peak Traffic:
- Asynchronous Processing: Notification generation is decoupled from user interactions.
- Batch Delivery: Groups notifications to reduce the number of API calls to third-party services.
- Regional Replication: Deploys notification services closer to users to reduce latency.
Edge Cases:
- Notification Spam: Implements rate limiting per user to avoid overwhelming them.
- Missed Notifications: Ensures retries for failed notifications.
- Opt-Out Preferences: Adheres to user preferences for notification delivery.
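The per-user rate limiting mentioned under edge cases can be sketched as a sliding-window limiter: allow at most N notifications per user per window and drop (or batch) the rest. The class name is illustrative; a distributed deployment would back this with Redis counters rather than process-local state.

```python
import time
from collections import deque

class NotificationRateLimiter:
    """Sliding window: at most `limit` notifications per user per `window_seconds`."""

    def __init__(self, limit=10, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self._sent = {}  # user_id -> deque of delivery timestamps

    def allow(self, user_id, now=None):
        now = time.time() if now is None else now
        q = self._sent.setdefault(user_id, deque())
        # Evict timestamps that have aged out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # over budget: drop, delay, or fold into a digest
        q.append(now)
        return True
```

Notifications rejected here need not be lost: they can be folded into a batched digest, which also reduces API calls to third-party push/email providers as noted above.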
Trade offs/Tech choices
Relational vs. NoSQL Databases:
- Trade-Off: NoSQL databases (e.g., MongoDB, DynamoDB) are used for tweet storage due to high write throughput and flexible schema, but they sacrifice complex query support and strong consistency.
- Reason: Tweets and metadata are write-heavy and don’t always require strict ACID compliance, making NoSQL a better fit.
Caching for Performance:
- Trade-Off: Redis/Memcached is used for caching feeds and frequently accessed data. While it provides low latency, data persistence is not guaranteed.
- Reason: The trade-off in durability is acceptable for temporary data like feeds since it can be recomputed.
Event-Driven Architecture:
- Trade-Off: Using Kafka for asynchronous updates introduces eventual consistency in feed generation and notifications.
- Reason: Decoupling services ensures scalability and handles real-time updates without overloading primary databases.
Search Engine vs. Database for Search:
- Trade-Off: Elasticsearch is used for search instead of querying the primary database directly, requiring additional infrastructure.
- Reason: Optimized for full-text search, Elasticsearch provides faster and more relevant results for queries.
In-Memory Processing:
- Trade-Off: In-memory services like Redis handle high-speed operations but are costly in terms of memory usage.
- Reason: The speed improvement justifies the cost, especially for latency-sensitive operations like feed retrieval.
Horizontal Scaling Over Vertical Scaling:
- Trade-Off: Stateless microservices are scaled horizontally, which increases deployment complexity.
- Reason: Horizontal scaling ensures fault tolerance and supports massive traffic spikes better than vertical scaling.
Failure scenarios/bottlenecks
Database Overload:
- Scenario: High write volumes (e.g., during viral events) overwhelm the Tweet or Engagement Database.
- Mitigation: Implement sharding, write queuing, and caching to reduce direct database load.
Cache Miss or Overload:
- Scenario: Cache (e.g., Redis) fails or experiences high traffic, leading to slower feed retrieval.
- Mitigation: Use fallback to recompute feeds and enable distributed caching for redundancy.
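That fallback can be expressed as a cache-aside pattern: serve from the cache when possible, and recompute from the follow graph on a miss. This is a minimal sketch with an in-memory dict standing in for Redis; `FeedCache` and `recompute_fn` are illustrative names.

```python
class FeedCache:
    """Cache-aside: serve from cache, recompute from the follow graph on a miss."""

    def __init__(self, recompute_fn):
        self._cache = {}
        self._recompute = recompute_fn  # user_id -> list of tweet ids

    def get_feed(self, user_id):
        feed = self._cache.get(user_id)
        if feed is None:
            # Miss (or a lost cache node): fall back to recomputation,
            # then repopulate so subsequent reads stay fast.
            feed = self._recompute(user_id)
            self._cache[user_id] = feed
        return feed

    def invalidate(self, user_id):
        self._cache.pop(user_id, None)
```

Because feeds are derived data, losing the cache degrades latency but never correctness, which is exactly the durability trade-off accepted in the tech-choices section above.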
Search Index Lag:
- Scenario: Search indexing delays cause stale or missing search results.
- Mitigation: Use asynchronous batch indexing and monitor indexing pipelines for delays.
Event Queue Congestion:
- Scenario: Message queues (e.g., Kafka) become congested, delaying feed updates or notifications.
- Mitigation: Scale consumers dynamically and partition topics to handle parallel processing.
API Gateway Overload:
- Scenario: High incoming request rates exceed the API Gateway’s capacity.
- Mitigation: Deploy multiple gateway instances with load balancing and enforce rate limiting.
Notification Spamming:
- Scenario: Excessive notifications overwhelm users or third-party services (e.g., push/email providers).
- Mitigation: Implement user-specific rate limits and batch notification delivery.
Single-Region Failure:
- Scenario: Regional outages disrupt service for users in that region.
- Mitigation: Use multi-region deployments with failover mechanisms.
Feed Staleness:
- Scenario: Precomputed feeds become outdated during high activity.
- Mitigation: Implement periodic refresh jobs and prioritize real-time updates for active users.
Authentication Failures:
- Scenario: Token verification delays or service downtime prevent user login.
- Mitigation: Cache valid tokens and deploy authentication service replicas.
Media Storage Downtime:
- Scenario: Issues with cloud storage (e.g., S3) prevent media uploads or retrievals.
- Mitigation: Use redundant storage systems and pre-generate signed URLs for faster access.
Future improvements
Database Scalability:
- Improvement: Implement advanced partitioning strategies (e.g., by time or user ID) for databases.
- Mitigation: Distribute load more evenly to prevent database overload during viral events.
Enhanced Caching:
- Improvement: Use distributed caching with automated failover (e.g., Redis Cluster).
- Mitigation: Reduce the impact of cache failures and ensure high availability.
Search Index Resilience:
- Improvement: Introduce multi-node replication for Elasticsearch or similar search engines.
- Mitigation: Ensure search index availability and faster recovery during failures.
Queue Monitoring and Auto-Scaling:
- Improvement: Implement real-time monitoring and dynamic scaling for event queues.
- Mitigation: Prevent congestion by dynamically adjusting consumer resources.
API Gateway Optimization:
- Improvement: Use global load balancers (e.g., AWS Global Accelerator) for better request distribution.
- Mitigation: Reduce latency and ensure high availability during traffic surges.
Multi-Region Deployments:
- Improvement: Expand to active-active multi-region architecture with geo-replication.
- Mitigation: Minimize the impact of regional outages by rerouting traffic seamlessly.
Notification System Enhancements:
- Improvement: Implement user-configurable notification thresholds and prioritize delivery.
- Mitigation: Reduce spamming and load on third-party notification services.
Real-Time Feed Updates:
- Improvement: Use push-based feeds for active users and hybrid approaches for others.
- Mitigation: Maintain feed freshness without overloading the system.
Automated Failover for Media Storage:
- Improvement: Set up redundant cloud storage providers.
- Mitigation: Avoid downtime or media loss by switching to backup storage instantly.
Continuous Testing and Chaos Engineering:
- Improvement: Simulate failures and stress test the system regularly.
- Mitigation: Identify weaknesses early and ensure the system is robust under stress.