System requirements


Functional:

  1. Users can create their own user account and log in.
  2. Users can post tweets (text, images, videos).
  3. Users can follow other users.
  4. Users can view a feed of tweets from people they follow and them selfs.



Non-Functional:

  1. Low latency for posting and viewing tweets.
  2. High availability and fault tolerance.
  3. Scalability to handle growing user base and traffic.
  4. Data consistency. Eventual consistency.



Capacity estimation

Total users: 10 millions

DAU: 2 millions

Tweets per day:3

Read/Write ratio: 20:1

Average tweet size:

  • Text: 140 characters (assume 2 bytes per character = 280 bytes).
  • Image: 200 KB (assume 20% of tweets have images).
  • Video: 1 MB (assume 5% of tweets have videos).

Daily Tweets: 3 * 2millions = 6 millions

Daily Requests: 6 millions * 21 = 126 millions

QPS: 126 millions / 24 / 60 = 1458 query/second

Daily storage (text): 6 million tweets * 280 bytes/tweet = 1.68 GB

Daily storage (images): 6 million tweets * 20% * 200 KB/image = 240 GB

Daily storage (videos): 6 million tweets * 5% * 1 MB/video = 300 GB

Total daily storage: 1.68 GB + 240 GB + 300 GB = ~541.68 GB

Yearly storage: 541.68 GB/day * 365 days = ~197.5 TB

Average request size (estimate):

  • Text-only: 300 bytes (including headers, metadata)
  • With image: 200 KB + 300 bytes = ~200.3 KB
  • With video: 1 MB + 300 bytes = ~1 MB

Weighted average request size: (0.75 * 300 bytes) + (0.2 * 200.3 KB) + (0.05 * 1 MB) = ~90.3 KB

Peak bandwidth: 5833 RPS * 90.3 KB/request = ~526 MB/s






API design

Users

  • GET /users/{user_id}: Retrieve a user's profile.
  • POST /users: Create a new user account.
  • PUT /users/{user_id}: Update a user's profile.
  • DELETE /users/{user_id}: Delete a user account.

Tweets

  • POST /tweets: Create a new tweet.
  • GET /tweets/{tweet_id}: Retrieve a tweet.
  • DELETE /tweets/{tweet_id}: Delete a tweet (only by the author).
  • GET /users/{user_id}/tweets: Retrieve tweets posted by a specific user.
  • GET /me/tweets: Retrieve tweets for






Database design


Users:

  • user_id (INT, primary key, auto-incrementing)
  • username (VARCHAR, unique, not null)
  • email (VARCHAR, unique, not null)
  • password_hash (VARCHAR, not null)
  • bio (TEXT)
  • profile_picture_url (VARCHAR)
  • created_at (TIMESTAMP)
  • updated_at (TIMESTAMP)

Tweets:

  • tweet_id (INT, primary key, auto-incrementing)
  • user_id (INT, foreign key referencing Users)
  • content (TEXT, not null)
  • media_urls (JSON - to store an array of image/video URLs)
  • created_at (TIMESTAMP)

Relationships:

  • follower_id (INT, foreign key referencing Users)
  • following_id (INT, foreign key referencing Users)
  • created_at (TIMESTAMP)
  • (Consider a composite primary key using follower_id and following_id)





High-level design

Clients:

  • Web Application: Browser-based interface for accessing "Design Twiteer."
  • Mobile Applications: iOS and Android apps for mobile access.

Load Balancers:

  • Distribute incoming traffic across multiple servers to ensure high availability and prevent overload.

API Gateway:

  • Single entry point for all client requests.
  • Handles authentication, authorization, and routing of requests to appropriate microservices.

Microservices:

  • User Service: Manages user accounts, profiles, and relationships (following/followers).
  • Tweet Service: Handles tweet creation, storage, and retrieval.
  • Feed Service: Generates and delivers personalized feeds for each user.
  • Notification Service: Sends notifications for new tweets, mentions, etc.
  • Search Service: (Optional for now) Allows users to search for tweets.

Data Stores:

  • Relational Database (e.g., MySQL): Stores user data, relationships, and tweet metadata.
  • NoSQL Database (e.g., Cassandra): Stores tweet content for efficient retrieval and scalability.
  • Cache (e.g., Redis): Caches frequently accessed data like user profiles, popular tweets, and feed data.

Message Queue (e.g., Kafka):

  • Enables asynchronous communication between services.
  • Used for tasks like fanning out tweets to followers, generating feeds, and sending notifications.

Content Delivery Network (CDN):

  • Stores and delivers static content (images, videos) from geographically distributed servers to improve performance.





Request flows

Client Initiates Request: A user composes a new tweet with text and an optional image in the mobile app and presses the "Tweet" button.

Load Balancer Distributes: The mobile app sends the tweet request to the Load Balancer. The Load Balancer distributes the incoming request to one of the available API Gateway instances.

API Gateway Authentication and Routing: The API Gateway authenticates the user's request using their authentication token or credentials. Once authenticated, the API Gateway routes the request to the Tweet Service.

Tweet Service Processes Request: The Tweet Service receives the request and processes the tweet data. It stores the tweet content (text and image URL) in the NoSQL database (Cassandra) and relevant metadata (user ID, timestamp, etc.) in the relational database (MySQL).

Asynchronous Processing with Message Queue: The Tweet Service sends a message to the Message Queue (Kafka) to notify the system about the new tweet. This message is placed on a topic like "New Tweets."

Feed Service Updates Feeds: The Feed Service, subscribed to the "New Tweets" topic, receives the message about the new tweet. It retrieves the tweet data from the database and updates the feeds of the users who follow the author of the tweet. It may also update the cached feed data in Redis.

Notification Service (Optional): The Notification Service might also consume the message from the queue to send out notifications to followers, informing them about the new tweet.

Client Receives Response: Once the Tweet Service has successfully stored the tweet, it sends a success response back to the client via the API Gateway. The user sees their tweet posted on their profile and their followers' feeds.





Detailed component design

Tweet Service

  • Functionality:
    • Handles tweet creation, storage, and retrieval.
    • Validates tweet content (length, format, etc.).
    • Stores tweet data in the NoSQL database (Cassandra) and metadata in the relational database (MySQL).
    • Publishes messages to the message queue (Kafka) to notify other services about new tweets.
  • Scalability:
    • Horizontal Scaling: Can be scaled horizontally by adding more instances of the Tweet Service. Load balancers distribute requests across these instances.
    • Database Sharding: The NoSQL database (Cassandra) is designed for scalability and can be sharded to distribute data across multiple nodes.
  • Data Structures and Algorithms:
    • Data Storage: Tweets are stored as documents in Cassandra. Each document contains the tweet content, media URLs, and associated metadata.
    • Indexing: Cassandra allows efficient indexing on various fields, such as user ID and timestamp, for faster retrieval.
    • Consistency: Cassandra offers tunable consistency levels. For tweet creation, a "quorum" or "local quorum" consistency can be used to ensure data is replicated to multiple nodes.

Functionality:

  • Generates and delivers personalized feeds for each user.
  • Retrieves tweets from users a user follows, ordered by timestamp.
  • Utilizes a combination of pull and push mechanisms:
    • Pull: When a user refreshes their feed, the service retrieves the latest tweets from the database.
    • Push: When a new tweet is posted, the service receives a notification from the message queue and pushes the tweet to the relevant users' feeds.
  • Caches frequently accessed feed data in Redis to improve performance.

Scalability:

  • Horizontal Scaling: Can be scaled horizontally by adding more instances of the Feed Service.
  • Caching: Redis can be scaled by adding more nodes and using techniques like sharding.
  • Message Queue: Kafka is designed for high throughput and can handle a large volume of messages.

Data Structures and Algorithms:

  • Sorted Sets in Redis: User feeds can be stored as sorted sets in Redis, with the score being the tweet's timestamp. This allows efficient retrieval of tweets in chronological order.
  • Fan-out on Write: When a new tweet is posted, the message queue facilitates fanning out the tweet to all followers, allowing for parallel updates of multiple feeds.






Trade offs/Tech choices

Relational Database (MySQL) vs. NoSQL Database (for all data)

  • Trade-off: Strong consistency and data integrity (MySQL) vs. high scalability and flexible data model (NoSQL).
  • Choice: MySQL for user data and relationships, Cassandra for tweet content.
  • Rationale: MySQL ensures ACID properties for critical user data, while Cassandra provides the scalability and flexibility needed for storing and retrieving large volumes of tweet data.

Message Queue (Kafka) vs. Synchronous Communication

  • Trade-off: Increased complexity (message queue) vs. potential performance bottlenecks and tight coupling (synchronous communication).
  • Choice: Kafka for asynchronous communication.
  • Rationale: Kafka enables decoupled and scalable communication between services. It facilitates tasks like feed updates and notifications without blocking the main request flow.




Failure scenarios/bottlenecks

1. Traffic Overload

  • Scenario: Sudden spikes in user activity (e.g., during a major event) can overwhelm the system, leading to slow response times, errors, or even outages.
    • Potential Bottlenecks:Load Balancers: If not configured correctly or lacking capacity, load balancers can become overwhelmed, failing to distribute traffic effectively.
    • API Gateway: A bottleneck at the API Gateway can prevent requests from reaching the microservices.
    • Database: High read/write load can overload the database, leading to slow query performance.
    • Mitigation:Capacity Planning: Properly estimate capacity and scale resources (servers, database, etc.) to handle peak loads.
    • Rate Limiting: Implement rate limiting to control the number of requests per user or per IP address.
    • Caching: Cache frequently accessed data to reduce database load.
    • Auto-scaling: Configure auto-scaling to automatically add more resources when traffic increases.

Database Failures

Database failures can disrupt the availability and consistency of your service. Here's a breakdown of common failure scenarios:

  • Hardware Failures: Server crashes, disk failures, power outages can lead to database downtime.
  • Software Failures: Bugs in the database software, operating system, or related components can cause errors or crashes.
  • Network Failures: Network outages or connectivity issues can prevent access to the database.
  • Data Corruption: Software bugs, hardware issues, or malicious attacks can lead to data corruption, making the database unusable.
  • Overload: Excessive load (high volume of reads/writes) can overwhelm the database, leading to slow performance or unavailability.

Replication:

  • Leader-follower Replication: A leader database handles writes, and follower replicas handle reads. This improves read scalability and provides failover capability.
  • Multi-Leader Replication: Multiple Leader nodes handle writes, improving write scalability and availability.
  • Benefits: Increased read capacity, high availability, data backups.
  • Challenges: Increased complexity, potential for data inconsistency (especially with multi-leader).

Sharding:

  • Horizontal Partitioning: Dividing the database into smaller, independent shards (horizontal partitions). Each shard holds a subset of the data.
  • Sharding Key: A key (e.g., user ID) is used to determine which shard stores a particular piece of data.
  • Benefits: Improved scalability and performance, as data is distributed across multiple servers.
  • Challenges: Increased complexity in data distribution, query routing, and maintaining data consistency across shards.






Future improvements

Enhanced Feed Personalization:

  • Content-Based Filtering: Analyze tweet content (text, hashtags, media) to recommend tweets relevant to user interests.
  • Collaborative Filtering: Identify users with similar interests and recommend tweets they have interacted with.
  • Machine Learning: Train models to predict user preferences and personalize feed ranking based on engagement patterns.

Real-time Features:

  • Real-time Notifications: Use WebSockets to deliver instant notifications for new tweets, mentions, and followers.
  • Live Tweeting: Support live tweeting for events, enabling real-time updates and conversations.