My Solution for Design a Live Comment System

by nectar4678

System requirements


Functional:

User Authentication and Authorization:

  • Users must be able to register and log in to the system.
  • Users must be able to post comments only when authenticated.
  • Users should have different roles (e.g., admin, regular user) with appropriate permissions.

Posting Comments:

  • Users should be able to post comments on a specific post or feed.
  • Comments should be displayed in real-time to all users viewing the same post/feed.

Viewing Comments:

  • Users should be able to view all comments associated with a specific post/feed.
  • Comments should be updated in real-time as new comments are posted.

Notification System:

  • Users should receive notifications for new comments on posts/feeds they are viewing.

Comment Moderation:

  • Admins should be able to delete or edit inappropriate comments.
  • The system should support reporting of inappropriate comments by users.

Rate Limiting:

  • To prevent spam, there should be a mechanism to limit the rate at which users can post comments.


Non-Functional:

Scalability:

  • The system should handle high volumes of concurrent users and comments.
  • It should scale horizontally to accommodate growth in users and data.

Performance:

  • The system should provide low-latency updates to ensure real-time comment visibility.
  • It should handle large volumes of read and write operations efficiently.

Reliability:

  • The system should be highly available with minimal downtime.
  • It should provide mechanisms for data recovery in case of failures.

Security:

  • User data and comments should be securely stored and transmitted.
  • The system should protect against common vulnerabilities (e.g., SQL injection, XSS).

Usability:

  • The user interface should be intuitive and responsive.
  • Real-time updates should be seamless and not disrupt the user experience.

Maintainability:

  • The system should be modular and easy to maintain.
  • Code should be well-documented and follow best practices.


Capacity estimation

Assumptions

  • Each comment is approximately 200 bytes in size.
  • Users are evenly distributed across different posts/feeds.
  • The system uses efficient data structures and indexing to manage comments.
  • Network latency and throughput are assumed not to be limiting factors.


Storage Requirements

  • Comments Per Second: 10,000 comments per second.
  • Comment Size: 200 bytes.
  • Total data per second = 10,000 comments * 200 bytes = 2,000,000 bytes (2 MB).
  • Total data per day = 2 MB/s * 86,400 seconds per day = 172,800 MB (approximately 173 GB).
  • Total data per month = 173 GB/day * 30 days = 5,190 GB (approximately 5.2 TB).


Database Throughput

Write Throughput:

  • Peak write throughput = 10,000 comments per second.
  • Assuming writes are the most resource-intensive operations, the database must sustain at least 10,000 writes per second at peak.

Read Throughput:

  • With 1 million concurrent users, if each user fetches new comments every 5 seconds, the read operations per second would be: 1,000,000 / 5 = 200,000 read operations per second.


Network Bandwidth

Data Transmission for Writes:

  • Bandwidth for writes per second = 2 MB/s.

Data Transmission for Reads:

  • Assuming each read operation fetches an average of 10 comments (2000 bytes): Bandwidth for reads per second = 200,000 reads * 2000 bytes = 400,000,000 bytes (400 MB/s).

Total Bandwidth Requirement:

  • Total bandwidth = 2 MB/s (write) + 400 MB/s (read) = 402 MB/s.


Server Requirements

Application Servers:

  • To handle 1 million concurrent users, we estimate needing multiple application servers behind a load balancer.
  • Assuming each server can handle 10,000 concurrent connections, we would need at least 100 servers.

Database Servers:

  • For handling 10,000 writes and 200,000 reads per second, we would need a highly scalable database solution.
  • A combination of sharding and replication may be required to distribute the load.

WebSocket Servers:

  • WebSockets are used for real-time communication.
  • Assuming each WebSocket server can handle 50,000 concurrent connections, we would need at least 20 servers.


Caching

In-Memory Caching:

  • To reduce the load on the database, frequently accessed comments should be cached.
  • Technologies like Redis or Memcached can be used for this purpose.

Estimated Cache Size:

  • Assuming 20% of comments are frequently accessed: Cache size = 20% of daily data = 0.2 * 173 GB = 34.6 GB.
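A quick back-of-the-envelope check of the numbers above, as a minimal Python sketch (the inputs simply mirror the stated assumptions):

  # capacity_estimate.py -- sanity-check the capacity figures from the assumptions above
  COMMENTS_PER_SECOND = 10_000
  COMMENT_SIZE_BYTES = 200
  CONCURRENT_USERS = 1_000_000
  POLL_INTERVAL_SECONDS = 5          # each viewer fetches new comments every 5 s
  COMMENTS_PER_READ = 10
  HOT_DATA_FRACTION = 0.2            # share of daily comments assumed to be "hot" and cached

  write_bandwidth_mb = COMMENTS_PER_SECOND * COMMENT_SIZE_BYTES / 1e6              # 2 MB/s
  daily_storage_gb = write_bandwidth_mb * 86_400 / 1e3                             # ~173 GB/day
  monthly_storage_tb = daily_storage_gb * 30 / 1e3                                 # ~5.2 TB/month
  reads_per_second = CONCURRENT_USERS / POLL_INTERVAL_SECONDS                      # 200,000 reads/s
  read_bandwidth_mb = reads_per_second * COMMENTS_PER_READ * COMMENT_SIZE_BYTES / 1e6  # 400 MB/s
  cache_size_gb = HOT_DATA_FRACTION * daily_storage_gb                             # ~34.6 GB

  print(write_bandwidth_mb, daily_storage_gb, monthly_storage_tb,
        reads_per_second, read_bandwidth_mb, cache_size_gb)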


API design

Post Comment

  • Endpoint: POST /api/posts/{post_id}/comments
  • Description: Posts a new comment to a specific post.
Sample request:

  {
    "user_id": "12345",
    "comment": "This is a comment."
  }


Sample response:

  {
    "status": "success",
    "comment_id": "67890",
    "timestamp": "2024-07-01T12:34:56Z"
  }
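As a rough illustration of the write path behind this endpoint, here is a minimal sketch using Flask purely as an example framework; the ID and timestamp generation are placeholders, and persistence plus fan-out are left as comments since they belong to later sections.

  # Hypothetical Flask sketch of the Post Comment endpoint (framework choice is illustrative only).
  import uuid
  from datetime import datetime, timezone
  from flask import Flask, request, jsonify

  app = Flask(__name__)

  @app.route("/api/posts/<post_id>/comments", methods=["POST"])
  def post_comment(post_id):
      payload = request.get_json(force=True)
      user_id = payload.get("user_id")
      text = (payload.get("comment") or "").strip()
      if not user_id or not text:
          return jsonify({"status": "error", "message": "user_id and comment are required"}), 400

      comment_id = str(uuid.uuid4())                      # placeholder ID scheme
      timestamp = datetime.now(timezone.utc).isoformat()
      # In the real system: persist to the comments store, update the cache,
      # and notify the WebSocket tier (see the request flow section).
      return jsonify({"status": "success", "comment_id": comment_id, "timestamp": timestamp}), 201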


Get Comments

  • Endpoint: GET /api/posts/{post_id}/comments
  • Description: Retrieves all comments for a specific post.


Sample response:

  {
    "status": "success",
    "comments": [
      {
        "comment_id": "67890",
        "user_id": "12345",
        "comment": "This is a comment.",
        "timestamp": "2024-07-01T12:34:56Z"
      },
      {
        "comment_id": "67891",
        "user_id": "12346",
        "comment": "This is another comment.",
        "timestamp": "2024-07-01T12:35:56Z"
      }
    ]
  }


Delete Comment

  • Endpoint: DELETE /api/comments/{comment_id}
  • Description: Deletes a specific comment (admin only).
Sample response:

  {
    "status": "success"
  }


Real-Time Updates API

Subscribe to Comments

  • Endpoint: GET /api/subscribe/posts/{post_id}/comments
  • Description: Subscribes to real-time comment updates for a specific post. The connection is held open (upgraded to a WebSocket, per the high-level design) so new comments are pushed to the client as they arrive.
{   "comment_id": "67892",   "user_id": "12347",   "comment": "This is a new comment.",   "timestamp": "2024-07-01T12:36:56Z" }



Notification API

Get Notifications

  • Endpoint: GET /api/users/{user_id}/notifications
  • Description: Retrieves notifications for a user.
{   "status": "success",   "notifications": [     {       "notification_id": "abcde",       "message": "New comment on your post.",       "timestamp": "2024-07-01T12:37:56Z"     }   ] }


Database design

Key Points in Data Model

  1. User Table: Stores user details including roles for authorization.
  2. Posts Table: Contains posts with a unique identifier for each post.
  3. Comments Table: Stores comments for each post, partitioned by post and clustered by timestamp so the most recent comments can be retrieved efficiently (one possible table layout is sketched after this list).
  4. Notifications Table: Manages user notifications, enabling timely updates.
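One possible Cassandra layout for the comments table, consistent with the clustering note above. This is a sketch, not the authoritative schema; keyspace, table, and column names are assumptions. It uses the DataStax Python driver.

  # Sketch of a possible comments table in Cassandra (names are illustrative assumptions).
  from cassandra.cluster import Cluster

  session = Cluster(["127.0.0.1"]).connect()
  session.execute("""
      CREATE KEYSPACE IF NOT EXISTS live_comments
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
  """)
  session.execute("""
      CREATE TABLE IF NOT EXISTS live_comments.comments_by_post (
          post_id    text,
          created_at timeuuid,
          comment_id text,
          user_id    text,
          comment    text,
          PRIMARY KEY ((post_id), created_at)
      ) WITH CLUSTERING ORDER BY (created_at DESC)
  """)

Partitioning by post_id keeps all comments for a post on one partition, and clustering by created_at DESC makes "latest comments first" reads cheap.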



High-level design

Client Application:

  • User-facing application (web, mobile) that allows users to interact with the system.
  • Communicates with the backend via REST APIs and WebSockets for real-time updates.

API Gateway:

  • Single entry point for all client requests.
  • Routes requests to appropriate services and handles authentication and authorization.

Application Servers:

  • Hosts the business logic and processes client requests.
  • Scales horizontally to handle increased load.

WebSocket Servers:

  • Manages real-time communication for instant comment updates.
  • Handles connections and broadcasts new comments to connected clients.

Database:

  • Stores user data, posts, comments, and notifications.
  • Uses a wide-column database (e.g., Apache Cassandra) for high write throughput and scalability.

Caching Layer:

  • In-memory cache (e.g., Redis) to store frequently accessed data and reduce database load.
  • Speeds up read operations and improves overall performance.

Load Balancer:

  • Distributes incoming traffic across multiple application and WebSocket servers to ensure even load distribution and high availability.

Notification Service:

  • Sends notifications to users about new comments or other events.
  • Can be implemented using a message queue (e.g., RabbitMQ) to decouple notification processing from the main application logic.
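A minimal sketch of that decoupling, using RabbitMQ via the pika client as an example; the queue name, host, and event shape are assumptions.

  # Sketch: publish a notification event to RabbitMQ so delivery is decoupled from the request path.
  import json
  import pika

  connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
  channel = connection.channel()
  channel.queue_declare(queue="comment-notifications", durable=True)

  event = {"type": "new_comment", "post_id": "42", "comment_id": "67890"}  # illustrative payload
  channel.basic_publish(
      exchange="",
      routing_key="comment-notifications",
      body=json.dumps(event),
      properties=pika.BasicProperties(delivery_mode=2),  # mark the message persistent
  )
  connection.close()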



Connections:

  • Client Application to API Gateway: Handles all REST API requests for standard operations.
  • Client Application to WebSocket Servers: Manages real-time updates.
  • API Gateway to Application Servers: Routes REST API requests for processing.
  • Application Servers to Database: Stores and retrieves data.
  • Application Servers to Caching Layer: Uses caching for frequently accessed data.
  • Application Servers to Notification Service: Sends notifications about events.
  • WebSocket Servers to Application Servers: Coordinates real-time updates.
  • WebSocket Servers to Caching Layer: Retrieves cached data for fast access.
  • Caching Layer to Database: Syncs cached data with the database.
  • Notification Service to Database: Retrieves data for notifications.


Request flows

Posting a Comment

When a user posts a comment, the request follows this flow:

  1. The client application sends a POST request to the API Gateway.
  2. The API Gateway authenticates the request and forwards it to the appropriate application server.
  3. The application server processes the request, stores the comment in the database, and updates the cache.
  4. The application server notifies the WebSocket server about the new comment.
  5. The WebSocket server broadcasts the new comment to all connected clients viewing the same post.
  6. The application server sends a response back to the client application confirming the comment has been posted.
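The fan-out in steps 3-5 could look roughly like the sketch below on the application server, using Redis both as the cache and as a pub/sub channel to reach the WebSocket tier. The library choice, key/channel naming, and the save_comment helper are assumptions, not part of the design above.

  # Sketch of steps 3-5: persist the comment, refresh the cache, and notify WebSocket servers.
  import json
  import redis

  cache = redis.Redis(host="localhost", port=6379)

  def handle_new_comment(db_session, post_id: str, comment: dict) -> None:
      save_comment(db_session, post_id, comment)                # step 3a: durable write (hypothetical helper)
      cache.lpush(f"comments:{post_id}", json.dumps(comment))   # step 3b: keep the hot comment list fresh
      cache.ltrim(f"comments:{post_id}", 0, 99)                 # cap the cached list at the latest 100 comments
      cache.publish(f"post:{post_id}:comments", json.dumps(comment))  # steps 4-5: fan out to WebSocket servers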




Viewing Comments

When a user views comments for a specific post, the request follows this flow:

  1. The client application sends a GET request to the API Gateway.
  2. The API Gateway authenticates the request and forwards it to the appropriate application server.
  3. The application server checks the cache for the comments.
  4. If the comments are not in the cache, the application server retrieves them from the database and updates the cache.
  5. The application server sends the comments back to the client application.
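Steps 3-4 are the classic cache-aside pattern. A minimal sketch follows; the Redis key naming and the fetch_comments_from_db helper are hypothetical.

  # Sketch of the cache-aside read in steps 3-4 (key naming and DB helper are hypothetical).
  import json
  import redis

  cache = redis.Redis(host="localhost", port=6379)

  def get_comments(post_id: str) -> list:
      key = f"comments:{post_id}"
      cached = cache.lrange(key, 0, 99)            # step 3: try the cache first
      if cached:
          return [json.loads(c) for c in cached]
      comments = fetch_comments_from_db(post_id)   # step 4: fall back to the database (hypothetical helper)
      if comments:
          cache.rpush(key, *[json.dumps(c) for c in comments])
          cache.expire(key, 60)                    # short TTL keeps cached comments reasonably fresh
      return comments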




Real-Time Comment Updates

For real-time updates, the flow follows these steps:

  1. The client application establishes a WebSocket connection with the WebSocket server.
  2. When a new comment is posted, the WebSocket server receives a notification from the application server.
  3. The WebSocket server broadcasts the new comment to all connected clients viewing the same post.



Detailed component design


WebSocket Server

The WebSocket server is responsible for managing real-time communication between the clients and the backend. It ensures that any new comment posted on a post/feed is instantly broadcast to all connected clients viewing that post/feed.

Functionality

Connection Management:

  • Establishes and maintains WebSocket connections with clients.
  • Handles connection lifecycle events such as opening, closing, and errors.

Message Broadcasting:

  • Listens for new comment notifications from the application server.
  • Broadcasts new comments to all connected clients subscribed to the relevant post/feed.

Subscription Management:

  • Manages subscriptions for clients to specific posts/feeds.
  • Ensures clients receive updates only for posts/feeds they are interested in.

Scalability

The WebSocket server scales horizontally by adding more servers behind a load balancer. Each server can handle a finite number of concurrent connections (e.g., 50,000 connections per server). To scale to millions of concurrent users, multiple WebSocket servers are deployed.

Algorithms and Data Structures

Connection Pool:

  • A pool of active WebSocket connections is maintained using a hash map, where the key is the client ID and the value is the connection object.

Subscription List:

  • Each post/feed maintains a list of subscribed clients using a hash map, where the key is the post ID and the value is a list of client IDs.

Broadcasting Algorithm:

  • When a new comment is posted, the server retrieves the list of subscribed clients and iterates over the list to send the comment to each client.
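A compact sketch of these structures: a connection pool, a subscription map, and the broadcast loop. It uses the third-party websockets package for recent releases; the subscribe-message format is an assumption (the post ID could equally come from the URL path), and a set is used for subscriptions so removal is O(1).

  # Sketch of the WebSocket server's core data structures and broadcast algorithm.
  import asyncio
  import json
  from collections import defaultdict
  import websockets

  connections = {}                      # connection pool: client_id -> websocket
  subscriptions = defaultdict(set)      # subscription list: post_id -> set of client_ids

  async def handler(ws):
      # Assumed protocol: the first client message names the post to follow, e.g. {"subscribe": "42"}.
      first = json.loads(await ws.recv())
      post_id = first["subscribe"]
      client_id = id(ws)
      connections[client_id] = ws
      subscriptions[post_id].add(client_id)
      try:
          await ws.wait_closed()        # keep the connection registered until the client disconnects
      finally:
          subscriptions[post_id].discard(client_id)
          connections.pop(client_id, None)

  async def broadcast(post_id: str, comment: dict) -> None:
      # Iterate over the subscribers of this post and push the new comment to each one.
      payload = json.dumps(comment)
      for client_id in list(subscriptions[post_id]):
          ws = connections.get(client_id)
          if ws is not None:
              await ws.send(payload)

  async def main():
      async with websockets.serve(handler, "0.0.0.0", 8765):
          await asyncio.Future()        # run forever

  asyncio.run(main())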


Caching Layer

The caching layer is designed to improve read performance and reduce the load on the database by storing frequently accessed data in memory. Technologies like Redis or Memcached are commonly used.

Functionality

Data Storage:

  • Stores frequently accessed data such as recent comments and user sessions in memory.
  • Provides fast read access to cached data.

Cache Invalidation and Eviction:

  • Implements expiration and eviction strategies (e.g., time-to-live (TTL) for data freshness, least recently used (LRU) eviction when memory is full) to keep the cache accurate and memory-efficient.

Data Synchronization:

  • Syncs with the database to ensure data consistency.
  • Writes updates to the database and invalidates cache entries as needed.

Scalability

The caching layer scales horizontally by adding more cache nodes. Each node can handle a portion of the cache data, and a distributed cache system can be implemented using consistent hashing to distribute data across multiple nodes.

Algorithms and Data Structures

Hash Map:

  • Uses hash maps for efficient key-value storage and retrieval.

TTL and LRU:

  • Implements TTL for automatic expiration of cache entries.
  • Uses LRU for evicting the least recently used entries when the cache is full.
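To make the TTL and LRU mechanics concrete, here is a toy in-process cache; a real deployment would rely on Redis or Memcached eviction policies rather than application code.

  # Toy cache combining TTL expiry and LRU eviction, for illustration only.
  import time
  from collections import OrderedDict

  class TTLLRUCache:
      def __init__(self, capacity: int = 1000, ttl_seconds: float = 60.0):
          self.capacity = capacity
          self.ttl = ttl_seconds
          self._data = OrderedDict()           # key -> (value, expiry_timestamp)

      def get(self, key):
          item = self._data.get(key)
          if item is None:
              return None
          value, expires_at = item
          if time.monotonic() > expires_at:    # TTL: entry is stale, drop it
              del self._data[key]
              return None
          self._data.move_to_end(key)          # LRU: mark as most recently used
          return value

      def put(self, key, value):
          self._data[key] = (value, time.monotonic() + self.ttl)
          self._data.move_to_end(key)
          if len(self._data) > self.capacity:  # LRU: evict the least recently used entry
              self._data.popitem(last=False)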


Database

The database stores all persistent data, including users, posts, comments, and notifications. For this system, a wide-column database like Apache Cassandra is used to ensure high write throughput and scalability.

Functionality

Data Storage:

  • Stores user information, posts, comments, and notifications.
  • Ensures data consistency and availability.

Query Execution:

  • Executes read and write queries efficiently.
  • Supports complex queries with secondary indexes and materialized views.

Data Replication and Sharding:

  • Implements replication for high availability and fault tolerance.
  • Uses sharding to distribute data across multiple nodes for scalability.

Scalability

The database scales horizontally by adding more nodes to the cluster. Data is distributed across nodes using consistent hashing, and replication ensures high availability.

Algorithms and Data Structures

Consistent Hashing:

  • Distributes data across nodes to ensure even load distribution and efficient data retrieval.

Replication Factor:

  • Configures replication factor to determine the number of copies of data stored across the cluster for fault tolerance.

Secondary Indexes and Materialized Views:

  • Uses secondary indexes for efficient querying of non-primary key attributes.
  • Implements materialized views to support complex queries and improve read performance.
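To make the consistent-hashing idea concrete, here is a bare-bones hash ring. Virtual nodes and replication are omitted for brevity; real systems such as Cassandra layer those on top of this idea.

  # Bare-bones consistent hash ring showing how keys map to nodes.
  import bisect
  import hashlib

  class HashRing:
      def __init__(self, nodes):
          self._ring = sorted((self._hash(n), n) for n in nodes)
          self._keys = [h for h, _ in self._ring]

      @staticmethod
      def _hash(value: str) -> int:
          return int(hashlib.md5(value.encode()).hexdigest(), 16)

      def node_for(self, key: str) -> str:
          # Walk clockwise to the first node whose hash is >= the key's hash, wrapping around.
          idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
          return self._ring[idx][1]

  ring = HashRing(["db-node-1", "db-node-2", "db-node-3"])
  print(ring.node_for("post:42"))   # the same key always maps to the same node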


Trade offs/Tech choices

WebSocket Server vs. Long Polling

WebSocket Server:

  • Pros:
      • Provides real-time bidirectional communication.
      • Reduces latency compared to long polling.
      • More efficient use of resources as connections remain open.
  • Cons:
      • More complex to implement and manage.
      • Requires handling connection lifecycle events and scaling connections.

Long Polling:

  • Pros:
      • Simpler to implement.
      • Works well with existing HTTP infrastructure.
  • Cons:
      • Higher latency and resource usage due to repeated polling.
      • Not as efficient for real-time updates.


Choice: WebSocket Server was chosen to provide real-time updates with lower latency and more efficient use of resources, despite the added complexity.


Wide-Column Database vs. Relational Database

Wide-Column Database (e.g., Apache Cassandra):

  • Pros:
      • High write throughput and horizontal scalability.
      • Handles large volumes of data and high velocity of write operations.
      • Provides eventual consistency and fault tolerance through replication.
  • Cons:
      • More complex data modeling.
      • Eventual consistency model might be challenging for some use cases.

Relational Database (e.g., PostgreSQL):

  • Pros:
      • Strong consistency and ACID transactions.
      • Familiar SQL querying capabilities.
      • Easier to model relational data.
  • Cons:
      • Limited horizontal scalability compared to NoSQL databases.
      • Potential bottlenecks with high write volumes.

Choice: A wide-column database was chosen for its ability to handle high write throughput and horizontal scalability, essential for a real-time comments system with millions of users.


Failure scenarios/bottlenecks

Scalability of WebSocket Servers:

  • Bottleneck: Each WebSocket server can handle a finite number of connections, creating a scalability limit.
  • Mitigation:
      • Scale horizontally by adding more WebSocket servers and balancing connections across them.
      • Optimize connection management and reduce the overhead per connection.

Database Write Throughput:

  • Bottleneck: High write loads can overwhelm the database, leading to performance degradation.
  • Mitigation:
      • Use a wide-column database designed for high write throughput.
      • Implement asynchronous writes and batch processing to reduce write pressure.
      • Scale the database horizontally with additional nodes and shards.

Cache Size and Eviction Policy:

  • Bottleneck: Limited cache size can lead to frequent evictions and cache misses, impacting read performance.
  • Mitigation:
      • Increase cache size based on usage patterns and capacity planning.
      • Use effective eviction policies (e.g., LRU, TTL) to maximize cache efficiency.
      • Monitor cache hit/miss ratios and adjust configurations as needed.

Notification Processing Delays:

  • Bottleneck: High volumes of notifications can delay processing and delivery to users.
  • Mitigation:
      • Use a message queue to decouple notification processing from the main application.
      • Implement worker pools to process notifications in parallel.
      • Scale the notification service horizontally to handle increased loads.


Future improvements

Rich Media Comments:

  • Support for multimedia comments (images, videos, GIFs) to enhance user interaction.
  • Implement content moderation tools to automatically filter inappropriate content.

Advanced Notification System:

  • Introduce push notifications for mobile and web clients to ensure users receive real-time updates even when they are not actively using the application.
  • Implement user-specific notification preferences to provide a personalized experience.

API Extensions and Integrations:

  • Provide extensive APIs for third-party integrations, allowing other applications to leverage the live comments system.
  • Implement webhooks to notify external systems about events in real-time.

Dynamic Scaling:

  • Implement auto-scaling for WebSocket servers, application servers, and database nodes based on real-time load metrics.
  • Use cloud-native solutions like Kubernetes to manage dynamic scaling and resource allocation efficiently.