My Solution for Design a Nested Comments System with Score: 9/10

by serenade_xenon816

System requirements


Functional:


  1. A registered user can create a thread
  2. A user can comment on another user's thread
  3. Any user (authenticated or anonymous) can view a thread and its comments
  4. A user can edit or delete their own comment within a 15-minute window


System Behaviour

  1. Every top-level comment on a thread is a parent comment
  2. A reply to a comment forms a nested (child) comment
  3. The system organises comments hierarchically, maintaining the parent-child relationship, and displays them in reverse chronological order
  4. A comment can be nested at most 5 levels deep
  5. Pagination - comments are lazy loaded, 100 comments per page
  6. Users must be authenticated to comment or create a thread


Non-Functional:


  1. A thread and its comments are rendered within 200ms
  2. The system scales to hundreds of thousands of threads and their comments
  3. Data consistency - comments appear in reverse chronological order; a new comment may appear with a slight delay, and this eventual consistency is acceptable
  4. Security - only authenticated users can post comments. Comments are moderated for foul language and sanitised to prevent XSS vulnerabilities


CAP

The system prioritises availability over consistency. A comment may appear with a slight delay, but the system remains available for reads and commenting.



Capacity estimation


  1. Total users = 100M
  2. Daily active users (10% of total) = 10M
  3. Comments per user = 15 a day, so 10M x 15 = 150M comments/day
  4. Spike on viral topics = +20% of 150M = 180M, rounded up to 200M
  5. Each comment is at most 250 chars, with ~100 chars of metadata (threadId, authorId, commentId) = 200M x 350 = 70,000,000,000 chars = ~70GB per day
  6. WPS = 200M / 100,000 secs per day = 2,000 WPS with a ~700-byte payload; with a 50% spike on viral topics, ~3,000 WPS
  7. Storage requirements - writes to the database
    1. 70GB per day x 30 = ~2TB per month; 2TB x 12 = 24TB per year, rounded to 25TB
    2. In a distributed system writes are replicated to 3 nodes, making it 70GB x 3 = 210GB per day
  8. RPS
    1. Reads far exceed writes; assuming 10 users read each comment on average, 200M x 10 = 2B reads/day = 20K RPS on average, with a 50% spike = 30K RPS
    2. Reads are typically 10x to 100x the write rate, i.e. 30K RPS to 300K RPS
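As a sanity check, the estimates above can be reproduced in a few lines of Python (the constants are the assumptions from this section, with a day rounded to 100,000 seconds for easy mental math):

```python
total_users = 100_000_000
dau = total_users // 10                 # 10% daily active = 10M
comments_per_day = dau * 15             # 150M comments/day
peak_comments = 200_000_000             # 150M + ~20% viral spike, rounded up

bytes_per_comment = 350                 # 250 chars content + ~100 chars metadata
write_volume_per_day = peak_comments * bytes_per_comment  # 70,000,000,000 = ~70GB

wps = peak_comments // 100_000          # ~2,000 writes/sec before spike factor
reads_per_day = peak_comments * 10      # ~10 readers per comment
rps = reads_per_day // 100_000          # ~20,000 reads/sec before spike factor
```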




API design



Get a Thread

GET /1/threads/{threadId} Response : 200 OK Response Body : { a Thread instance }


List All Threads

Get a paginated list of threads with parameters,

  1. cursor: optional, a timestamp, returns threads after this timestamp
  2. limit: optional, a number of threads to return


GET /1/threads?cursor={timestamp}&limit=50 Response : 200 OK Response Body : { threads:[], nextCursor: <timestamp> }


Create a Thread

POST /1/threads Request Header: { Authorization: Bearer <JWT token> } Request body: { A thread instance } Response: 201 CREATED


Post a comment

POST /1/threads/{threadId}/comments Request Header: { Authorization: Bearer <JWT token> } Request Body: { a Comment instance } Response: 201 Created Response Body: { the created Comment instance }


Get comments for a thread

Get the top-level comments of a thread, 100 at a time. If a cursor is specified, the system returns comments after that timestamp; otherwise it returns the first 100 comments along with their sub-comments.


GET /1/threads/{threadId}/comments?cursor={timestamp}&limit=100 Response : 200 OK Response Body: { one or more Comment instances; each comment may contain sub-comments in its children attribute }
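A minimal sketch of how a service might fold flat comment rows from the database into the nested `children` shape this endpoint returns (field names follow the Comment entity in this document; sorting is assumed to happen in the query):

```python
def build_comment_tree(rows: list) -> list:
    """Assemble flat comment rows into a nested tree via the
    parentCommentId field. Rows with no parent (or a parent
    outside this page) become roots."""
    by_id = {r["commentId"]: {**r, "children": []} for r in rows}
    roots = []
    for comment in by_id.values():
        parent = by_id.get(comment["parentCommentId"])
        if parent is not None:
            parent["children"].append(comment)   # nest under its parent
        else:
            roots.append(comment)                # top-level comment
    return roots
```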

Get nested comments of a comment

GET /1/comments/{commentId}/replies?cursor={timestamp}&limit=20 Response : 200 OK Response Body: { one or more Comment instances; each comment may contain sub-comments in its children attribute }



Edit a Comment

PUT /1/comments/{commentId} Request Header: { Authorization: Bearer <JWT token> } Request Body: { content of the comment } Response : 200 OK Response Body: { updated Comment instance }


Delete a Comment

DELETE /1/comments/{commentId} Request Header: { Authorization: Bearer <JWT token> } Response : 204 NO CONTENT (no response body)



The cursor query parameter may be an ISO timestamp or the UUID of a comment.
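One common way to implement such a cursor is to make it opaque: base64-encode the timestamp together with a tie-breaking comment id, so clients cannot depend on its internals. A sketch (helper names are illustrative, not part of the API above):

```python
import base64
import json

def encode_cursor(created_on_ms: int, comment_id: str) -> str:
    """Pack timestamp + id into an opaque, URL-safe cursor string."""
    raw = json.dumps({"t": created_on_ms, "id": comment_id})
    return base64.urlsafe_b64encode(raw.encode()).decode()

def decode_cursor(cursor: str):
    """Recover (timestamp, comment_id) from an opaque cursor."""
    data = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return data["t"], data["id"]
```

The id acts as a tie-breaker when two comments share the same timestamp, which keeps pagination stable under the reverse-chronological ordering.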



Database design



Relationship Overview

  1. A thread can have many comments
  2. A comment can have many nested comments
  3. A user can create many threads and post many comments


Core Entities

  1. Thread {threadId, title, content, createdOn, updatedOn, authorId, totalComments, isDeleted, lastCommentedOn}
  2. Comment {commentId, threadId, parentCommentId, content, createdOn, updatedOn, isDeleted, depth, lastCommentedOn, upVotes, sortOrder, replyCount, threadIdPath, children}
  3. Author {userId, name, handle, .....}
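The Comment entity above can be sketched as a Python dataclass (field names follow the entity list; the defaults and types are assumptions for illustration):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Comment:
    comment_id: str                  # snowflake id, time-sortable
    thread_id: str
    parent_comment_id: Optional[str] # None for a top-level comment
    content: str
    created_on: int                  # epoch millis
    updated_on: int = 0
    depth: int = 0                   # 0 = top-level, capped at 5 by the service
    reply_count: int = 0
    up_votes: int = 0
    is_deleted: bool = False         # soft-delete flag
    children: list = field(default_factory=list)  # nested Comment instances
```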


Fault Tolerance and Redundancy

The system deploys multiple database nodes with a replication factor of 3, meaning every write is copied to 2 more nodes, giving 3 copies in total. When a node goes down, one of the replicas can serve the data. Writes are eventually consistent and synchronised in the background. This helps with,

  1. Disaster recovery
  2. Redundancy
  3. High availability


Sharding

Threads are sharded by threadId, distributing them across database nodes: "write node = hash(threadId) % N". This spreads the data evenly over the N nodes.
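A minimal sketch of that routing rule, using a stable hash so every service node picks the same shard for a given thread (MD5 is an arbitrary choice here; Python's built-in `hash` is unsuitable because it varies per process):

```python
import hashlib

def shard_for(thread_id: str, num_nodes: int) -> int:
    """Route a thread to a database node: hash(threadId) % N."""
    digest = hashlib.md5(thread_id.encode()).hexdigest()
    return int(digest, 16) % num_nodes
```

Note that plain modulo sharding remaps most keys when N changes; consistent hashing (which this design already uses for the cache tier) avoids that at the cost of some complexity.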


Partitioning

Within a node, data is partitioned by threadId to enable parallel reads/writes. A thread and its comments reside in the same partition, optimising the access pattern.


Replication

When a thread or comment is written, it is replicated to two more database nodes for redundancy, disaster recovery and high availability.


Hot Spot mitigation and caching

Reads are much higher than writes, so hot threads are cached. When a thread goes viral, the thread and its comments are cached in Memcached, an in-memory cache distributed across multiple nodes and sharded by threadId. The cache uses an LRU eviction strategy and is updated asynchronously via a message bus.
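The read-through pattern described here can be sketched with a tiny in-process stand-in for a Memcached node (a real deployment would use a Memcached client; `LRUCache` below is illustrative only):

```python
from collections import OrderedDict
from typing import Optional

class LRUCache:
    """Tiny stand-in for a Memcached node with LRU eviction."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as recently used
        return self._data[key]

    def set(self, key: str, value: str) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict least recently used

def get_thread(cache: LRUCache, thread_id: str, db_fetch) -> str:
    """Read-through: serve from cache, fall back to the database."""
    key = f"thread:{thread_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached                        # cache hit
    value = db_fetch(thread_id)              # cache miss: hit the database
    cache.set(key, value)                    # populate for subsequent reads
    return value
```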


High-level design








Request flows



Create a thread

User { create thread } > Load Balancer - L7 > API Gateway > ThreadService > Database > CDC: Stream > CDC Processor > Kafka > Flink {verify/filter/Top K adjustments} > Cache update > Notification Service > Notify user


Read top threads

User {Read threads} > Load Balancer - L7 > API Gateway > ListingService > Top 30 threads from cache


Detailed component design



  1. ListingService : Reads top threads and top-level comments from Memcached. When a user clicks "read more" on nested comments, the service reads from the database and populates the in-memory cache. Further search interactions from the user go through this service to the database.
  2. NotificationService : A microservice invoked by Flink to send aggregated daily updates, notifying users via email, SSE to mobile, or WebSocket to the browser
  3. Authoriser : Verifies that write requests are authenticated and rejects them if not
  4. RateLimiter : Provides graceful degradation under very high load. Requests are denied with a retry period so clients can back off and retry
  5. Content Filter/Sanitizer : A first-level defence that filters content for profanity or harmful material; extensive filtering happens later in the background via stream processing, which analyses multiple factors including profanity, sentiment, racism and pornography, and alerts an admin. Such comments are temporarily disabled and won't show up in the UI.
  6. AdminService : A service that lists the alerts produced by Flink for sensitive comments and threads requiring extra attention; the admin can approve or deny the content. If denied, it is marked as harmful in the database and removed from the cache if present.


Concurrent Writes

When multiple users post comments on a thread at the same time, the requests might hit different ThreadService/CommentService nodes in Kubernetes. Each piece of content is assigned a unique identifier (threadId or commentId) generated as a Twitter-snowflake ID, which lets the system order comments and threads by time of receipt. This avoids collisions in the database: even if two comments get the same created/updated timestamp from two different nodes, the unique identifier is still sortable.
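A sketch of a snowflake-style generator along the lines described above (the bit layout follows Twitter's original 41-bit timestamp / 10-bit node / 12-bit sequence split; the epoch constant is Twitter's, chosen here purely for illustration):

```python
import threading
import time

class SnowflakeId:
    """Time-sortable 64-bit ids: 41 bits of millisecond timestamp,
    10 bits of node id, 12 bits of per-millisecond sequence."""
    def __init__(self, node_id: int, epoch_ms: int = 1_288_834_974_657):
        assert 0 <= node_id < 1024          # must fit in 10 bits
        self.node_id = node_id
        self.epoch_ms = epoch_ms
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:       # 4096 ids this ms: wait for next ms
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.epoch_ms) << 22) \
                | (self.node_id << 12) | self.sequence
```

Because the timestamp occupies the high bits, ids sort by creation time even when generated on different nodes, which is exactly the collision-avoidance property relied on here.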


Spike and Handling Scale

When a thread goes viral, reads and writes for that thread spike and hit the microservices running in Kubernetes pods. Kubernetes auto-scaling is used to scale up the number of pods and balance the traffic automatically. Each node can generate unique identifiers on its own.


Decluttering the Database and Data Retention

The retention period for a thread is 90 days. After 90 days, threads and their comments are moved to hot S3 storage in Parquet format. This data can be queried for analytics, or by users searching old threads, using Apache Trino + SQL. The data in S3 is further tiered from hot > warm > archive with a 1-year retention period.





Trade offs/Tech choices



Tech Choices


  1. Database : AWS-managed DynamoDB for fast range queries, sharding and partitioning.
  2. Stream Processing : Flink aggregates thread/comment activity in 1-5 minute windows to notify users based on their topic preferences; it also finds the top 30 threads via a Top-K computation and updates the cache whenever a comment lands on any of those 30 threads
  3. Broker : Kafka carries changed/added threads and comments, feeding them into Flink and on to in-memory cache updates
  4. In-memory cache : Memcached, a distributed cache for threads and comments. It is a simple key/value store: top threads are stored compactly as "threadId: JSON", and for each top thread the top 10 comments are stored as "threadId#<id>#comments: JSON", each comment carrying its first 5 nested comments.
  5. Microservices : all microservices run stateless in Kubernetes




Trade Offs

  1. Eventual consistency : New or updated threads become visible with a slight delay, but the data eventually converges across replicas.
  2. Memcached : This is a cache only; writes are persisted in the database, not the cache. When a cache node goes down its data is lost, but with clustering and data distribution via consistent hashing, the data can be served from another node. Memcached is fast for high read throughput.






Failure scenarios/bottlenecks


  1. Hotspots : When a thread goes viral, the partition where it resides receives a flood of read requests. Caching reduces the load on the database servers, though the cache node may still see heavy load since the cache is also sharded by threadId
  2. RateLimiter : Some reads might fail or be delayed when the system is under heavy load due to the rate limiter; however, this preserves system availability and counters spam requests





Future improvements



  1. Moderate thread and comment contents - let admins hide comments
  2. Sentiment analysis - if a thread or comment is harmful to the public or incites racism, deny it
  3. ML analytics - predict which threads are likely to go viral and pre-cache them in memory.