My Solution for Design a Distributed Counter
by nectar4678
System requirements
Functional:
- Display, in real time, the total number of users viewing the website or a specific web page via a distributed counter.
- Decrement the distributed counter when a user exits the website.
- Provide users with the count of unread notifications when web pages they subscribe to are modified.
Non-Functional:
- Highly Available
- Eventually Consistent
- Accurate
- Reliable
- Scalable
- Low Latency
Capacity estimation
The system must support extremely high concurrency for write and read operations. We assume an average of 100 million concurrent online users, peak traffic of 300 million, and an anticipated read-to-write ratio of around 10:1.
API design
The distributed counter offers Application Programming Interface (API) endpoints accessible via Representational State Transfer (REST). Real-time communication can be facilitated using either WebSockets or Server-Sent Events (SSE) as the protocol.
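As a sketch of the real-time channel, a server could push each counter update to subscribed clients as a Server-Sent Events message. The `sse_format` helper and the `counter` event name below are illustrative assumptions, not part of the original design:

```python
import json

def sse_format(count: int) -> str:
    """Format a counter update as a Server-Sent Events message.

    SSE messages are plain text: optional "event:" and "data:" lines,
    terminated by a blank line. The browser's EventSource API parses them.
    """
    payload = json.dumps({"count": count})
    return f"event: counter\ndata: {payload}\n\n"
```

On the client side, an `EventSource("/api/v1/counter/stream")` listener for the `counter` event would receive these payloads as they are emitted (the stream path is likewise an assumption).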
GET
The client sends an HTTP GET request to retrieve the distributed counter's value.
/api/v1/counter
method: GET
accept: application/json, text/html
RESPONSE:
status code: 200 OK
content-encoding: gzip
content-type: application/json
cache-control: no-store
{
  "count": 2315,
  "updated_at": "2024-03-25T17:57:21+00:00"
}
PUT
The client utilizes an HTTP PUT request for updating the distributed counter, chosen for its idempotent nature.
/api/v1/counter
method: PUT
content-length: 15
{
  "action": "increment"
}
RESPONSE:
status code: 200 OK
content-encoding: gzip
content-type: application/json
{
  "count": 2673,
  "updated_at": "2024-03-25T17:57:21+00:00"
}
Database design
In the relational database, the primary entities are the users table, pages table, and counter table. The users table holds details regarding the web page authors, while the pages table stores metadata for individual web pages. The counter table maintains count statistics. Relationships include a one-to-many association between users and pages, and another one-to-many association between pages and the counter.
Data in relational databases undergo normalization to minimize redundancy and cut down on storage expenses. However, this approach can lead to increased cost in read operations due to required table JOIN operations.
To read and conditionally update the counter in a SQL database, we can use the following statements. Here latest_count is the value returned by the SELECT and substituted back by the application; the count=latest_count predicate makes the UPDATE an optimistic compare-and-set that affects zero rows if a concurrent writer changed the count first.
SELECT count AS latest_count
FROM counter
WHERE page_id='1782';
UPDATE counter
SET count=latest_count+1
WHERE page_id='1782' AND count=latest_count;
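That read-then-conditional-update sequence is normally wrapped in an application-side retry loop. A minimal sketch, using SQLite purely as a stand-in for the relational database (`increment_with_cas`, the schema, and the retry limit are assumptions for illustration):

```python
import sqlite3

def increment_with_cas(conn: sqlite3.Connection, page_id: str,
                       max_retries: int = 5) -> int:
    """Optimistic compare-and-set increment: re-read and retry when another
    writer changed the count between our SELECT and our UPDATE."""
    for _ in range(max_retries):
        row = conn.execute(
            "SELECT count FROM counter WHERE page_id = ?", (page_id,)
        ).fetchone()
        latest = row[0]
        cur = conn.execute(
            "UPDATE counter SET count = ? WHERE page_id = ? AND count = ?",
            (latest + 1, page_id, latest),
        )
        if cur.rowcount == 1:   # the predicate held: no concurrent writer won
            conn.commit()
            return latest + 1
        conn.rollback()         # lost the race; retry with a fresh read
    raise RuntimeError("increment failed after retries")
```

Under heavy contention this loop retries often, which is one reason the pure-SQL approach struggles at the write rates estimated earlier.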
In contrast, data stored in a NoSQL database like Apache Cassandra is denormalized, prioritizing faster query operations despite increased data redundancy.
Redis is an in-memory distributed key-value database. The hash data type in Redis can be used to store the count statistics for a particular web page.
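As an illustration of that hash layout, the stand-in below mimics Redis's HINCRBY/HGET semantics in memory; the `page:<id>` key scheme and the `viewers` field name are hypothetical, and a real deployment would issue the same commands against a Redis server:

```python
class PageCounterStore:
    """In-memory stand-in for a Redis hash: one hash per page, with each
    field holding a count. Not Redis itself, just the same semantics."""

    def __init__(self):
        self._hashes = {}

    def hincrby(self, key, field, amount=1):
        # HINCRBY: create the hash/field on first use, then add the delta
        h = self._hashes.setdefault(key, {})
        h[field] = h.get(field, 0) + amount
        return h[field]

    def hget(self, key, field):
        # HGET: a missing field reads as 0 for counter purposes
        return self._hashes.get(key, {}).get(field, 0)

store = PageCounterStore()
store.hincrby("page:1782", "viewers")        # a user opens the page
store.hincrby("page:1782", "viewers")        # a second user opens it
store.hincrby("page:1782", "viewers", -1)    # one of them leaves
```

Against Redis itself these calls would map to `HINCRBY page:1782 viewers 1` and `HGET page:1782 viewers`.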
NoSQL vs SQL
- Prioritize a database with high performance capabilities for both read and write operations.
- Seek a scalable, durable, and highly available database solution.
- Choose a database that supports a flexible data model to facilitate future upgrades.
- Minimize learning and development costs associated with the chosen database.
- Recognize the critical role of the database within the distributed counter system.
- Explore potential types of data stores for constructing the distributed counter in the subsequent section.
When constructing the distributed counter, adhere to domain-driven design principles. This means selecting the best-suited technology rather than relying solely on familiarity. For instance, avoid using MapReduce for building a production-ready distributed counter solely due to prior experience. Additionally, factor in both development and operational costs during the design phase.
NoSQL Database
Apache Cassandra, an open-source distributed NoSQL database, specializes in handling time series data. Its nodes communicate via the gossip protocol and employ automatic partitioning through consistent hashing. With leaderless replication, Cassandra ensures high availability. Its storage engine, based on Log-structured Merge (LSM) trees, prioritizes efficient writes over reads.
However, frequent disk access in Cassandra can hinder read performance for the distributed counter. To mitigate this, an additional Redis cache layer can be introduced using the cache-aside pattern. While this enhances reads, it compromises data consistency. There's a risk of inaccurate counters if database writes succeed but cache updates fail.
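The cache-aside read path can be sketched as follows, with plain dicts standing in for Redis and Cassandra (the `get_count` helper is an assumption for illustration):

```python
def get_count(page_id, cache, db):
    """Cache-aside read: serve from the cache when possible; on a miss,
    read the authoritative store and populate the cache on the way out."""
    count = cache.get(page_id)
    if count is None:
        count = db.get(page_id, 0)   # authoritative store (Cassandra here)
        cache[page_id] = count       # the next read becomes a cache hit
    return count
```

The consistency caveat above lives on the write path: if the database write succeeds but the matching cache invalidation or update fails, subsequent reads keep serving the stale cached count.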
Cassandra's counter data type offers a race-free counter with local latency across multiple data centers. Despite being theoretically sound, a case study at Ably revealed degraded performance in Cassandra's distributed counter implementation.
Issues with Cassandra's counter functionality include:
- The requirement for counter columns to exist in a separate table.
- Inability to delete tables containing at least one counter column.
- Lack of idempotency in counter update operations, which poses a significant problem.
- Additionally, Cassandra's operational complexity is notably high.
In conclusion, it is advised not to utilize Cassandra for constructing a scalable distributed counter.
Message Queue
The database undergoes updates with each count update event, following the push model approach. This method, while straightforward to implement and providing real-time performance, may experience degradation under exceedingly high write volumes.
The count update events emitted by the client can be buffered using a message queue, which then asynchronously updates the database at regular intervals. This method, referred to as the pull model, involves replicating and checkpointing the message queue to enhance high availability and fault tolerance. A serverless function periodically queries the message queue and updates the database accordingly. Additionally, a dead-letter queue is introduced to enhance fault tolerance by persisting messages when the database server is inaccessible.
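The flush step of the pull model can be sketched as follows: a (hypothetical) `flush_events` function drains the buffered events, aggregates deltas per page, and applies one batched write per page instead of one write per event:

```python
from collections import Counter
from queue import Empty, Queue

def flush_events(q: Queue, db: dict) -> int:
    """Drain buffered count-update events and apply them to the database
    as one aggregated write per page; returns how many events were drained."""
    deltas = Counter()
    drained = 0
    while True:
        try:
            page_id, delta = q.get_nowait()
        except Empty:
            break
        deltas[page_id] += delta   # fold many events into one delta per page
        drained += 1
    for page_id, delta in deltas.items():
        db[page_id] = db.get(page_id, 0) + delta
    return drained
```

In the design above this logic would run inside the serverless function on a schedule; events that cannot be applied because the database is unreachable would be re-queued or routed to the dead-letter queue rather than dropped.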
Using a Distributed Key-value store
Due to its in-memory data types, Redis delivers exceptional performance and throughput. Given that the distributed counter requires extensive writing operations, Redis emerges as a promising candidate for its implementation.
The INCR command in Redis enables atomic incrementation of counters, making Redis a compelling choice for building distributed counters due to its high performance and throughput capabilities. Employing a leader-follower replication topology enhances Redis's performance, where a Redis proxy serves as a sharding proxy to route reads to follower instances and writes to the leader instance. The hot standby configuration with replicas further bolsters fault tolerance. Additionally, implementing a write-behind cache pattern can asynchronously persist counters in relational databases, ensuring improved durability.
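A minimal single-process sketch of the write-behind pattern described above, with an in-memory integer standing in for Redis INCR and a dict standing in for the relational database; the class and its synchronous `flush` are illustrative assumptions (a real deployment would flush asynchronously on a timer):

```python
class WriteBehindCounter:
    """Write-behind sketch: increments hit the hot in-memory value
    immediately (the Redis INCR stand-in) and are persisted to the
    durable store later, in batches."""

    def __init__(self, durable_store: dict):
        self._count = 0        # hot value served to readers
        self._pending = 0      # increments not yet persisted
        self._store = durable_store

    def incr(self) -> int:
        self._count += 1
        self._pending += 1
        return self._count

    def flush(self, key: str = "counter") -> None:
        """Persist all accumulated increments as a single durable write."""
        self._store[key] = self._store.get(key, 0) + self._pending
        self._pending = 0
```

The trade-off matches the text: reads and writes stay fast because they never touch the durable store, but increments accumulated since the last flush are lost if the in-memory instance fails.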
Alternatively, an active-active Redis deployment offers low latency and high availability. However, to prevent overhead from aggregating counts across shards during read operations, shards can communicate via the gossip protocol. Yet, this approach may result in increased bandwidth usage and storage requirements. Idempotent count update operations are crucial for conflict-free count synchronization across multiple data centers. Despite Redis's native data types lacking idempotence, executing Lua scripts on the Redis server or storing user IDs in Redis sets can serve as workarounds for network failures, ultimately ensuring counter accuracy.
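The set-based workaround can be sketched as follows: tracking user IDs in a set (SADD/SREM/SCARD in Redis terms) makes count updates idempotent, so a retried update after a network failure cannot double-count. The class below is an in-memory illustration of that idea, not Redis itself:

```python
class IdempotentCounter:
    """Idempotent count updates: store the set of user IDs rather than a
    bare integer, so replaying the same update is a harmless no-op."""

    def __init__(self):
        self._users = set()

    def user_entered(self, user_id: str) -> int:
        self._users.add(user_id)       # SADD: a repeated add changes nothing
        return len(self._users)        # SCARD yields the current count

    def user_left(self, user_id: str) -> int:
        self._users.discard(user_id)   # SREM is equally safe to retry
        return len(self._users)
```

This is what makes conflict-free synchronization across data centers workable: replicas exchanging set members can merge them without coordination, at the cost of the extra bandwidth and storage noted above.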
Bottlenecks
Redis Cluster Failure:
If the entire Redis cluster goes down due to hardware failure, network issues, or software bugs, the distributed counter system may become unavailable. This can impact real-time updates and lead to inconsistencies in the count data.
Database Synchronization Delays:
Asynchronous replication from the Redis leader to its followers may introduce synchronization delays. If the serverless function reads from a follower that has not yet received the latest updates, it may persist stale count data to the database.