Distributed Counters and Top-K Approximations
Counting is trivial on a single machine — increment an integer. But at global scale, a single counter becomes a bottleneck that limits your entire system's throughput. Every like, page view, and API call needs to be counted, and millions of these events happen per second across servers in multiple regions.

The Single Counter Bottleneck
Consider a "like" counter on a viral social media post, stored as a single counter row in a database that every like increments.
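As a sketch of that design (table and column names are hypothetical), every like funnels through one row:

```python
import sqlite3

# Hypothetical single-row counter: every like serializes on this one row.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE post_likes (post_id INTEGER PRIMARY KEY, likes INTEGER)")
db.execute("INSERT INTO post_likes VALUES (42, 0)")

def like(post_id: int) -> None:
    # One UPDATE per like; the row lock is the global serialization point.
    db.execute("UPDATE post_likes SET likes = likes + 1 WHERE post_id = ?",
               (post_id,))

for _ in range(3):
    like(42)
count = db.execute("SELECT likes FROM post_likes WHERE post_id = 42").fetchone()[0]
```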
This creates three problems:
- Write contention — Every increment acquires a row lock. With 10,000 likes per second, transactions queue behind each other. Lock wait time dominates, and throughput plateaus regardless of how powerful the database is.
- Single point of failure — If the database node holding this row fails, no likes can be recorded until failover completes.
- Cross-region latency — Users in Tokyo incrementing a counter in US-East pay 150ms round-trip per like. At scale, this latency is unacceptable.
Sharded Counters
The fundamental solution: split one logical counter into N physical shards. Each write goes to a random shard. Reads aggregate all shards.
With 16 shards, write contention drops by 16x. Each shard receives roughly 1/16th of the writes. The trade-off: reads must fan out to all shards and sum, which is slower than reading a single value. For counters that are written frequently but read occasionally (like counts, view counts), this trade-off is excellent.
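A minimal in-process sketch of the scheme; in production the shards would be 16 separate rows or keys spread across the database:

```python
import random

NUM_SHARDS = 16

# One logical counter stored as 16 physical shards.
shards = [0] * NUM_SHARDS

def increment() -> None:
    # Each write picks a random shard, so concurrent writers rarely contend.
    shards[random.randrange(NUM_SHARDS)] += 1

def read_total() -> int:
    # Reads fan out to every shard and sum: the cost of write scalability.
    return sum(shards)

for _ in range(1000):
    increment()
```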
Hierarchical Aggregation
For global-scale counting (billions of events across data centers), combine sharded counters within each region with cross-region aggregation:
- Edge: Each application server buffers increments in memory for a short interval (1-5 seconds)
- Region: Buffered counts flush to a regional aggregator (Kafka + Flink, or a regional Redis)
- Global: Regional totals are periodically merged into a global store
This is eventually consistent — the global total lags behind reality by a few seconds. But for counters where exact real-time accuracy is not critical (view counts, like counts), the latency-consistency trade-off is worth the massive throughput gain.
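The edge tier can be sketched as follows, with the regional aggregator reduced to a callback; in a real system the flush would be a network call driven by a 1-5 second timer:

```python
import threading

class BufferedCounter:
    """Edge tier: accumulate increments in memory, flush periodically."""

    def __init__(self, flush_fn):
        self._flush_fn = flush_fn    # hands a batch to the regional aggregator
        self._pending = 0
        self._lock = threading.Lock()

    def increment(self, n: int = 1) -> None:
        with self._lock:
            self._pending += n       # no network call on the hot path

    def flush(self) -> None:
        # Invoked by a background timer every flush interval.
        with self._lock:
            batch, self._pending = self._pending, 0
        if batch:
            self._flush_fn(batch)    # one network call per interval

regional_total = 0

def to_region(batch: int) -> None:
    global regional_total
    regional_total += batch

counter = BufferedCounter(to_region)
for _ in range(500):
    counter.increment()
counter.flush()
```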
The core insight of distributed counting is that a single counter creates a serialization point — every writer waits for every other writer. Sharding breaks this serialization by giving each writer its own counter to increment without contention. The read-time aggregation cost is the price you pay for write scalability.
Distributing counters introduces accuracy challenges that do not exist with a single counter. Every design choice trades accuracy for something else — throughput, latency, availability, or memory.
Lost Updates During Failures
When a server buffers increments in memory and crashes before flushing, those increments are lost. If 500 increments were buffered and the server dies, the count is permanently 500 lower than reality.
Mitigations:
- Shorter flush intervals — Reduce the buffer window from 5 seconds to 100ms. Limits maximum loss to 100ms of traffic. Cost: more frequent network calls.
- Write-ahead log — Append each increment to a local WAL before buffering. On recovery, replay the WAL. Cost: disk I/O per increment.
- Accept the loss — For non-critical counters (page views, impressions), losing a few hundred counts during a rare crash is acceptable. The operational simplicity of not having a WAL may be worth more than perfect accuracy.
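A minimal sketch of the write-ahead-log mitigation, assuming one log line per increment and a replay pass on recovery (the file format is hypothetical):

```python
import os
import tempfile

# Hypothetical WAL: append each increment before buffering it, so a crash
# loses at most the writes not yet synced to disk.
wal_path = os.path.join(tempfile.mkdtemp(), "counter.wal")

def record(counter_id: str, n: int) -> None:
    with open(wal_path, "a") as wal:
        wal.write(f"{counter_id} {n}\n")  # disk I/O per increment: the cost
        wal.flush()
        os.fsync(wal.fileno())

def replay() -> dict:
    # On recovery, rebuild the in-memory totals from the log.
    totals: dict = {}
    with open(wal_path) as wal:
        for line in wal:
            counter_id, n = line.split()
            totals[counter_id] = totals.get(counter_id, 0) + int(n)
    return totals

record("post:42", 1)
record("post:42", 1)
record("post:7", 1)
recovered = replay()
```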
Double Counting
The opposite problem: a server flushes its buffer, the aggregator processes it, but the acknowledgment is lost. The server retries, and the same batch is counted twice.
Mitigations:
- Idempotent writes — Assign each batch a unique ID. The aggregator deduplicates by batch ID before applying. Cost: additional state for dedup tracking.
- At-least-once with reconciliation — Accept occasional double-counting, then periodically reconcile against a source of truth (e.g., count distinct events in a data warehouse).
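The idempotent-writes mitigation can be sketched on the aggregator side as dedup keyed by batch ID:

```python
import uuid

applied_batches: set = set()  # the extra dedup state the aggregator must keep
total = 0

def apply_batch(batch_id: str, count: int) -> None:
    """Apply each batch at most once, keyed by its unique ID."""
    global total
    if batch_id in applied_batches:
        return                     # duplicate delivery after a lost ack: ignore
    applied_batches.add(batch_id)
    total += count

batch = (str(uuid.uuid4()), 500)
apply_batch(*batch)
apply_batch(*batch)                # the retry is absorbed, not double-counted
```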
Stale Reads
With buffered writes and regional aggregation, the global count always lags behind reality by the sum of the buffer and aggregation intervals.
For most counters, this is fine. YouTube shows "1.2M views" — nobody expects that to be exact to the unit. But for some use cases (inventory count, remaining seats), staleness causes real problems (overselling).
Consistency vs Availability
Under a network partition between regions:
- Choose consistency: Stop accepting increments in the partitioned region until connectivity is restored. The count is accurate but the service is degraded.
- Choose availability: Continue counting locally in each region. When the partition heals, merge the counts. The count may temporarily show different values in different regions.
Most counting systems choose availability because the cost of an inaccurate count (off by a few percent temporarily) is far lower than the cost of refusing to count (broken user experience, lost data).
Note that the failure modes are asymmetric: lost updates make the count too low (undercounting), while retry-related double counting makes it too high (overcounting). Most systems bias toward overcounting because undercounting loses real data permanently, while overcounting can be corrected by periodic reconciliation against a source of truth.
The previous sections assumed you need the actual count. But many counting problems do not require exact numbers — they need answers to questions like "roughly how many unique users visited today?" or "which URLs are accessed most frequently?" Approximate counting trades bounded error for dramatic reductions in memory and computation.
When Exact Counting Fails
Counting the exact number of unique visitors to a website requires storing every unique identifier you have seen.
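To make the arithmetic concrete, assume (hypothetically) 100 million unique visitors per counter and roughly 36 bytes per stored identifier, about the size of a UUID string:

```python
# Back-of-envelope memory for exact unique counting with a hash set.
# Assumed figures: 100M uniques per counter, 36-byte identifiers.
bytes_per_id = 36                 # e.g. a UUID string
uniques_per_counter = 100_000_000

per_counter_bytes = bytes_per_id * uniques_per_counter
per_counter_gb = per_counter_bytes / 1e9        # ~3.6 GB per counter

fleet_bytes = per_counter_bytes * 10_000        # 10,000 such counters
fleet_tb = fleet_bytes / 1e12                   # ~36 TB in total
```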
For one counter this is manageable. For 10,000 counters (one per URL, one per campaign, one per region), you need 36 TB of memory. This is where approximate counting becomes essential.
The Accuracy-Memory Trade-off
| Approach | Memory per Counter | Accuracy | Use Case |
| --- | --- | --- | --- |
| Exact (HashSet) | O(n) where n = unique items | 100% exact | Financial transactions, inventory |
| HyperLogLog | ~12 KB (fixed) | Within 0.81% error | Unique visitor counts, cardinality |
| Count-Min Sketch | O(width × depth) | Overcounts by bounded amount | Frequency estimation, trending topics |
| Bloom Filter | O(n) bits | False positives only | "Have I seen this before?" |
The key insight: HyperLogLog counts 1 billion unique items in 12 KB of memory with 0.81% error. An exact HashSet would need roughly 36 GB. That is a 3-million-fold memory reduction for less than 1% error.
When to Use Exact vs Approximate
Use exact counting when:
- The count has financial implications (billing, revenue, inventory)
- Off-by-one errors cause business logic failures (rate limiting, quota enforcement)
- The dataset is small enough that exact counting is cheap
Use approximate counting when:
- The count is displayed to users (view counts, like counts, "1.2M subscribers")
- The count drives analytics or recommendations (trending topics, popular items)
- The dataset is too large for exact counting to be feasible (unique visitors across billions of events)
In system design interviews, the choice between exact and approximate counting is a key decision point. State the trade-off explicitly: 'For view counts we can use HyperLogLog at 12KB per counter with less than 1% error. For inventory counts we need exact atomic decrements because selling more items than we have causes real-world shipping failures.' This shows you understand when approximation is acceptable.
Two probabilistic data structures dominate large-scale counting: HyperLogLog for cardinality estimation ("how many unique items?") and Count-Min Sketch for frequency estimation ("how often does each item appear?"). Understanding their internals reveals why they work and when they fail.

HyperLogLog (HLL)
HyperLogLog estimates the number of distinct elements in a dataset. The core idea: hash each element to a binary string, and track the maximum number of leading zeros observed. More leading zeros imply more distinct elements (like flipping coins — the more flips, the more likely you are to see a long streak of heads).
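A toy demonstration of that intuition; this is not a real HLL, which averages across thousands of registers to control the huge variance of a single maximum:

```python
import hashlib

def leading_zeros(h: int, bits: int = 64) -> int:
    """Number of leading zero bits in a `bits`-wide hash value."""
    if h == 0:
        return bits
    return bits - h.bit_length()

def hash64(item: str) -> int:
    # First 8 bytes of SHA-256 as a 64-bit hash.
    return int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")

# Across n distinct items, the longest run of leading zeros grows roughly
# like log2(n), so 2**max_zeros is a (very noisy) cardinality estimate.
max_zeros = max(leading_zeros(hash64(f"user-{i}")) for i in range(100_000))
estimate = 2 ** max_zeros
```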
Redis has built-in HLL support through the PFADD and PFCOUNT commands.
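An illustrative redis-cli session (key and member names are made up):

```
PFADD visitors:2024-01-15 user:1 user:2 user:3
PFCOUNT visitors:2024-01-15          # 3
PFADD visitors:2024-01-15 user:2     # re-adding an existing user changes nothing
PFCOUNT visitors:2024-01-15          # still 3
```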
HLL's killer feature: PFMERGE combines multiple HLLs into one. You can count unique visitors per hour, then merge hourly HLLs into daily, weekly, and monthly counts without double-counting users who visited multiple times.

Count-Min Sketch (CMS)
Count-Min Sketch estimates how often a specific element appears. It uses D independent hash functions, each mapping an element to one of W positions in a counter array: to record an element, increment all D mapped positions; to estimate its frequency, read those D positions and take the minimum.
The CMS never undercounts — the true count is always less than or equal to the estimate. Hash collisions cause other elements to increment the same positions, inflating the estimate. Taking the minimum across D arrays reduces this inflation because a collision in all D arrays simultaneously is unlikely.
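A minimal sketch of the structure; the hash construction and the width/depth values are illustrative:

```python
import hashlib

class CountMinSketch:
    """Minimal Count-Min Sketch: D rows of W counters, one hash per row."""

    def __init__(self, width: int = 1024, depth: int = 4):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _positions(self, item: str):
        # Derive one independent-ish hash per row by salting with the row index.
        for d in range(self.depth):
            digest = hashlib.sha256(f"{d}:{item}".encode()).digest()
            yield d, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, n: int = 1) -> None:
        for d, w in self._positions(item):
            self.rows[d][w] += n

    def estimate(self, item: str) -> int:
        # Collisions only inflate counters, so the smallest of the D
        # counters is the least-polluted estimate (still >= the truth).
        return min(self.rows[d][w] for d, w in self._positions(item))

cms = CountMinSketch()
for _ in range(100):
    cms.add("/home")
cms.add("/about")
```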
When to Use Which
- "How many unique X?" -> HyperLogLog (cardinality)
- "How often does X appear?" -> Count-Min Sketch (frequency)
- "Has X been seen before?" -> Bloom Filter (membership)
HyperLogLog and Count-Min Sketch solve fundamentally different counting problems. HLL answers 'how many distinct items?' (cardinality) — it cannot tell you which items or how often. CMS answers 'how often does this specific item appear?' (frequency) — it cannot tell you the total number of distinct items. Confusing the two is a common interview mistake.
Finding the top-K most frequent elements in a data stream is one of the most common analytics problems. Trending topics, popular products, heavy-hitter detection for abuse prevention — all require identifying which elements appear most often without storing the entire stream.

Exact Top-K (HashMap + Heap)
The straightforward approach: count every element in a hashmap, maintain a min-heap of size K.
This works for small cardinalities (thousands of distinct elements) but fails when the number of distinct elements is large.
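The exact approach in a few lines:

```python
import heapq
from collections import Counter

def exact_top_k(stream, k: int):
    """Count everything, then take the k most frequent.
    Memory grows with the number of distinct elements,
    which is exactly the scaling problem."""
    counts = Counter(stream)                  # O(distinct) memory
    return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])

stream = ["a"] * 5 + ["b"] * 3 + ["c"] * 2 + ["d"]
top2 = exact_top_k(stream, 2)
```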
Space-Saving Algorithm
Space-Saving keeps only K counters and guarantees that the true top-K elements are among them: when a new element arrives and all K counters are occupied, it replaces the element with the minimum count and inherits that count plus one.
Space-Saving uses O(K) memory regardless of stream size. The trade-off: counts for elements that replaced a previous minimum are overestimated (they inherit the evicted element's count + 1). But the true heavy hitters (elements with genuinely high frequency) are rarely evicted because their counts are always above the minimum.
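A compact sketch of the algorithm:

```python
def space_saving(stream, k: int) -> dict:
    """Space-Saving: at most k counters; a new element evicts the current
    minimum and inherits its count + 1 (the source of overestimation)."""
    counters: dict = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            victim = min(counters, key=counters.get)   # smallest counter
            counters[item] = counters.pop(victim) + 1  # inherit count + 1
    return counters

stream = ["a"] * 50 + ["b"] * 30 + ["c", "d", "e", "a", "b"]
top = space_saving(stream, k=3)
```

Note that the heavy hitters "a" and "b" survive every eviction: their counts stay above the minimum, so only the low-frequency tail churns through the third slot.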
Count-Min Sketch + Heap (Approximate Top-K)
Combine CMS for frequency estimation with a min-heap that tracks the current top-K: for each arriving element, update the CMS, query its estimated count, and update the heap when the estimate exceeds the heap's current minimum.
This uses CMS memory (fixed, independent of stream size) plus K heap entries. The CMS's overcounting bias means some elements may appear more frequent than they are, potentially displacing a true top-K element. In practice, the overcounting is small relative to heavy hitter counts, so the top-K result is accurate for high-frequency elements.
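A sketch combining a small CMS with top-K candidate tracking. To keep the sketch short, a dict with min-eviction stands in for the heap, and the sketch sizes are illustrative:

```python
import hashlib

class TinyCMS:
    """Fixed-size frequency sketch: depth rows of width counters."""

    def __init__(self, width: int = 2048, depth: int = 4):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _positions(self, item: str):
        for d in range(self.depth):
            digest = hashlib.sha256(f"{d}:{item}".encode()).digest()
            yield d, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str) -> None:
        for d, w in self._positions(item):
            self.rows[d][w] += 1

    def estimate(self, item: str) -> int:
        return min(self.rows[d][w] for d, w in self._positions(item))

def approx_top_k(stream, k: int):
    cms, candidates = TinyCMS(), {}
    for item in stream:
        cms.add(item)
        candidates[item] = cms.estimate(item)
        if len(candidates) > k:
            # Evict the smallest estimate; a true heavy hitter re-enters
            # on its next arrival with a higher estimate.
            candidates.pop(min(candidates, key=candidates.get))
    return sorted(candidates.items(), key=lambda kv: -kv[1])

stream = ["x"] * 40 + ["y"] * 25 + ["z"] * 10 + ["q", "r", "s"]
top = approx_top_k(stream, k=2)
```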
Comparison
| Algorithm | Memory | Accuracy | Stream-Friendly |
| --- | --- | --- | --- |
| Exact (HashMap) | O(n distinct) | Exact | No (unbounded) |
| Space-Saving | O(K) | Guaranteed top-K present | Yes |
| CMS + Heap | O(CMS size + K) | Approximate (CMS overcounting) | Yes |
Every counting system sits on a spectrum from "perfectly accurate but expensive" to "approximately accurate but cheap." The right position on this spectrum depends on your business requirements, not your technical preferences.
Decision Framework
Ask three questions about each counter:
- What is the cost of being wrong? An inventory count off by 1 causes a fulfillment failure. A view count off by 1,000 is invisible. The higher the cost, the more you should invest in accuracy.
- What is the cardinality of the counted set? Counting 1,000 unique items is cheap with exact methods. Counting 1 billion unique items requires probabilistic approaches unless you have terabytes of memory.
- What is the query pattern? Write-heavy counters (billions of increments, occasional reads) benefit from sharding and buffering. Read-heavy counters (displayed on every page load) benefit from caching the aggregated value.
Architecture Patterns
Pattern 1: Sharded counter + cache (social metrics)
Best for: like counts, view counts, follower counts. Writes are cheap and distributed. Reads are cached and approximate.
Pattern 2: HLL per time window (unique counting)
Best for: DAU/MAU, unique visitors per page, unique events per campaign. Fixed memory per counter, mergeable across time windows.
Pattern 3: CMS + heap (trending/heavy hitters)
Best for: trending topics, popular products, heavy-hitter API clients. Identifies high-frequency elements without tracking all elements.
Pattern 4: Exact atomic counter (financial/inventory)
Best for: inventory, account balances, rate limit counters. No approximation. Single point of serialization accepted because correctness is non-negotiable.
Common Interview Question Patterns
Interviewers test whether you can match the counting approach to the use case:
- "Design a view counter for YouTube" -> Sharded counter + cache (write-heavy, approximate OK)
- "Track unique visitors for analytics" -> HyperLogLog (cardinality, fixed memory)
- "Find trending topics on Twitter" -> CMS + heap (frequency + top-K, streaming)
- "Track inventory for e-commerce" -> Exact atomic counter (correctness required)
Common Mistakes
Using HyperLogLog for frequency. HLL answers "how many unique?" not "how often does X appear?" Use Count-Min Sketch for frequency estimation. These two structures solve fundamentally different problems.
Using sharded counters for inventory. Sharded counters cannot enforce global constraints like "total must not go negative." Individual shards have no visibility into other shards. Use atomic counters for inventory where correctness is non-negotiable.
Ignoring cache staleness. A cached aggregate that is 5 seconds stale is fine for likes, but causes overselling for inventory. Always match your caching strategy to the accuracy requirement of the business domain.
Over-engineering non-critical counters. Page view counts do not need WAL, idempotency, or strong consistency. Accept occasional loss and keep the system simple. The engineering cost of perfect accuracy often exceeds the business value of the counter. Reserve strong consistency for counters where errors have real-world consequences.
The right question is always: "what happens if this count is wrong?" If the answer is "nothing visible to users," use the simplest approach.
In system design interviews, always state the accuracy requirement before choosing the counting approach. Say: 'View counts can tolerate eventual consistency and approximation, so I will use sharded counters with a cached aggregate. Inventory counts require strong consistency, so I will use an atomic Redis counter with no buffering.' This demonstrates that you choose tools based on requirements, not habit.