Kafka Scaling Math: Partitions, Consumers, and Why More Is Not Free
December 21, 2025
Kafka scaling looks like magic until you do the arithmetic. Then it looks like a budget you can blow.
Start with the per-partition ceiling. A single partition tops out around 10 to 30 MB/s on commodity hardware, depending on record size, compression, and how aggressively you batch. Total topic throughput is partitions times that ceiling, minus replication overhead. Replication factor 3 means every byte you produce gets written three times across the cluster, so a 100 MB/s ingest rate is really 300 MB/s of disk writes, 200 MB/s of which arrives as replication traffic between brokers.
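To make that arithmetic concrete, here is a minimal sketch. The function names are mine, and it assumes throughput divides evenly across partitions with no hot keys. The 10 to 30 MB/s ceiling is the rough range above, not a measured number for your cluster.

```python
import math


def partitions_for_throughput(target_mb_s: float, per_partition_mb_s: float) -> int:
    """Minimum partition count so the target rate fits under the per-partition ceiling.

    Assumes load spreads evenly across partitions (no hot keys).
    """
    return math.ceil(target_mb_s / per_partition_mb_s)


def replication_load(ingest_mb_s: float, replication_factor: int) -> dict:
    """Cluster-wide write load implied by a given producer ingest rate."""
    return {
        # Every replica persists every byte, leader included.
        "disk_write_mb_s": ingest_mb_s * replication_factor,
        # Followers fetch from the leader, so RF - 1 copies cross the broker network.
        "inter_broker_mb_s": ingest_mb_s * (replication_factor - 1),
    }


# 100 MB/s at the pessimistic and optimistic ends of the per-partition range.
print(partitions_for_throughput(100, 10))  # 10
print(partitions_for_throughput(100, 30))  # 4
print(replication_load(100, 3))
```

Size against the pessimistic end of the range unless you have benchmarked your own record sizes and batching settings.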
Consumer parallelism is even simpler. A consumer group can have at most as many active consumers as the topic has partitions. Adding a fourth consumer to a three-partition topic gets you one idle process. So if you need 50 concurrent workers, you need at least 50 partitions. People internalize this and then over-correct.
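The same rule as a few lines of Python. This is a sketch of the counting argument only, with a function name I made up. It assumes a single consumer group reading a single topic.

```python
from typing import Tuple


def consumer_utilization(consumers: int, partitions: int) -> Tuple[int, int]:
    """Return (active, idle) consumers for one group reading one topic.

    Each partition is assigned to at most one consumer in the group,
    so any consumer beyond the partition count receives no assignment.
    """
    active = min(consumers, partitions)
    idle = consumers - active
    return active, idle


print(consumer_utilization(4, 3))    # (3, 1): the fourth consumer sits idle
print(consumer_utilization(50, 50))  # (50, 0)
```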
The trap is that partitions are not free on the broker side. Each partition replica is a directory of segment files, and the broker holds file descriptors for the segment files and their offset and time indexes for every replica it hosts, whether it leads or follows. The controller tracks leadership, ISR membership, and per-partition state in memory. Leader elections and partition reassignments move leadership around, and their cost scales with partition count. ZooKeeper-era clusters felt this hard, and KRaft is better but not free.
The production failure I watched in person: a team running 50 partitions on their main events topic decided to "future-proof" by scaling to 5000 partitions on a single topic. They argued the keyspace would grow and they did not want to repartition later. Two things broke. Broker boot time went from 30 seconds to almost 12 minutes because the broker opens every segment file on startup to rebuild its in-memory index. And during an incident two weeks later, the controller failed over, and the new controller spent 45 minutes loading partition state and reassigning leadership. The cluster was effectively unavailable for that window. Producers timed out, consumer groups thrashed in rebalance loops, and the on-call could not tell whether the cluster was dead or just slow.
The fix took a week of careful migration: create a new topic with 200 partitions, dual-write from producers, drain consumers from the old topic, and decommission. Painful, but cheaper than another 45-minute outage. The rule of thumb that came out of it: keep each broker under 4000 to 5000 partitions across all topics, not per topic. Plan partition count by required consumer parallelism with a 2x cushion, not by speculative future scale. If you genuinely outgrow that, scale by adding brokers and reassigning partitions across them, not by inflating partition counts.
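A rough planning sketch of that rule of thumb. The function names are hypothetical, and I am making two assumptions: the per-broker budget counts replicas hosted, not just leaders, and replicas spread evenly across brokers. The default budget of 4000 is the low end of the range above. The three-broker cluster in the example is invented for illustration, not the cluster from the incident.

```python
import math


def plan_partitions(required_consumers: int, cushion: float = 2.0) -> int:
    """Partition count from required consumer parallelism plus a cushion."""
    return math.ceil(required_consumers * cushion)


def replicas_per_broker(total_partitions: int, replication_factor: int, brokers: int) -> int:
    """Replicas each broker hosts, assuming an even spread across brokers."""
    return math.ceil(total_partitions * replication_factor / brokers)


def fits_budget(total_partitions: int, replication_factor: int, brokers: int,
                per_broker_budget: int = 4000) -> bool:
    """True if every broker stays at or under the per-broker budget."""
    return replicas_per_broker(total_partitions, replication_factor, brokers) <= per_broker_budget


# 50 concurrent workers with a 2x cushion.
print(plan_partitions(50))  # 100

# Hypothetical 3-broker cluster at replication factor 3.
print(fits_budget(5000, 3, 3))  # False: 5000 replicas per broker
print(fits_budget(200, 3, 3))   # True: 200 replicas per broker
```

Pass the total across every topic on the cluster as total_partitions, since the budget is per broker, not per topic.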
Mental model: partitions are the unit of parallelism and the unit of broker bookkeeping. You pay for both.
Plan partitions by consumer parallelism, not by anticipated future scale. Brokers pay per-partition costs in file descriptors, controller state, and rebalance time, and over-provisioning shows up as a 12-minute broker boot during an incident.
Originally posted on LinkedIn.