Kafka Consumer Group Rebalancing: How Partitions Get Reassigned
May 7, 2026
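Kafka consumer groups let multiple workers share the load of reading from a topic, with each partition owned by exactly one worker in the group at a time. In steady state that means no two workers process the same message; after a failure, anything not yet committed can be delivered again. The trick is in how partitions get assigned, and what happens when a worker dies.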
Here is the lifecycle.
When a consumer joins a group, it sends a request to the group coordinator (a specific Kafka broker chosen for this group). The coordinator hands out partitions across all live members. If you have four partitions and two consumers, each consumer gets two partitions. If you have four partitions and five consumers, one consumer is idle. Partitions are the unit of parallelism, not consumers.
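The arithmetic above can be sketched with a toy assignor. This is a simplified round-robin stand-in, not Kafka's actual assignor code, and the function name and the member ids c1 through c5 are made up for illustration.

```python
def assign_round_robin(partitions, consumers):
    """Deal partitions out to consumers one at a time, in sorted order."""
    assignment = {c: [] for c in consumers}
    members = sorted(consumers)
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


# Four partitions, two consumers: two partitions each.
two = assign_round_robin(range(4), ["c1", "c2"])

# Four partitions, five consumers: the fifth consumer has nothing to read.
five = assign_round_robin(range(4), ["c1", "c2", "c3", "c4", "c5"])
idle = [c for c, parts in five.items() if not parts]

print(two)   # {'c1': [0, 2], 'c2': [1, 3]}
print(idle)  # ['c5']
```

Adding a sixth or seventh consumer only grows the idle list. To get more parallelism you need more partitions.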
Once partitions are assigned, each consumer fetches from its own set independently. There is no contention. No locks. Each consumer keeps its own offset committed back to Kafka, so it can resume from the right place after a restart.
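A minimal in-memory sketch of the offset bookkeeping, assuming a single partition modeled as a Python list. The OffsetStore class, the consume function and the "billing" group name are hypothetical, and the fallback to offset 0 is a simplification (real clients consult auto.offset.reset when nothing is committed).

```python
class OffsetStore:
    """In-memory stand-in for Kafka's committed offsets (hypothetical helper)."""

    def __init__(self):
        self._committed = {}

    def commit(self, group, partition, next_offset):
        # Kafka's convention: the committed value is the offset of the NEXT message to read.
        self._committed[(group, partition)] = next_offset

    def position(self, group, partition):
        # Assumption: start at 0 when nothing has been committed yet.
        return self._committed.get((group, partition), 0)


def consume(log, store, group, partition, max_messages):
    """Read up to max_messages from the committed position, then commit."""
    start = store.position(group, partition)
    batch = log[start:start + max_messages]
    # ... process the batch here ...
    store.commit(group, partition, start + len(batch))
    return batch


log = ["m0", "m1", "m2", "m3", "m4"]
store = OffsetStore()

first = consume(log, store, "billing", 0, 3)   # reads m0, m1, m2
second = consume(log, store, "billing", 0, 3)  # a "restarted" consumer resumes at m3
```

Because the position lives in the store and not in the consumer, the second call behaves the same whether it comes from the original process or from a replacement.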
Now the interesting part: failure.
The coordinator does not poll consumers. The consumers poll the coordinator with heartbeats. If a heartbeat misses its window (controlled by session.timeout.ms), the coordinator assumes that consumer is dead and triggers a rebalance.
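The detection logic can be sketched as a table of last-seen timestamps. This is a toy model, not broker code: ToyCoordinator and its methods are hypothetical, time is passed in explicitly as milliseconds, and the 10-second timeout is an arbitrary value chosen for the example.

```python
class ToyCoordinator:
    """Tracks the last heartbeat per member and expires the silent ones."""

    def __init__(self, session_timeout_ms):
        self.session_timeout_ms = session_timeout_ms
        self.last_heartbeat = {}

    def heartbeat(self, member, now_ms):
        # The consumer calls in; the coordinator only records the time.
        self.last_heartbeat[member] = now_ms

    def expire_members(self, now_ms):
        """Remove and return members whose last heartbeat is older than the timeout."""
        dead = sorted(
            member
            for member, seen in self.last_heartbeat.items()
            if now_ms - seen > self.session_timeout_ms
        )
        for member in dead:
            del self.last_heartbeat[member]
        return dead


coord = ToyCoordinator(session_timeout_ms=10_000)
coord.heartbeat("c1", 0)
coord.heartbeat("c2", 0)

# c1 keeps checking in every 3 seconds; c2 goes silent after t=0.
for t in (3_000, 6_000, 9_000):
    coord.heartbeat("c1", t)

early = coord.expire_members(9_000)   # c2 is quiet but still inside its window
late = coord.expire_members(12_000)   # c2 has now been silent for more than 10 s
```

In this model a non-empty result from expire_members is the event that would kick off a rebalance.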
During a rebalance under the classic eager protocol, the entire group pauses. Every consumer drops its partition assignments and rejoins through the coordinator. The coordinator runs the assignment strategy again with the new membership and hands out the new assignments. Then consumers resume from the last committed offsets, which means anything processed but not yet committed gets processed again.
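One rebalance round, sketched under the same simplifications as the rest of this post: assignment is plain round-robin, committed offsets are a dict keyed by partition, and the function name, member ids and offset values are all invented for the example.

```python
def rebalance(partitions, members, committed):
    """Recompute ownership from scratch, then look up where each owner resumes."""
    members = sorted(members)
    assignment = {m: [] for m in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    # Each new owner starts at the partition's committed offset (0 if none).
    return {
        m: {p: committed.get(p, 0) for p in parts}
        for m, parts in assignment.items()
    }


committed = {0: 120, 1: 98, 2: 240, 3: 15}

before = rebalance(range(4), {"c1", "c2"}, committed)
after = rebalance(range(4), {"c1"}, committed)  # c2 has been declared dead

print(before)  # {'c1': {0: 120, 2: 240}, 'c2': {1: 98, 3: 15}}
print(after)   # {'c1': {0: 120, 1: 98, 2: 240, 3: 15}}
```

Note that c1 picks up partitions 1 and 3 at offsets 98 and 15, the last positions c2 committed. Whatever c2 had processed past those offsets is handled a second time.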
The thing to understand is that a rebalance is a stop-the-world event for the group. While it runs, no messages are being processed by anyone. That is fine if rebalances are rare. It is a problem if they happen every few seconds because your consumers are doing slow work between polls and exceeding max.poll.interval.ms. (Modern clients send heartbeats from a background thread, so a slow handler trips the poll timeout, not the heartbeat.)
Two common ways this goes wrong in production:
- The consumer does a slow database write inside its message handler and goes too long between polls (max.poll.interval.ms). It gets removed from the group. Rebalance. The uncommitted messages are redelivered to a different consumer, which runs the same slow handler. Repeat forever.
- The deploy pipeline rolls the consumer fleet one pod at a time. Each pod death triggers a rebalance. The group spends most of its time rebalancing instead of consuming.
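The first failure mode comes down to simple arithmetic, sketched below. It assumes records are handled one at a time between polls. The 500 and 300_000 figures are the Java client defaults for max.poll.records and max.poll.interval.ms; the handler timings are hypothetical.

```python
def exceeds_poll_interval(records_per_poll, handler_ms, max_poll_interval_ms):
    """True if handling one full batch takes longer than the allowed gap between polls."""
    return records_per_poll * handler_ms > max_poll_interval_ms


fast_handler = exceeds_poll_interval(500, 50, 300_000)    # 25_000 ms per batch
slow_handler = exceeds_poll_interval(500, 700, 300_000)   # 350_000 ms per batch
smaller_batch = exceeds_poll_interval(100, 700, 300_000)  # 70_000 ms per batch

print(fast_handler, slow_handler, smaller_batch)  # False True False
```

A 700 ms write per record does not sound slow, but multiplied across a full batch it is enough to get the consumer removed. Shrinking the batch or raising the interval buys headroom; it does not make the handler faster.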
Fixes for both:
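- Move slow work out of the message handler. Push to a worker pool and return fast, but commit the offset only after the work has finished; committing first turns a crash into lost messages. If that is too invasive, lower max.poll.records or raise max.poll.interval.ms.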
- Use cooperative rebalancing (available since Kafka 2.4, but opt-in: set partition.assignment.strategy to the cooperative sticky assignor) so deploys do not stop the world.
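To see why the cooperative protocol helps, compare what each member has to give up when a third consumer joins. This is a sketch of the idea only, not the client's implementation; both functions and the assignments are hypothetical, and the "new" layout assumes a sticky assignor that moves as few partitions as possible.

```python
def revoked_eager(old, new):
    """Eager protocol: every member gives up everything before reassignment."""
    return {member: set(parts) for member, parts in old.items()}


def revoked_cooperative(old, new):
    """Cooperative protocol: a member gives up only the partitions it is losing."""
    return {
        member: set(parts) - set(new.get(member, ()))
        for member, parts in old.items()
    }


old = {"c1": {0, 1}, "c2": {2, 3}}
new = {"c1": {0, 1}, "c2": {2}, "c3": {3}}  # c3 joined; only partition 3 moves

eager = revoked_eager(old, new)
cooperative = revoked_cooperative(old, new)

print(eager)        # {'c1': {0, 1}, 'c2': {2, 3}}
print(cooperative)  # {'c1': set(), 'c2': {3}}
```

Under the eager protocol all four partitions stop being consumed. Under the cooperative one, partitions 0, 1 and 2 keep flowing while partition 3 changes hands.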
The rebalance protocol is one of those things that looks simple in diagrams and gets you in trouble in production. Diagrams show the happy path. The interesting cases live in the timeouts.
Kafka uses heartbeats to detect dead consumers and rebalances partitions on every membership change. Committed progress is not lost, though uncommitted messages may be delivered again, and processing pauses briefly during the rebalance.
Originally posted on LinkedIn.