How Kafka Replicates Data and Elects a New Leader When a Broker Dies

May 6, 2026

Every partition in Kafka has exactly one leader and zero or more followers. Producers always write to the leader, and by default consumers read from it too. The followers exist for one reason: to keep up with the leader so that one of them can take over if the leader dies.

Here is the write path.

A producer sends a record for partition 7. The leader for partition 7 appends it to its local log and assigns an offset. The followers fetch that record from the leader and append it to their own logs. Once every follower in the In-Sync Replica set (the ISR) has fetched the record, the leader advances the High Watermark, and only then is that offset visible to consumers.
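The write path can be sketched as a toy model. This is not Kafka's implementation: the Replica class and the helper names are made up for illustration, and the High Watermark is modeled as the smallest log end offset across the ISR.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the partition: a broker id and its local log (hypothetical)."""
    broker_id: int
    log: list = field(default_factory=list)

    @property
    def log_end_offset(self) -> int:
        # Offset that the next appended record would receive.
        return len(self.log)


def high_watermark(isr: list) -> int:
    """Smallest log end offset in the ISR; offsets below it are committed."""
    return min(r.log_end_offset for r in isr)


def follower_fetch(leader: Replica, follower: Replica) -> None:
    """The follower pulls every record it is missing from the leader."""
    follower.log.extend(leader.log[follower.log_end_offset:])


leader = Replica(broker_id=1)
f2, f3 = Replica(broker_id=2), Replica(broker_id=3)
isr = [leader, f2, f3]

leader.log.append("order-created")   # leader appends at offset 0
hw_before = high_watermark(isr)      # nothing is committed yet

follower_fetch(leader, f2)
hw_partial = high_watermark(isr)     # broker 3 has not fetched yet

follower_fetch(leader, f3)
hw_after = high_watermark(isr)       # every ISR member now has offset 0

# Consumers may only read records below the High Watermark.
visible_to_consumers = leader.log[:hw_after]
```

The record becomes visible only after the slowest ISR member has it, which is why a single slow follower delays consumers until it either catches up or leaves the ISR.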

The High Watermark is the trick that makes this safe. Consumers never see a record that has not yet been replicated to every member of the ISR. If the leader crashes mid-write, the unreplicated tail of its log is discarded when that broker comes back as a follower. No consumer ever saw it, so no consumer is confused.

The ISR is dynamic. A follower that has not caught up with the leader for longer than replica.lag.time.max.ms gets booted from the ISR. The leader keeps advancing the High Watermark using only the remaining members. This is why "three replicas" does not automatically mean "tolerates two failures." With three replicas you have one leader and two followers. If both followers are lagging, your effective ISR is the leader alone, and one more failure costs you data.
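A minimal sketch of the shrink rule, assuming the leader tracks the last time each follower was fully caught up. The function name, the timestamps, and the 30-second threshold are illustrative, not taken from Kafka's source.

```python
REPLICA_LAG_TIME_MAX_MS = 30_000  # illustrative value; use your broker's setting


def shrink_isr(isr, leader_id, last_caught_up_ms, now_ms,
               max_lag_ms=REPLICA_LAG_TIME_MAX_MS):
    """Return the ISR with followers that have lagged too long removed.

    last_caught_up_ms maps follower id -> last time (ms) that follower had
    fully caught up to the leader's log end. The leader is always kept.
    """
    return [
        broker for broker in isr
        if broker == leader_id
        or now_ms - last_caught_up_ms[broker] <= max_lag_ms
    ]


now_ms = 100_000
isr = [1, 2, 3]  # broker 1 leads, brokers 2 and 3 follow

# Broker 3 last caught up 40 s ago, so it is dropped.
one_slow = shrink_isr(isr, 1, {2: 95_000, 3: 60_000}, now_ms)

# Both followers are past the threshold, so only the leader remains.
both_slow = shrink_isr(isr, 1, {2: 65_000, 3: 60_000}, now_ms)

# Brokers that can fail before committed data is at risk.
spare_copies = len(both_slow) - 1
```

With both followers out of the ISR, spare_copies is zero even though the replication factor is still three.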

Now the failure case.

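When the leader dies, the controller detects it. In modern Kafka the controller is a node in the KRaft quorum, and it notices when the broker stops sending heartbeats. In older clusters the controller is a broker elected through ZooKeeper, and it notices when the dead broker's ZooKeeper session expires. The controller picks a new leader from the surviving ISR members. Because ISR members were caught up to the High Watermark, the new leader has every committed record. Producers and consumers refresh their metadata and reconnect to the new leader, and the partition keeps moving.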
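The election step can be sketched as follows. This is a simplification under one assumption: the controller walks the partition's assigned replica list in order and picks the first broker that is both alive and in the ISR. The function and variable names are hypothetical.

```python
def elect_leader(assigned_replicas, isr, live_brokers):
    """Return the first assigned replica that is alive and in the ISR, else None."""
    for broker in assigned_replicas:
        if broker in isr and broker in live_brokers:
            return broker
    return None


assigned_replicas = [1, 2, 3]   # broker 1 was the leader
isr = {1, 2, 3}
live_brokers = {2, 3}           # broker 1 has been declared dead

new_leader = elect_leader(assigned_replicas, isr, live_brokers)

# The dead broker leaves the ISR; the survivors carry on.
new_isr = isr & live_brokers
```

Because the winner comes from the ISR, it already holds every record below the High Watermark, so no committed data is lost in the handover.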

The thing that bites people in production is unclean.leader.election.enable. If the entire ISR dies and only an out-of-sync replica is left, the default (false) is to refuse election and keep the partition offline. Flip that flag to true to come back online, and you accept silent data loss: the new leader is missing every record that the dead ISR members had and it did not.
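To put a number on that trade-off, here is a toy election that takes the flag into account. The offsets and broker ids are invented for the example, and the function is a sketch of the decision, not the controller's real code.

```python
def elect(assigned_replicas, isr, live_brokers, unclean_enabled):
    """Return (leader, mode), where mode is 'clean', 'unclean' or 'offline'."""
    for broker in assigned_replicas:
        if broker in live_brokers and broker in isr:
            return broker, "clean"
    if unclean_enabled:
        # Last resort: accept any live replica, even one that is out of sync.
        for broker in assigned_replicas:
            if broker in live_brokers:
                return broker, "unclean"
    return None, "offline"


log_end_offsets = {1: 100, 2: 100, 3: 60}   # broker 3 fell behind earlier
isr = {1, 2}
live_brokers = {3}                           # both ISR members are down

strict = elect([1, 2, 3], isr, live_brokers, unclean_enabled=False)
unclean = elect([1, 2, 3], isr, live_brokers, unclean_enabled=True)

# Committed records the out-of-sync replica never received.
high_watermark = min(log_end_offsets[b] for b in isr)
lost_records = high_watermark - log_end_offsets[unclean[0]]
```

In this scenario the strict setting keeps the partition offline, while the unclean setting brings it back with 40 committed records gone.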

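The rule worth memorizing: durability in Kafka is a property of the ISR, not of the replication factor. Configure min.insync.replicas and producer acks=all together. With acks=all, the leader rejects writes whenever the ISR is smaller than min.insync.replicas, so an acknowledged write always exists on at least that many replicas. Otherwise you have replicas on paper that do nothing for you when a broker actually dies.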
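How the two settings interact can be sketched as a single check on the leader. The exception class and function names below are stand-ins for illustration. The behavior modeled is that the ISR size is only enforced against min.insync.replicas when the producer uses acks=all.

```python
class NotEnoughReplicasError(Exception):
    """Stand-in for the error a producer sees when the ISR is too small."""


def leader_accepts_write(acks, isr_size, min_insync_replicas):
    """Model of the leader's check before appending a produce request."""
    if acks == "all" and isr_size < min_insync_replicas:
        raise NotEnoughReplicasError(
            f"ISR size {isr_size} is below min.insync.replicas={min_insync_replicas}"
        )
    return True


def try_write(acks, isr_size, min_insync_replicas=2):
    """Return 'accepted' or 'rejected' instead of raising, for easy comparison."""
    try:
        leader_accepts_write(acks, isr_size, min_insync_replicas)
        return "accepted"
    except NotEnoughReplicasError:
        return "rejected"


# Replication factor 3, min.insync.replicas=2.
healthy = try_write("all", isr_size=3)
one_down = try_write("all", isr_size=2)
leader_only = try_write("all", isr_size=1)

# With acks=1 the minimum is not enforced, so the write lands on one broker only.
weak_producer = try_write("1", isr_size=1)
```

The rejected write is the safety mechanism working: the producer gets an error it can retry, instead of an acknowledgment for a record that exists on one broker.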

Key takeaway

Kafka durability comes from the ISR, not from the replication factor. The High Watermark hides uncommitted writes, and by default only ISR members are eligible to become leader.

Originally posted on LinkedIn.