Kafka Log Truncation and Unclean Leader Election: When Failover Rewrites History

December 30, 2025

Most people think of Kafka failover as picking a new leader and pointing producers at it. That is the easy half. The hard half is that every surviving replica must rewrite its own log to agree with the new leader. Failover is a truncation event, not a relabel.

Start with the line of truth. Every partition has a High Watermark (HW), the highest offset that every member of the in-sync replica set (ISR) has replicated. Anything at or below the HW is considered committed and is visible to consumers. Anything above the HW exists on the leader, and possibly on some followers, but has not yet been confirmed safe. A producer writing with acks=all gets its acknowledgment only once the HW reaches its records. Consumers never read beyond it.
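The definition above can be sketched in a few lines of Python. This is a toy model, not broker code: the replica names and the high_watermark helper are hypothetical, and it keeps this article's inclusive convention (the HW is the last offset every ISR member holds), whereas the broker's own bookkeeping treats the HW as an exclusive upper bound.

```python
def high_watermark(isr_last_offsets):
    """Return the highest offset held by every ISR member (inclusive convention)."""
    if not isr_last_offsets:
        raise ValueError("ISR is empty; no high watermark can be computed")
    # A record is committed only if the slowest ISR member has it,
    # so the HW is the minimum of the members' last offsets.
    return min(isr_last_offsets.values())


# Hypothetical replica names; each value is the last offset that replica holds.
isr = {"leader": 10, "follower-1": 7, "follower-2": 8}

hw = high_watermark(isr)
committed = list(range(0, hw + 1))                    # visible to consumers
uncommitted = list(range(hw + 1, isr["leader"] + 1))  # on the leader, not yet safe

print("HW:", hw)
print("uncommitted tail:", uncommitted)
```

With these numbers the slowest ISR member sets the HW, so a single lagging follower is enough to hold back what consumers can see.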

Now the failure. The leader has offsets 0 through 10. The HW is 7. Offsets 8, 9, 10 were appended to the leader but the followers had not caught up. The leader crashes.

In a clean election the controller picks a new leader from the ISR. Every ISR member already has offsets 0 through 7 because that is what the HW guarantees. The new leader takes over at offset 7. The remaining replicas, including the old leader if it restarts with 8, 9, and 10 still on disk, truncate their logs back to 7 to match the new leader, then resume replicating from it. The only data lost is the uncommitted tail. No consumer ever saw 8, 9, or 10, so no consumer is confused.
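A small simulation makes the clean case concrete. It is a sketch under simplifying assumptions: logs are plain Python lists where the index is the offset, the replica names are hypothetical, and truncate_to_leader finds the divergence point by comparing records directly. Real brokers locate that point with leader epochs rather than by comparing record contents.

```python
def truncate_to_leader(follower_log, leader_log):
    """Keep only the prefix of follower_log that matches leader_log."""
    keep = 0
    for mine, theirs in zip(follower_log, leader_log):
        if mine != theirs:
            break
        keep += 1
    return follower_log[:keep]


# State at the moment of the crash. The HW is 7 in the article's convention.
logs = {
    "old-leader": [f"r{i}" for i in range(11)],  # offsets 0..10
    "follower-1": [f"r{i}" for i in range(8)],   # offsets 0..7
    "follower-2": [f"r{i}" for i in range(9)],   # offsets 0..8
}

# Clean election: the new leader comes from the ISR, so it holds everything up to the HW.
new_leader = "follower-1"
leader_log = logs[new_leader]

after = {name: truncate_to_leader(log, leader_log) for name, log in logs.items()}

for name, log in after.items():
    print(name, "ends at offset", len(log) - 1)
```

Every replica ends at offset 7, including the old leader once it returns. Records r8 through r10 disappear, but they were never above the line that consumers could read.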

In an unclean election the ISR is empty because every in-sync replica died. The only survivors are replicas that had fallen behind, say with a log ending at offset 5. If unclean.leader.election.enable=true, the controller hands leadership to one of them anyway. The new leader appends new records starting at offset 6. Other replicas, including any that come back later holding the original offsets 6 and 7, truncate back to offset 5 and copy the new leader's records instead. Committed data is gone. Consumers that already processed offsets 6 and 7 will see different records if they fetch those offsets again from the new leader. Downstream state diverges from the log. Availability is preserved at the cost of correctness.
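The same toy model shows how the unclean case rewrites committed offsets. As before, this is an illustrative sketch: logs are lists indexed by offset, the record labels (r6, x6) and the resync helper are hypothetical, and divergence is found by comparing records rather than by the leader-epoch mechanism a real broker uses.

```python
def resync(follower_log, leader_log):
    """Truncate the follower at the first divergence, then copy the leader's tail."""
    keep = 0
    for mine, theirs in zip(follower_log, leader_log):
        if mine != theirs:
            break
        keep += 1
    return follower_log[:keep] + leader_log[keep:]


# Before the outage: offsets 0..7 were committed and readable by consumers.
committed_before = [f"r{i}" for i in range(8)]

# The only survivor is an out-of-sync replica whose log ends at offset 5.
lagging = [f"r{i}" for i in range(6)]

# Unclean election: the lagging replica becomes leader and accepts new writes,
# which land on offsets 6 and 7.
new_leader = lagging + ["x6", "x7"]

# A former ISR member comes back still holding the committed r6 and r7.
returning = resync(committed_before, new_leader)

seen_earlier = committed_before[6]  # what a consumer read at offset 6 before the outage
seen_now = returning[6]             # what the same offset returns afterwards

print("offset 6 before:", seen_earlier)
print("offset 6 after: ", seen_now)
```

The returning replica held the correct, committed history and still loses it, because the new leader's log is the only reference it reconciles against.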

This is why the flag defaults to false in modern Kafka (0.11 and later). The cluster going offline is a loud, paged, visible failure. Silent data loss is the failure mode that hides for weeks and corrupts every downstream snapshot in the meantime.

The rule worth keeping: the HW tells you what is safe, the ISR tells you who is trusted, and unclean.leader.election.enable is the explicit choice between losing availability and losing history. Pick deliberately. Set min.insync.replicas together with acks=all so a shrinking ISR fails writes early instead of letting you discover the gap during recovery.
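How min.insync.replicas and acks=all interact can be sketched as a single predicate. The accept_write function is a hypothetical model of the leader's decision, not a Kafka API. It assumes the documented behavior that the minimum is enforced only for acks=all, where a real broker rejects the request with a NotEnoughReplicas error.

```python
def accept_write(acks, isr_size, min_insync_replicas):
    """Return True if the leader would accept the produce request in this simplified model."""
    if acks == "all" and isr_size < min_insync_replicas:
        # The ISR has shrunk below the configured floor: fail the write now
        # instead of accepting records that too few replicas hold.
        return False
    # With acks=0 or acks=1 the floor is not enforced.
    return True


# Assumed setup for illustration: replication factor 3, min.insync.replicas=2.
MIN_ISR = 2

for isr_size in (3, 2, 1):
    ok = accept_write("all", isr_size, MIN_ISR)
    print(f"acks=all, ISR size {isr_size}: {'accepted' if ok else 'rejected'}")

# The same shrunken ISR does not stop a producer using acks=1.
print("acks=1, ISR size 1:", "accepted" if accept_write("1", 1, MIN_ISR) else "rejected")
```

The rejected write at ISR size 1 is the early, visible failure the rule asks for: producers see errors while committed data still sits on at least two replicas.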

Key takeaway

Failover in Kafka is not a clean cutover. Every replica truncates its log to align with the new leader. A clean election loses only the uncommitted tail. An unclean election can erase data your consumers already processed.

Originally posted on LinkedIn.