Split Brain and Fencing Tokens: What Happens After the Election

March 5, 2026


Split brain is the moment two nodes both believe they are the leader. It usually starts with a network partition or a long process pause, such as a stop-the-world garbage collection or a suspended VM. The lock service stops hearing from the old leader, declares it dead, and elects a new one. So far so good. The trap is that the old leader does not know any of this happened. From its perspective the clock ticked forward by a few seconds and it is still in charge. It is also still willing to accept writes.
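To make the timing concrete, here is a minimal simulation. It assumes a hypothetical lock service with a fixed lease TTL and a manually advanced clock; `FakeClock`, `Lease`, and `LockService` are illustrative names, not any real library's API. The old leader checks its lease, gets paused, and then writes anyway, because nothing after the check re-validates it.

```python
from __future__ import annotations

from dataclasses import dataclass


class FakeClock:
    """Hypothetical manually advanced clock, so the pause is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def advance(self, seconds: float) -> None:
        self.now += seconds


@dataclass
class Lease:
    holder: str
    expires_at: float


class LockService:
    """Hypothetical lock service: grants a lease only if the current one has expired."""

    def __init__(self, clock: FakeClock, ttl: float) -> None:
        self.clock = clock
        self.ttl = ttl
        self.current: Lease | None = None

    def acquire(self, node: str) -> Lease | None:
        if self.current is None or self.clock.now >= self.current.expires_at:
            self.current = Lease(node, self.clock.now + self.ttl)
            return self.current
        return None


def demo() -> list[str]:
    clock = FakeClock()
    locks = LockService(clock, ttl=10.0)
    storage: list[str] = []  # an unfenced store: it accepts every write

    lease_a = locks.acquire("A")
    # A checks its lease. At this instant the check is genuinely true.
    a_thinks_it_leads = lease_a is not None and clock.now < lease_a.expires_at

    clock.advance(40.0)  # A is frozen while its lease expires

    lease_b = locks.acquire("B")  # the lock service sees an expired lease and elects B
    if lease_b is not None:
        storage.append("write from B")

    # A resumes right after its check; from its point of view nothing happened.
    if a_thinks_it_leads:
        storage.append("write from A")
    return storage


print(demo())  # ['write from B', 'write from A']
```

Both writes land. Checking the lease more often only narrows the window, since the pause can always fall between the check and the write.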

Raft and similar protocols guarantee that at most one leader is elected per term. They do not guarantee that a leader from an earlier term will stop acting on its own authority. That is a different problem, and it lives at the storage layer.

Fencing tokens are the cleanest answer. When the lock service hands out a lease, it stamps it with a monotonically increasing number, the fencing token. Every write a leader sends to storage carries that token. The storage layer remembers the highest token it has ever accepted. If a write arrives with a lower token, it is from a deposed leader, and the storage rejects it. The old leader can still try, but it cannot succeed. The brute-force alternative is STONITH ("Shoot The Other Node In The Head"), where the surviving node power-cycles the suspect. STONITH works when you control the hardware. Fencing tokens work anywhere the storage layer can be taught to check them.
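A minimal sketch of the storage-side check follows. It assumes a hypothetical lock service that simply counts up with each lease it grants and an in-memory store; `TokenIssuer`, `FencedStore`, and `StaleTokenError` are illustrative names. Real lock services expose their own monotonic counters.

```python
from __future__ import annotations

import itertools


class StaleTokenError(Exception):
    """Raised when a write carries a token older than one already accepted."""


class TokenIssuer:
    """Hypothetical lock service: every new lease gets a strictly larger token."""

    def __init__(self) -> None:
        self._counter = itertools.count(start=1)

    def grant_lease(self) -> int:
        return next(self._counter)


class FencedStore:
    """Storage that remembers the highest token it has ever accepted."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.data: dict[str, str] = {}

    def write(self, key: str, value: str, token: int) -> None:
        # A strictly older token can only come from a deposed leader.
        if token < self.highest_token:
            raise StaleTokenError(
                f"token {token} < highest accepted {self.highest_token}"
            )
        self.highest_token = token
        self.data[key] = value


issuer = TokenIssuer()
store = FencedStore()

old_leader_token = issuer.grant_lease()  # 1
new_leader_token = issuer.grant_lease()  # 2, granted after the old lease expired

store.write("config", "from new leader", new_leader_token)
try:
    store.write("config", "from old leader", old_leader_token)
except StaleTokenError as err:
    print("rejected:", err)  # rejected: token 1 < highest accepted 2

print(store.data)  # {'config': 'from new leader'}
```

The comparison is `<` rather than `<=` so the current leader can issue many writes under the same token; only strictly older tokens are fenced off. In a real store the check and the update must happen atomically with the write, which this single-threaded sketch does not need to handle.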

The production failure I have watched twice now: a team uses etcd to elect a leader for a job scheduler. The scheduler submits jobs to a worker pool. Election is correct. What the team missed is that the workers accept job submissions on a plain HTTP endpoint with no token check. One day the leader hits a 40-second GC pause. Its etcd lease expires, the secondary is promoted, and the secondary starts submitting jobs. Then the original leader wakes up from its pause and submits the same batch of jobs it was about to send before the pause. The workers happily accept both. Hundreds of jobs run twice. The downstream data pipeline produces duplicate aggregates, and the on-call engineer gets paged at 3 a.m.

The fix is to push the token all the way down. The leader attaches its lease token to every job submission, and the workers track the highest token they have ever seen and reject any submission carrying an older one. The deposed leader's late requests fail loudly instead of silently double-firing. The lesson generalizes: consensus on "who leads" is necessary but not sufficient. Without fencing at the layer that actually mutates state, a long pause is indistinguishable from a coup.
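Here is the incident replayed against a fenced worker, as a minimal sketch. `Worker`, `submit`, the job names, and the token values 7 and 8 are all hypothetical. In the real system the token would travel with each HTTP submission (for instance as a request field), and a rejected request would come back as an error response.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class RejectedSubmission(Exception):
    """The submitting leader's token has been superseded."""


@dataclass
class Worker:
    highest_token_seen: int = 0
    executed: list[str] = field(default_factory=list)

    def submit(self, job_id: str, token: int) -> None:
        # Fence first: a token older than one already seen comes from a deposed leader.
        if token < self.highest_token_seen:
            raise RejectedSubmission(f"{job_id}: token {token} is stale")
        self.highest_token_seen = token
        self.executed.append(job_id)


def replay_incident() -> tuple[list[str], list[str]]:
    worker = Worker()
    batch = ["job-1", "job-2", "job-3"]
    original_token, secondary_token = 7, 8  # hypothetical lease tokens

    # The secondary is promoted during the pause and submits the batch.
    for job in batch:
        worker.submit(job, secondary_token)

    # The original leader wakes up and sends the same batch with its old token.
    rejected: list[str] = []
    for job in batch:
        try:
            worker.submit(job, original_token)
        except RejectedSubmission:
            rejected.append(job)
    return worker.executed, rejected


executed, rejected = replay_incident()
print(executed)  # ['job-1', 'job-2', 'job-3']
print(rejected)  # ['job-1', 'job-2', 'job-3']
```

Each job runs exactly once. Raising instead of quietly dropping the stale submission is what makes the deposed leader fail loudly rather than believe its jobs went through.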

Key takeaway

Consensus protocols guarantee that at most one leader is elected per term, but they cannot stop an old leader from believing it still holds the role. Fencing tokens push the check down to the storage layer, where stale writes are rejected by version, not by trust.

Originally posted on LinkedIn.


All Rights Reserved.