Multi-Leader Replication: The Conflict Problem Nobody Wants to Own
March 25, 2026
Multi-leader replication sells itself in one line: put a leader in each region, accept writes locally, give every user single-digit millisecond writes. The catch arrives later, usually in production.
When two regions accept different updates to the same row at nearly the same time, both writes succeed locally. The conflict only shows up when their replication streams cross. Now you have a decision the database cannot make for you: which write wins, and what does "wins" even mean for this data type.
There are three families of answers.
Last-writer-wins is the default in many engines because it is trivial to implement. Tag each write with a timestamp, keep the larger one. It is also the easiest way to silently lose data. Clock skew between regions, retries that backdate the original timestamp, and concurrent edits to different fields all collapse into the same opaque outcome.
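To make the clock-skew failure concrete, here is a minimal sketch of last-writer-wins. The `Write` record, its field names, the region names, and the timestamps are all hypothetical, and it assumes each region stamps writes with its own wall clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: str
    timestamp_ms: int  # wall-clock time reported by the accepting region (assumed)
    region: str


def last_writer_wins(a: Write, b: Write) -> Write:
    """Keep the write with the larger timestamp; break ties by region name."""
    return max(a, b, key=lambda w: (w.timestamp_ms, w.region))


# Assumption for illustration: eu-west's clock runs 200 ms behind.
# Its write really happened 50 ms AFTER us-east's, but it reports an earlier time.
us_east = Write("shipping=express", timestamp_ms=1_000_200, region="us-east")
eu_west = Write("shipping=standard", timestamp_ms=1_000_050, region="eu-west")

winner = last_writer_wins(us_east, eu_west)
print(winner.region)  # the later real-world write from eu-west is silently dropped
```

The resolver behaves exactly as specified and still throws away the newer edit. Nothing in the stored row records that a second version ever existed.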
Application-level merge is honest work. The app reads both versions, understands the schema, and produces a merged result. Good for things like shopping carts where adding items from two devices should union, not overwrite.
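A minimal sketch of what that merge might look like for a cart, assuming carts are plain SKU-to-quantity maps. The function name, the SKUs, and the "keep the larger quantity" rule are illustrative choices, not something any particular database provides.

```python
from typing import Dict


def merge_carts(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, int]:
    """Union the SKUs from both versions; on overlap keep the larger quantity."""
    merged = dict(a)
    for sku, qty in b.items():
        merged[sku] = max(merged.get(sku, 0), qty)
    return merged


# Two devices edited the same cart in different regions.
phone = {"sku-123": 1, "sku-456": 2}
laptop = {"sku-456": 3, "sku-789": 1}

merged = merge_carts(phone, laptop)
print(merged)
```

A plain union like this cannot express removals: an item deleted on one device comes back if the other version still has it. Handling that needs extra state such as tombstones, which is exactly the kind of schema knowledge only the application has.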
CRDTs encode the merge rule in the data type itself. Counters, sets, and registers whose merges are commutative, associative, and idempotent, so replicas converge regardless of the order in which updates arrive. Powerful when your data fits, useless when it does not. You cannot CRDT your way out of "did the user accept the terms of service" because the answer is one row and the merge has a moral dimension.
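As one illustration, here is a sketch of a grow-only counter, about the simplest state-based CRDT. Each region owns one slot and merge takes the per-slot maximum. The class name, method names, and region names are hypothetical.

```python
from typing import Dict, Optional


class GCounter:
    """Grow-only counter: one slot per region, merged by per-slot max."""

    def __init__(self, counts: Optional[Dict[str, int]] = None):
        self.counts = dict(counts or {})

    def increment(self, region: str, amount: int = 1) -> None:
        # Each region only ever bumps its own slot.
        self.counts[region] = self.counts.get(region, 0) + amount

    def merge(self, other: "GCounter") -> "GCounter":
        regions = self.counts.keys() | other.counts.keys()
        return GCounter(
            {r: max(self.counts.get(r, 0), other.counts.get(r, 0)) for r in regions}
        )

    @property
    def value(self) -> int:
        return sum(self.counts.values())


us = GCounter()
us.increment("us-east", 2)
eu = GCounter()
eu.increment("eu-west", 3)

merged = us.merge(eu)
print(merged.value)
```

Merging in either order, or merging the same state twice, gives the same total. That is the property the data type guarantees. It guarantees nothing about what downstream consumers do with the number.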
The production failure that made me skeptical: a flash sale with regional inventory. Three regions, each holding a decrement-counter CRDT for the same SKU. The CRDT model said concurrent decrements should commute and sum cleanly. They did. But the listener that turned the counter into order rows did not deduplicate across regions, and two decrements produced by a single user's retry were recorded as two distinct sales. The CRDT was correct. The downstream interpretation was not. Two customers received confirmation emails for the last unit in stock. Support ate the cost.
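A sketch of the kind of deduplication that listener was missing, assuming every decrement carries an idempotency key that the client reuses across retries. The `request_id` field and the event shape are hypothetical, and the in-memory set stands in for whatever durable, cross-region store a real listener would need.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class DecrementEvent:
    request_id: str  # hypothetical idempotency key, reused across client retries
    region: str
    sku: str


def to_order_rows(events: List[DecrementEvent]) -> List[DecrementEvent]:
    """Emit one order per request_id, no matter which region saw the retry."""
    seen: Set[str] = set()  # in production this would need to be durable and shared
    orders: List[DecrementEvent] = []
    for event in events:
        if event.request_id in seen:
            continue  # same logical purchase, already turned into an order
        seen.add(event.request_id)
        orders.append(event)
    return orders


events = [
    DecrementEvent("req-1", "us-east", "sku-123"),
    DecrementEvent("req-1", "eu-west", "sku-123"),  # retry routed to another region
    DecrementEvent("req-2", "ap-south", "sku-123"),
]

orders = to_order_rows(events)
print([o.request_id for o in orders])
```

The counter and the order stream answer different questions. The counter says how many decrements happened. Only the idempotency key says how many purchases were intended.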
Multi-leader is worth its complexity when low-latency cross-region writes genuinely matter, your data shape supports merge semantics you can defend, and you have written the runbook for the day conflict resolution is wrong. If any of those is missing, single-leader with regional read replicas is still the boring, correct answer. Most flash sales do not need three writers. They need one writer and a queue.
One more piece of advice. Whatever resolution you pick, log every detected conflict to a durable stream with both versions and the chosen winner. You will not get the merge rule right the first time, and the only way to find that out is to replay real conflicts after the fact. Teams that skip this step learn about their bad rules from customers, not dashboards.
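A minimal sketch of such a conflict record, written as one JSON line per conflict. The field names, the example key, and the rule label are assumptions, and the in-memory `StringIO` stands in for whatever durable stream you actually run.

```python
import io
import json
from typing import Any, Dict, TextIO


def log_conflict(
    stream: TextIO,
    key: str,
    local: Dict[str, Any],
    remote: Dict[str, Any],
    winner: str,
    rule: str,
) -> None:
    """Append one JSON line holding both versions and the decision taken."""
    record = {
        "key": key,
        "local": local,
        "remote": remote,
        "winner": winner,
        "rule": rule,
    }
    stream.write(json.dumps(record, sort_keys=True) + "\n")


stream = io.StringIO()  # stand-in for a durable log, for illustration only
log_conflict(
    stream,
    key="cart:42",
    local={"region": "us-east", "items": ["sku-123"]},
    remote={"region": "eu-west", "items": ["sku-456"]},
    winner="remote",
    rule="last-writer-wins",
)

# Replaying later: read every conflict back and re-run a candidate merge rule.
records = [json.loads(line) for line in stream.getvalue().splitlines()]
print(records[0]["winner"])
```

Keeping both full versions, not just the winner, is what makes replay possible: a new rule can be evaluated against the same inputs the old rule saw.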
Multi-leader is not a faster version of leader-follower. It buys local write latency in exchange for conflict resolution you have to design, test, and own.
Originally posted on LinkedIn.