Vector Clocks: Detecting Concurrent Writes Without a Global Clock
March 1, 2026
Physical clocks lie. They drift, they get yanked by NTP, they jump backwards during a leap-second correction, and virtual machines see clock steps that bare metal never does. Any ordering you build on now() is one rough day away from a wrong answer.
Lamport clocks fix part of the problem. A single counter, bumped on local events and on receives, gives you a total order that respects causality. But Lamport clocks cannot tell you that two events were concurrent. They will assign different numbers to events that happened in parallel, and you cannot recover the truth later.
Vector clocks close that gap. Each node keeps a vector of counters, one entry per writer. On a local event, the node bumps its own slot. On receive, it merges by taking the elementwise max, then bumps its own slot again. Comparing two vectors X and Y: X < Y iff every entry of X is <= the corresponding entry of Y and at least one is strictly less. If neither vector dominates the other, the events are concurrent. That is the superpower.
This matters in multi-writer eventually-consistent stores like Dynamo and Riak. When two clients write to the same key on different replicas, the system can tell whether one update supersedes the other or whether they truly collided. Concurrent writes become siblings, surfaced to the application for merging.
The production failure worth telling. A team ran a Dynamo-style key-value store for shopping cart state. A multi-minute network partition produced concurrent writes from web and mobile clients. When the partition healed, certain hot keys accumulated 30 plus siblings. Every read returned the whole list, and the application's merge function had to fold them into one cart. The merge code was written years earlier, had a quiet bug where empty list fields shadowed populated ones, and silently dropped saved items from a user's cart. Support tickets piled up before anyone correlated them.
Two fixes shipped together. First, aggressive sibling pruning: cap siblings at a small number per key and drop the oldest by causal context. Second, replace the bespoke merge with a CRDT-style structure (an OR-Set for cart items) so the merge was provably idempotent and could not drop fields under any sibling ordering.
If you use vector clocks, your problem is rarely the math. It is the siblings, and the merge function. Treat both as first class.
Vector clocks tell you whether two writes are causally ordered or genuinely concurrent. The hard part in production is not the math. It is keeping sibling versions from exploding and merging them without dropping fields.
Originally posted on LinkedIn. View original.