Why Clocks Lie in Distributed Systems

February 27, 2026

A new engineer's first instinct when ordering distributed events is to sort by created_at. It feels obvious. It is also wrong.

NTP is the standard story for clock sync, and on a healthy network it pulls hosts to within a few milliseconds of UTC. On a less healthy network, or after a long disconnection, NTP can step a host's clock by 100 ms or more in a single correction. Some operators run with slew to avoid steps, but slew has its own cost: time speeds up or slows down to converge, and any monotonic counter you derived from wall time inherits that anomaly.
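One common way to keep durations away from a stepped wall clock is to measure them with a monotonic source instead. Here is a minimal Python sketch; timed is a hypothetical helper, not something from a particular codebase. Python documents time.monotonic() as a clock that cannot go backwards and is unaffected by system clock updates. On Linux it can still be slewed, so treat it as protection against steps, not against rate changes.

```python
import time


def timed(fn):
    """Run fn() and return (result, elapsed_seconds).

    Elapsed time comes from time.monotonic(), so a wall-clock step in the
    middle of the call cannot make the duration negative.
    """
    start = time.monotonic()
    result = fn()
    elapsed = time.monotonic() - start
    return result, elapsed


if __name__ == "__main__":
    value, seconds = timed(lambda: sum(range(1_000_000)))
    print(f"result={value} elapsed={seconds:.6f}s")
```

A monotonic reading is only meaningful on the host that produced it. It helps with durations and local ordering, not with comparing events across machines.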

Then there is drift. Two machines with cheap crystals can disagree by tens of microseconds per second between corrections, which adds up to a second or more per day if nothing corrects it. In a virtual machine the host's scheduler can pause a guest for hundreds of milliseconds, and when the guest resumes it sees time leap forward. Leap seconds have historically caused widespread production outages because kernel timer code was not written to handle a 23:59:60 second.

The deeper problem is that wall-clock time across machines is not a total order. If host A timestamps an event at T1 and host B timestamps an event at T2 with T2 > T1, that does not prove A's event happened before B's. The clocks could disagree by more than the difference. Without an explicit bound on uncertainty you cannot make that call.
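To make "an explicit bound on uncertainty" concrete, here is a small Python sketch. It assumes every clock reading is within max_error_ms of true time. That parameter and the function name are illustrative assumptions, and obtaining a trustworthy bound is the hard part in practice. Under that assumption, two readings can only be ordered when they differ by more than twice the bound.

```python
def compare(t1_ms: int, t2_ms: int, max_error_ms: int) -> str:
    """Order two wall-clock readings taken on different hosts.

    Each reading is assumed to be within max_error_ms of true time, so the
    true instant lies somewhere in [t - max_error_ms, t + max_error_ms].
    Returns "before", "after", or "unknown" when the intervals overlap.
    """
    if t1_ms + max_error_ms < t2_ms - max_error_ms:
        return "before"   # event 1 surely precedes event 2
    if t2_ms + max_error_ms < t1_ms - max_error_ms:
        return "after"    # event 2 surely precedes event 1
    return "unknown"      # the clocks could disagree by more than the gap


if __name__ == "__main__":
    print(compare(1000, 1005, max_error_ms=10))  # intervals overlap
    print(compare(1000, 1050, max_error_ms=10))  # gap exceeds 2 * bound
```

The "unknown" branch is the honest answer a plain sort by timestamp never gives you.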

Google's Spanner takes this seriously. TrueTime returns an interval [earliest, latest] rather than a point, with the uncertainty bound (epsilon) kept small by GPS receivers and atomic clocks in each datacenter. To commit a transaction, Spanner waits out the uncertainty: it holds the commit until its timestamp is guaranteed to be in the past, so any later read is guaranteed to see it. The wait is small, but it is real, and it is the price of safe wall-clock ordering.
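The commit-wait idea can be sketched in a few lines of Python. This is not Spanner's code or the TrueTime API. UncertainClock, Interval, commit_wait, and the fixed epsilon_ms are all assumptions made for illustration, and an ordinary system clock does not actually come with a guaranteed error bound.

```python
import time
from dataclasses import dataclass
from typing import Callable


def _wall_ms() -> int:
    return time.time_ns() // 1_000_000


def _sleep_ms(ms: int) -> None:
    time.sleep(ms / 1000)


@dataclass(frozen=True)
class Interval:
    earliest: int  # true time is assumed to be no earlier than this (ms)
    latest: int    # true time is assumed to be no later than this (ms)


class UncertainClock:
    """Hypothetical clock that reports an interval instead of a point."""

    def __init__(self, epsilon_ms: int, read_ms: Callable[[], int] = _wall_ms):
        self.epsilon_ms = epsilon_ms
        self._read_ms = read_ms

    def now(self) -> Interval:
        t = self._read_ms()
        return Interval(t - self.epsilon_ms, t + self.epsilon_ms)


def commit_wait(clock: UncertainClock,
                sleep_ms: Callable[[int], None] = _sleep_ms,
                poll_ms: int = 1) -> int:
    """Pick a commit timestamp, then block until it is surely in the past."""
    commit_ts = clock.now().latest
    # Wait until even the earliest possible "now" is later than commit_ts.
    while clock.now().earliest <= commit_ts:
        sleep_ms(poll_ms)
    return commit_ts


if __name__ == "__main__":
    ts = commit_wait(UncertainClock(epsilon_ms=4))
    print("committed at", ts)
```

With this scheme the wait is roughly twice epsilon, which is why shrinking the uncertainty bound directly shrinks commit latency.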

One production failure is worth remembering. A financial system used now() to timestamp incoming orders and wrote them to an append log keyed by (symbol, timestamp). During a leap-second correction one of the ingest hosts rewound 80 ms. A market-data burst arrived in those 80 ms and received timestamps that fell at or before those of orders already in the log. Downstream replay logic deduped on (symbol, timestamp) and treated the older-looking new orders as duplicates of the genuinely earlier ones. Roughly 1200 orders silently vanished. The post-mortem rewrote ingest to use a Hybrid Logical Clock with a monotonic component, so a clock rewind could never produce a smaller timestamp than one already written.
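For readers who have not met a Hybrid Logical Clock, here is a minimal single-threaded Python sketch of the general technique. It is not the code from that post-mortem. The class name, method names, and millisecond resolution are my assumptions, and a real ingest path would also need locking and persistence across restarts.

```python
import time
from typing import Callable, Tuple

Timestamp = Tuple[int, int]  # (logical wall time in ms, counter)


def _wall_ms() -> int:
    return time.time_ns() // 1_000_000


class HybridLogicalClock:
    """Issues (l, c) pairs that never decrease, even if wall time rewinds."""

    def __init__(self, physical_ms: Callable[[], int] = _wall_ms):
        self._physical_ms = physical_ms
        self._l = 0  # highest wall-clock value seen so far
        self._c = 0  # tie-breaker while _l does not advance

    def now(self) -> Timestamp:
        """Timestamp a local event."""
        pt = self._physical_ms()
        if pt > self._l:
            self._l, self._c = pt, 0
        else:
            # Wall clock stalled or went backwards: keep _l, bump counter.
            self._c += 1
        return (self._l, self._c)

    def update(self, remote: Timestamp) -> Timestamp:
        """Merge a timestamp received from another host."""
        pt = self._physical_ms()
        rl, rc = remote
        new_l = max(self._l, rl, pt)
        if new_l == self._l and new_l == rl:
            self._c = max(self._c, rc) + 1
        elif new_l == self._l:
            self._c += 1
        elif new_l == rl:
            self._c = rc + 1
        else:
            self._c = 0
        self._l = new_l
        return (self._l, self._c)


if __name__ == "__main__":
    # Simulated wall clock that rewinds 80 ms after the second reading.
    readings = iter([1000, 1001, 921, 950, 1002])
    clock = HybridLogicalClock(physical_ms=lambda: next(readings))
    for _ in range(5):
        print(clock.now())
```

Comparing the pairs as tuples gives a strictly increasing sequence on each host, so a key like (symbol, timestamp) can no longer collide with an earlier entry just because the wall clock stepped back.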

If you order events by wall time, you are not ordering events. You are guessing.

Key takeaway

Wall-clock time across machines is not a total order. Without bounded uncertainty you cannot safely sort events by timestamp, and any system that pretends otherwise is one NTP correction away from data loss.

Originally posted on LinkedIn.