Time and clocks
If two events happen on different machines, what does it mean for one to come before the other? On a single machine, you read a clock and compare. Across machines, the clocks drift, can jump backwards, and don't agree to begin with. The field has a small set of clock abstractions that handle this carefully. Each tells you something different about what "before" actually means.
Wall clocks lie, sometimes
Every server has a hardware clock. The operating system synchronises it against an NTP or PTP source. In normal operation it's accurate to a few milliseconds. That's close enough for many purposes, but nowhere near close enough for ordering.
Three things can go wrong:
- Drift. Hardware clocks typically drift by 10-100 parts per million. At 100 ppm that is about 8.6 seconds per day, so a server that hasn't talked to NTP for a few days can be tens of seconds off.
- Jumps. When NTP corrects a clock that's drifted, it can step it forward or backward. A timestamp written before the step and one written after can appear in the wrong order.
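- Skew. Two servers' clocks disagree. They're both drifting, both being corrected, and both wrong by different amounts. Even with NTP, the disagreement between two servers at any moment can range from a few milliseconds on a quiet local network to 10-100ms across congested or wide-area links.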
"This row was written at 2026-05-04T12:34:56.789" is a useful debugging breadcrumb, but it's not a basis for ordering events across machines.
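To put the drift figures in perspective, here is a back-of-the-envelope calculation in Python. The helper name drift_seconds is made up for this illustration, and the model is deliberately crude: it assumes a constant drift rate and no correction since the last sync.

```python
def drift_seconds(ppm: float, elapsed_seconds: float) -> float:
    """Offset accumulated by a clock drifting at a constant `ppm` parts per million."""
    return ppm * elapsed_seconds / 1_000_000


DAY = 86_400  # seconds in a day, ignoring leap seconds

for ppm in (10, 100):
    # Prints the accumulated offset after one unsynchronised day.
    print(f"{ppm:>3} ppm: {drift_seconds(ppm, DAY):.2f} s/day")
```

At the top of the quoted range, a single unsynchronised day already costs several seconds, which is far more than the gap between most causally related events.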
Lamport clocks — order without synchronisation
Leslie Lamport, 1978: forget about time-of-day and track causal order instead. Each
process keeps a counter. Every event increments it. When a process sends a
message, it stamps it with its current counter. When a process receives a message, it
updates its counter to max(local, received) + 1.
# Process A
counter = 0
event: counter = 1 # local event
send_msg: counter = 2 # also tags message with 2
event: counter = 3
# Process B (counter starts at 0)
recv_msg(2): counter = max(0, 2) + 1 = 3
event: counter = 4
The guarantee: if event X happened before event Y in the "happens-before" sense (X precedes Y on the same process, or Y could have been influenced by X through some chain of messages), then Lamport(X) < Lamport(Y). The reverse isn't true: Lamport(X) < Lamport(Y) does not imply that X happened before Y, because two unrelated events can end up with equal or ordered counters by coincidence.
Useful when you need a total order and don't need to detect concurrency: break ties between equal counters with a process ID and every event gets a unique position. The classic use is logical timestamps for log entries in replicated state machines.
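The rules above fit in a few lines of Python. This is a minimal sketch rather than a production implementation: the class and method names (LamportClock, tick, send, receive) are chosen for this example, and the two "processes" are just objects in one interpreter.

```python
class LamportClock:
    """Logical clock: one integer counter per process."""

    def __init__(self) -> None:
        self.counter = 0

    def tick(self) -> int:
        """Local event: increment the counter and return the new value."""
        self.counter += 1
        return self.counter

    def send(self) -> int:
        """Sending is an event too; the returned value is stamped on the message."""
        return self.tick()

    def receive(self, received: int) -> int:
        """Receive a message stamped with `received`: jump past both clocks."""
        self.counter = max(self.counter, received) + 1
        return self.counter


# Replay of the trace above.
a, b = LamportClock(), LamportClock()
a.tick()             # A: local event, counter = 1
stamp = a.send()     # A: send, counter = 2, message tagged with 2
a.tick()             # A: local event, counter = 3
b.receive(stamp)     # B: max(0, 2) + 1 = 3
b.tick()             # B: local event, counter = 4
```

To turn these counters into a total order, sort events by the pair (counter, process_id), so that ties between processes are broken deterministically.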
Vector clocks — detecting concurrency
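Lamport clocks guarantee that causally ordered events get increasing timestamps, but comparing two timestamps can't tell you whether the events were causally related or concurrent. Vector clocks fix that. Each process keeps one counter per process in the system. Local events increment the process's own entry; receiving a message merges in the sender's vector by taking the element-wise max, then increments the receiver's own entry.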
# 3 processes: A, B, C. Each tracks a vector [a, b, c].
# Initially: [0, 0, 0] on each.
A: event → A=[1, 0, 0]
A: send to B → message tagged [1, 0, 0]
B: recv → B = max([0,0,0], [1,0,0]) + B-bump = [1, 1, 0]
B: event → B=[1, 2, 0]
C: event → C=[0, 0, 1] # concurrent with everything above
Comparing two vectors:
- X happened before Y if every component of X is ≤ the corresponding component of Y, and at least one is strictly less.
- X and Y are concurrent if neither is ≤ the other.
- Two events with the same vector are the same event.
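Used in Dynamo-style stores such as Riak to detect conflicts on concurrent writes. (Cassandra, despite its Dynamo lineage, resolves conflicts with last-write-wins timestamps rather than vector clocks.) The cost: every clock value grows linearly with the number of processes, which makes vector clocks expensive in clusters with a lot of membership churn.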
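The comparison rules and the receive-side merge can be sketched in Python as follows. The function names (happened_before, compare, merge_on_receive) are invented for this example, and vectors are plain lists indexed by process position; real systems usually key entries by node ID instead.

```python
from typing import List, Sequence


def happened_before(x: Sequence[int], y: Sequence[int]) -> bool:
    """True if every entry of x is <= y's and at least one is strictly less."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def compare(x: Sequence[int], y: Sequence[int]) -> str:
    """Classify the relationship between two vector timestamps."""
    if len(x) != len(y):
        raise ValueError("vectors must cover the same set of processes")
    if list(x) == list(y):
        return "equal"
    if happened_before(x, y):
        return "before"
    if happened_before(y, x):
        return "after"
    return "concurrent"


def merge_on_receive(local: Sequence[int], received: Sequence[int], me: int) -> List[int]:
    """Element-wise max of the two vectors, then bump the receiver's own entry."""
    merged = [max(l, r) for l, r in zip(local, received)]
    merged[me] += 1
    return merged


# Replay of the trace above: A is index 0, B is index 1, C is index 2.
b_after_recv = merge_on_receive([0, 0, 0], [1, 0, 0], me=1)   # [1, 1, 0]
relation_a_b = compare([1, 0, 0], b_after_recv)               # "before"
relation_b_c = compare([1, 2, 0], [0, 0, 1])                  # "concurrent"
```

A store that sees "concurrent" for two writes to the same key knows it has a genuine conflict and must keep both versions or merge them, rather than silently picking one.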
Hybrid logical clocks
Vector clocks track causality precisely but don't relate to wall-clock time at all. Wall-clock timestamps relate to wall-clock time but lie about ordering. Hybrid Logical Clocks (HLC, Kulkarni et al. 2014) combine the two: timestamps that are within a bounded distance of physical time but also satisfy Lamport's happens-before.
An HLC is a pair: a physical-time component (the largest wall-clock reading the node has seen so far, either locally or in a received timestamp) and a logical component (a small counter to disambiguate ties). Updates take the max of the local physical clock and any received timestamp's physical component, breaking ties with the logical counter. The result is an event timestamp that:
- Stays close to wall time, within the clock skew between nodes (so you can use it for time-windowed queries).
- Strictly respects causal order (so you can use it like a Lamport clock).
- Is monotonic per-node and total-orderable across nodes.
This is the clock CockroachDB and YugabyteDB use to timestamp transactions and serve consistent snapshot reads across a cluster without TrueTime hardware.
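The update rules can be sketched in Python as below. This follows the send/receive rules from the Kulkarni et al. paper as far as the description above goes, but it is an illustration, not the code any particular database runs: the class name, the method names (now, update) and the injectable physical_clock parameter are assumptions made for this example, and it omits the check real systems add to reject remote timestamps that are too far ahead.

```python
import time
from typing import Callable, Tuple

Timestamp = Tuple[int, int]  # (physical milliseconds, logical counter)


def wall_clock_ms() -> int:
    """Local wall clock in milliseconds since the Unix epoch."""
    return time.time_ns() // 1_000_000


class HybridLogicalClock:
    def __init__(self, physical_clock: Callable[[], int] = wall_clock_ms) -> None:
        self._now = physical_clock
        self.l = 0  # largest physical time seen so far
        self.c = 0  # logical counter, disambiguates events sharing the same l

    def now(self) -> Timestamp:
        """Timestamp a local event or an outgoing message."""
        pt = self._now()
        if pt > self.l:
            self.l, self.c = pt, 0   # wall clock moved ahead: reset the counter
        else:
            self.c += 1              # wall clock stalled or went back: count instead
        return (self.l, self.c)

    def update(self, remote: Timestamp) -> Timestamp:
        """Merge the timestamp carried by an incoming message."""
        rl, rc = remote
        new_l = max(self.l, rl, self._now())
        if new_l == self.l and new_l == rl:
            self.c = max(self.c, rc) + 1
        elif new_l == self.l:
            self.c += 1
        elif new_l == rl:
            self.c = rc + 1
        else:
            self.c = 0               # local wall clock is strictly ahead of both
        self.l = new_l
        return (self.l, self.c)
```

Timestamps are plain tuples, so Python's lexicographic tuple comparison gives the ordering directly: compare physical components first, then logical counters. Adding a node ID as a third element would break the remaining ties across nodes.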
TrueTime — bounding the uncertainty
Google's Spanner takes a different approach. Instead of inventing a logical clock, it invests in a tightly bounded physical one. TrueTime is an API that returns not a timestamp but a time interval [earliest, latest] guaranteed to contain the actual time. The Spanner paper reports an uncertainty bound (half the interval's width) that typically varies between about 1 and 7 milliseconds, maintained with GPS receivers and atomic clocks in every datacentre.
Spanner uses TrueTime to implement commit-wait: after a transaction's commit timestamp is chosen, the coordinator waits until now.earliest > commit_timestamp before acknowledging the commit. By the time any client sees "committed", the commit timestamp is guaranteed to be in the past, so any transaction that starts afterwards, on any client, gets a strictly later timestamp. This is what gives Spanner external consistency: transactions appear to execute in an order consistent with real time, a guarantee also known as strict serializability.
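Commit-wait itself is a short loop. The Python sketch below simulates it under loud assumptions: TTInterval, tt_now, commit_wait and EPSILON are hypothetical names, and tt_now fakes TrueTime by padding the local wall clock with a fixed bound. A real TrueTime interval is backed by hardware and actually guaranteed to contain the true time; this stand-in is not.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


EPSILON = 0.004  # assumed uncertainty bound in seconds, for illustration only


def tt_now() -> TTInterval:
    """Stand-in for a TrueTime-style now(): local wall clock +/- EPSILON."""
    t = time.time()
    return TTInterval(t - EPSILON, t + EPSILON)


def commit_wait(
    commit_ts: float,
    now: Callable[[], TTInterval] = tt_now,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Block until commit_ts is certain to be in the past on every clock."""
    while now().earliest <= commit_ts:
        sleep(0.001)


# Choose a commit timestamp no earlier than the latest possible current time,
# then wait out the uncertainty before acknowledging the commit.
commit_ts = tt_now().latest
started = time.monotonic()
commit_wait(commit_ts)
waited = time.monotonic() - started  # on the order of 2 * EPSILON
```

The wait is roughly twice the uncertainty bound, which is why the approach only pays off when that bound is kept small: every write transaction pays it as added latency.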
Picking the right clock
| You need to | Use |
|---|---|
| Show timestamps in a UI | Wall clock (NTP-synchronised) |
| Order events on a single machine | CLOCK_MONOTONIC |
| Order events across machines without round-trips | Lamport clocks |
| Detect concurrent writes | Vector clocks |
| Combine ordering with rough wall time | Hybrid logical clocks |
| External consistency without ambiguity | TrueTime, if you can build it |
Common mistakes
- Using System.currentTimeMillis() to measure elapsed time. NTP can step the clock backwards, producing a negative duration. Use System.nanoTime() / CLOCK_MONOTONIC for measurement; reserve wall clock for display.
- Last-writer-wins by wall-clock timestamp on concurrent writes. Two clients with skewed clocks will see writes "lost" depending on whose clock was ahead. If you don't have HLC or vector clocks, version vectors per row are a lighter alternative.
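- Ignoring leap seconds. A leap-second insertion adds a 23:59:60. Most systems handle it by smearing the extra second across several hours or by stalling, but code that assumes exactly 86,400 seconds per day will eventually surprise you. Insertions are announced irregularly and have so far always landed at the end of June or December.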
- Trusting forwarded timestamps. Like the address in an X-Forwarded-For header, any timestamp you didn't measure yourself is hearsay. Check it against your own clock if it matters.
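The first mistake in the list has a direct Python analogue: time.time() is the wall clock and can be stepped, while time.monotonic_ns() never goes backwards. The sketch below keeps the two roles separate; the helper name timed is invented for this example.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, int]:
    """Run fn and return (result, elapsed nanoseconds) using the monotonic clock."""
    start = time.monotonic_ns()
    result = fn(*args, **kwargs)
    elapsed_ns = time.monotonic_ns() - start  # never negative, even if NTP steps the wall clock
    return result, elapsed_ns


result, elapsed_ns = timed(sum, range(1_000_000))

# The wall clock is used only to label the measurement for humans.
finished_at = time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime(time.time()))
print(f"sum={result} took {elapsed_ns / 1e6:.3f} ms, finished at {finished_at}Z")
```

The monotonic reading has no meaning as a date, so it should never be stored or compared across machines; it is only good for differences taken on the same host.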
Further reading
- Lamport (1978) — Time, Clocks, and the Ordering of Events — the paper that introduced happens-before and logical clocks.
- Fidge (1988) — Timestamps in Message-Passing Systems That Preserve the Partial Ordering — and Mattern (1989) independently — vector clocks.
- Kulkarni et al. (2014) — Logical Physical Clocks (HLC) — the design CockroachDB and YugabyteDB use.
- Corbett et al. — Spanner: Google's Globally-Distributed Database — TrueTime in §3.
- Cockroach Labs — Living without atomic clocks — how HLC stands in for TrueTime when you don't have GPS hardware.