Time and clocks

If two events happen on different machines, what does it mean for one to come before the other? On a single machine, you read a clock and compare. Across machines, the clocks drift, can jump backwards, and don't agree to begin with. The field has a small set of clock abstractions that handle this carefully. Each tells you something different about what "before" actually means.


Wall clocks lie, sometimes

Every server has a hardware clock, and the operating system synchronises it against an NTP or PTP source. In normal operation NTP keeps it within a few milliseconds of the reference on a good network, and within tens of milliseconds over the public internet. That's close enough for many purposes, but nowhere near close enough for ordering.

Three things can go wrong:

  • Drift. Hardware clocks typically drift at 10-100 parts per million, which is roughly 1 to 9 seconds per day. A server that hasn't talked to NTP for several days can be tens of seconds off.
  • Jumps. When NTP corrects a clock that's drifted, it can step it forward or backward. A timestamp written before the step and one written after can appear in the wrong order.
  • Skew. Two servers' clocks disagree. They're both drifting, both being corrected, and both wrong by different amounts. Even with NTP, expect anywhere from a few milliseconds to 100ms of disagreement at any moment.
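
To put numbers on the drift figure, here is a minimal sketch of the arithmetic. The function name drift_seconds is illustrative, and the calculation assumes the clock drifts at a constant rate with no correction in between.

```python
SECONDS_PER_DAY = 86_400


def drift_seconds(ppm: float, elapsed_seconds: float) -> float:
    """Offset accumulated by a clock drifting at a constant `ppm` parts per million."""
    return ppm * 1e-6 * elapsed_seconds


# The two ends of the typical 10-100 ppm range, over one uncorrected day.
for ppm in (10, 100):
    per_day = drift_seconds(ppm, SECONDS_PER_DAY)
    print(f"{ppm} ppm -> {per_day:.2f} s/day")
```

At the top of the range a server needs only a few uncorrected days to be tens of seconds off.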

"This row was written at 2026-05-04T12:34:56.789" is a useful debugging breadcrumb but it's not a basis for ordering events across machines.
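
A small simulation makes the problem concrete. SkewedClock is a hypothetical model, not a real API: it assumes each server reads true time plus a fixed offset, and the offsets and write times are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SkewedClock:
    """Hypothetical server clock: true time plus a fixed offset, in milliseconds."""
    offset_ms: int

    def read(self, true_time_ms: int) -> int:
        return true_time_ms + self.offset_ms


server_a = SkewedClock(offset_ms=40)    # runs 40 ms fast
server_b = SkewedClock(offset_ms=-30)   # runs 30 ms slow

# In real time, A writes first and B writes 50 ms later.
stamp_a = server_a.read(true_time_ms=1_000)
stamp_b = server_b.read(true_time_ms=1_050)

# Sorting by wall-clock timestamp puts B's later write before A's earlier one.
ordered = sorted([("A", stamp_a), ("B", stamp_b)], key=lambda entry: entry[1])
print(ordered)
```

A total skew of 70 ms is enough to reverse two writes that were 50 ms apart.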

Lamport clocks — order without synchronisation

Leslie Lamport, 1978: forget about time-of-day and track causal order instead. Each process keeps a counter. Every event increments it. When a process sends a message, it stamps it with its current counter. When a process receives a message, it updates its counter to max(local, received) + 1.

# Process A
counter = 0
event:        counter = 1   # local event
send_msg:     counter = 2   # also tags message with 2
event:        counter = 3

# Process B (counter starts at 0)
recv_msg(2):  counter = max(0, 2) + 1 = 3
event:        counter = 4

The guarantee: if event X happened before event Y in the "happens-before" sense (X could have influenced Y, either on the same process or through some chain of messages), then Lamport(X) < Lamport(Y). The converse doesn't hold: Lamport(X) < Lamport(Y) does not mean X happened before Y. Two unrelated events can have equal or ordered counters by coincidence.
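
The update rules fit in a few lines. This is a minimal sketch that replays the trace above; the class and method names are illustrative, and message delivery is modelled by passing the stamp directly.

```python
class LamportClock:
    """One process's logical clock."""

    def __init__(self) -> None:
        self.counter = 0

    def tick(self) -> int:
        """Local event: increment the counter and return it."""
        self.counter += 1
        return self.counter

    def send(self) -> int:
        """Sending is an event too; the returned value is the message's stamp."""
        return self.tick()

    def receive(self, received: int) -> int:
        """Jump past both the local counter and the received stamp."""
        self.counter = max(self.counter, received) + 1
        return self.counter


a, b = LamportClock(), LamportClock()
a.tick()             # A: local event
stamp = a.send()     # A: send, message carries the stamp
a.tick()             # A: another local event
b.receive(stamp)     # B: receive
b.tick()             # B: local event
print(a.counter, b.counter)
```

A's last event and B's receive both end up with counter 3, even though neither caused the other. Equal or ordered counters say nothing on their own.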

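Useful for total ordering when you don't need to detect concurrency: break ties between equal counters with a process ID and every event gets a unique place in the order. The classic use is logical timestamps for log entries in replicated state machines.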
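
One common way to get a total order is to sort by the pair (counter, process ID). The sketch below assumes string process IDs and a hand-written list of stamps; the Stamp type is illustrative.

```python
from typing import NamedTuple


class Stamp(NamedTuple):
    counter: int
    process_id: str   # tie-breaker; any unique, comparable ID works


# Stamps collected from processes A and B, in arrival order.
log = [Stamp(3, "B"), Stamp(1, "A"), Stamp(3, "A"), Stamp(2, "A"), Stamp(4, "B")]

# Tuples compare field by field, so equal counters fall back to the process ID.
total_order = sorted(log)
print(total_order)
```

The order between the two stamps with counter 3 is arbitrary but deterministic: every replica that sorts the same log agrees on it.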

Vector clocks — detecting concurrency

Lamport clocks can't tell you whether X and Y are concurrent: a smaller counter might mean "before" or might be coincidence. Vector clocks fix that. Each process keeps one counter per process in the system. Local events increment the process's own entry; receiving a message merges in the sender's vector by taking the element-wise max, then increments the own entry.

# 3 processes: A, B, C. Each tracks a vector [a, b, c].
# Initially: [0, 0, 0] on each.

A: event       → A=[1, 0, 0]
A: send to B   → message tagged [1, 0, 0]
B: recv        → B = max([0,0,0], [1,0,0]) + B-bump = [1, 1, 0]
B: event       → B=[1, 2, 0]
C: event       → C=[0, 0, 1]   # concurrent with everything above
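
The same trace as a runnable sketch. The VectorClock class is illustrative, and it follows the trace's convention that sending tags the message with the current vector without incrementing it. Some formulations increment on send as well.

```python
from typing import List


class VectorClock:
    """Vector clock for the process at position `index` among `size` processes."""

    def __init__(self, index: int, size: int) -> None:
        self.index = index
        self.vector = [0] * size

    def tick(self) -> List[int]:
        """Local event: increment this process's own entry."""
        self.vector[self.index] += 1
        return list(self.vector)

    def send(self) -> List[int]:
        """Tag an outgoing message with a copy of the current vector."""
        return list(self.vector)

    def receive(self, received: List[int]) -> List[int]:
        """Merge by element-wise max, then increment the own entry."""
        self.vector = [max(x, y) for x, y in zip(self.vector, received)]
        self.vector[self.index] += 1
        return list(self.vector)


a, b, c = VectorClock(0, 3), VectorClock(1, 3), VectorClock(2, 3)
a.tick()
message = a.send()
b.receive(message)
b.tick()
c.tick()
print(a.vector, b.vector, c.vector)
```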

Comparing two vectors:

  • X happened before Y if every component of X is ≤ the corresponding component of Y, and at least one is strictly less.
  • X and Y are concurrent if neither is ≤ the other.
  • Two events with the same vector are the same event.
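
The comparison rules above translate directly into code. This is a minimal sketch: the function name compare and its string results are illustrative, and it assumes both vectors cover the same set of processes in the same order.

```python
from typing import Sequence


def compare(x: Sequence[int], y: Sequence[int]) -> str:
    """Classify two vector timestamps as equal, before, after, or concurrent."""
    if len(x) != len(y):
        raise ValueError("vectors must cover the same set of processes")
    x_le_y = all(a <= b for a, b in zip(x, y))
    y_le_x = all(b <= a for a, b in zip(x, y))
    if x_le_y and y_le_x:
        return "equal"        # same vector, same event
    if x_le_y:
        return "before"       # x happened before y
    if y_le_x:
        return "after"        # y happened before x
    return "concurrent"       # neither dominates the other


print(compare([1, 0, 0], [1, 1, 0]))
print(compare([1, 2, 0], [0, 0, 1]))
```

A store that gets "concurrent" back for two writes to the same key knows it has a conflict to resolve rather than a winner to pick.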

Used in Dynamo-style stores (the original Dynamo, Riak, Voldemort) to detect conflicts on concurrent writes. Cassandra, despite its Dynamo heritage, resolves conflicts with last-write-wins timestamps instead. The cost: every clock value grows linearly with the number of processes, which makes vector clocks expensive in churning clusters.

Hybrid logical clocks

Vector clocks track causality precisely but don't relate to wall-clock time at all. Wall-clock timestamps relate to wall-clock time but lie about ordering. Hybrid Logical Clocks (HLC, Kulkarni et al. 2014) combine the two: timestamps that are within a bounded distance of physical time but also satisfy Lamport's happens-before.

An HLC is a pair: a physical-time component (the largest wall-clock reading the node has seen, locally or in a received message) and a logical component (a small counter to disambiguate ties). Updates take the max of the local physical clock and any received timestamp's physical component; when that max doesn't advance, the logical counter increments instead. The result is an event timestamp that:

  • Stays within the cluster's clock skew of wall time, typically a few milliseconds (so you can use it for time-windowed queries).
  • Strictly respects causal order (so you can use it like a Lamport clock).
  • Is monotonic per-node and total-orderable across nodes.
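
A minimal sketch of the update rules, following the algorithm as described in the HLC paper. The class name, the millisecond unit, and the injected physical_clock callable are assumptions made for illustration; the fixed-value clocks in the usage lines stand in for two nodes whose wall clocks disagree by 10 ms.

```python
from typing import Callable, Tuple

Timestamp = Tuple[int, int]   # (physical_ms, logical)


class HybridLogicalClock:
    def __init__(self, physical_clock: Callable[[], int]) -> None:
        self._physical_clock = physical_clock   # returns wall time in ms
        self.physical = 0
        self.logical = 0

    def tick(self) -> Timestamp:
        """Local or send event."""
        previous = self.physical
        self.physical = max(previous, self._physical_clock())
        # If wall time did not move us forward, the counter breaks the tie.
        self.logical = self.logical + 1 if self.physical == previous else 0
        return (self.physical, self.logical)

    def receive(self, message: Timestamp) -> Timestamp:
        """Receive event: never fall behind the sender's timestamp."""
        msg_physical, msg_logical = message
        previous = self.physical
        self.physical = max(previous, msg_physical, self._physical_clock())
        if self.physical == previous and self.physical == msg_physical:
            self.logical = max(self.logical, msg_logical) + 1
        elif self.physical == previous:
            self.logical += 1
        elif self.physical == msg_physical:
            self.logical = msg_logical + 1
        else:
            self.logical = 0
        return (self.physical, self.logical)


node_a = HybridLogicalClock(lambda: 100)   # wall clock reads 100 ms
node_b = HybridLogicalClock(lambda: 90)    # wall clock is 10 ms behind
sent = node_a.tick()
received = node_b.receive(sent)
later = node_b.tick()
print(sent, received, later)
```

Node B's wall clock is behind the message it received, so its timestamps keep A's physical component and advance only the logical counter. Compared as tuples, the three timestamps still sort in causal order.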

This is the clock CockroachDB and YugabyteDB use to timestamp transactions and read consistent snapshots across a cluster without TrueTime hardware.

TrueTime — bounding the uncertainty

Google's Spanner takes a different approach. Instead of inventing a logical clock, it invests in a tightly-bounded physical one. TrueTime is an API that returns not a timestamp but a time interval [earliest, latest] guaranteed to contain the actual time. The interval is typically about 6 milliseconds wide, built from GPS clocks and atomic clocks in every datacentre.

Spanner uses TrueTime to implement commit-wait: after a transaction's timestamp is chosen, the coordinator waits until now.earliest > commit_timestamp before acknowledging the commit. By the time any client sees "committed", the commit timestamp is guaranteed to be in the past, so any transaction that starts afterwards, on any client, gets a strictly later timestamp. This is what gives Spanner external consistency, the strongest possible ordering guarantee.
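
Commit-wait can be sketched against a simulated clock. Nothing here is Spanner's actual API: SimulatedTrueTime is a hypothetical stand-in that reports true time plus or minus a fixed epsilon, time advances only when the code says so, and the commit timestamp is assumed to be taken as now.latest.

```python
from typing import Tuple


class SimulatedTrueTime:
    """Hypothetical stand-in for TrueTime: true time plus or minus epsilon, in ms."""

    def __init__(self, start_ms: int, epsilon_ms: int) -> None:
        self.true_ms = start_ms
        self.epsilon_ms = epsilon_ms

    def now(self) -> Tuple[int, int]:
        """Return (earliest, latest), an interval that contains the true time."""
        return (self.true_ms - self.epsilon_ms, self.true_ms + self.epsilon_ms)

    def advance(self, ms: int) -> None:
        """Simulate the passage of real time."""
        self.true_ms += ms


def commit(tt: SimulatedTrueTime) -> Tuple[int, int]:
    """Pick a commit timestamp, then wait until it is definitely in the past."""
    _, latest = tt.now()
    commit_timestamp = latest
    waited_ms = 0
    while not tt.now()[0] > commit_timestamp:   # now.earliest > commit_timestamp
        tt.advance(1)                            # a real system would sleep here
        waited_ms += 1
    return commit_timestamp, waited_ms


tt = SimulatedTrueTime(start_ms=1_000, epsilon_ms=3)
commit_timestamp, waited_ms = commit(tt)
next_earliest, next_latest = tt.now()
print(commit_timestamp, waited_ms, next_earliest)
```

The wait comes out at roughly twice epsilon, which is why the width of the interval matters so much: every millisecond of uncertainty is paid for in commit latency.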

The infrastructure premium. TrueTime needs reliable atomic-clock and GPS hardware in every datacentre. Most teams don't have that. CockroachDB shows you can get most of the way there with HLCs and a different approach to commit timing.

Picking the right clock

You need to                                         Use
Show timestamps in a UI                             Wall clock (NTP-synchronised)
Order events on a single machine                    CLOCK_MONOTONIC
Order events across machines without round-trips    Lamport clocks
Detect concurrent writes                            Vector clocks
Combine ordering with rough wall time               Hybrid logical clocks
External consistency without ambiguity              TrueTime, if you can build it

Common mistakes

  • Using System.currentTimeMillis() to measure elapsed time. NTP can step the clock backwards, producing a negative duration. Use System.nanoTime() / CLOCK_MONOTONIC for measurement; reserve wall clock for display.
  • Last-writer-wins by wall-clock timestamp on concurrent writes. Two clients with skewed clocks will see writes "lost" depending on whose clock was ahead. If you don't have HLC or vector clocks, version vectors per row are a lighter alternative.
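  • Ignoring leap seconds. A leap-second insertion adds a 23:59:60. Most systems handle it by smearing the second across a window of hours or by stalling, but code that assumes 86,400 seconds per day will surprise you whenever one is scheduled. Insertions happen only at the end of June or December, and the most recent was at the end of 2016.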
  • Trusting client-supplied timestamps. Any timestamp you didn't measure yourself, whether it arrives in a request header or a payload field, is hearsay. Check it against your own clock if it matters.
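
For the first mistake, the Python equivalent of the System.nanoTime() advice is time.monotonic(), which is guaranteed never to go backwards. A minimal sketch, with timed as an illustrative helper name:

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) using the monotonic clock."""
    start = time.monotonic()          # unaffected by NTP steps or manual changes
    result = fn(*args, **kwargs)
    elapsed = time.monotonic() - start
    return result, elapsed


result, elapsed = timed(sum, range(1_000_000))
print(f"sum={result}, took {elapsed:.4f}s")

# Wall-clock time is still the right thing to show a human.
print(time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(time.time())))
```

The monotonic value has no meaning on its own; only the difference between two readings on the same machine does.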
