The Go memory model

What does it mean for two goroutines to see the same value at the same time? The Go memory model answers that question by defining a happens-before relation: a partial order over memory operations that the runtime, compiler, and hardware must collectively respect. Get the edges right and concurrent code is portable across architectures; get them wrong and it works on x86, breaks on ARM, and is a data race on both.


Why the memory model matters

Concurrent programs ask one awkward question over and over: when goroutine A writes to a variable and goroutine B reads it, what does B see? Without a memory model, the answer depends on the CPU, the compiler, the optimiser, and the phase of the moon. With one, there's a contract: under these conditions, B sees A's write; outside them, the program has a data race and the model promises very little about what B observes.

The Go memory model has had two main eras. The 2009 document was deliberately weak: it told you about channels and mutexes but said little about racy programs beyond "don't". The 2022 revision aligned Go with the C/C++ and Java approach: sequentially consistent atomics, a more precise happens-before relation, and an explicit statement of what a racy program may do. Unlike C and C++, Go does not treat a data race as unbounded undefined behaviour. An implementation may report the race and terminate the program; otherwise the set of possible outcomes is limited, though still rarely what you intended.

The rule of thumb. If you can't articulate the happens-before edge that connects one goroutine's write to another goroutine's read, you have a race. The race detector will usually find it; until you fix it, the memory model gives you no useful guarantee about what the read observes.

The happens-before relation

Happens-before is a partial order over memory operations. If event a happens-before event b, then b is guaranteed to observe the effects of a. If two non-atomic operations access the same location, at least one of them is a write, and neither happens-before the other, you have a data race.

Four sources of happens-before edges cover most Go code:

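  • Program order within a goroutine. Within a single goroutine, earlier statements happen-before later ones. The compiler and CPU may still reorder operations as long as that goroutine cannot observe the difference, which is why other goroutines need a synchronisation point to see a consistent order.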
  • Channel operations. A send on a channel happens-before the corresponding receive completes. For unbuffered channels, the receive also happens-before the send returns.
  • Mutex operations. For any sync.Mutex, the n-th Unlock happens-before the (n+1)-th Lock returns.
  • Atomic operations. Since Go 1.19, atomics in sync/atomic are documented as sequentially consistent: they participate in a single total order that all goroutines agree on.

// Classic race. No happens-before edge between G1's write and G2's read.
// Deliberately broken: do not copy this pattern.
package main

import "fmt"

func main() {
    var x int
    var done bool

    go func() {           // G1
        x = 42
        done = true       // racy write
    }()

    for !done {           // G2: racy read in a hot loop
    }
    fmt.Println(x)        // may print 0, may print 42, may never get here
}

The memory model classifies this as a data race and makes no promise that G2 ever observes either write. In practice the compiler may hoist the load of done out of the loop and spin forever on any architecture; on ARM the read of x may also be reordered before the read of done and print zero. Both outcomes are permitted because the program has no synchronisation.
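
One way to repair the racy flag example is to replace the plain bool with a channel. The sketch below is a minimal illustration rather than the only fix; the helper name readAfterSignal is made up for this example. It relies on the rule that closing a channel happens-before a receive that returns because the channel is closed.

```go
package main

import "fmt"

// readAfterSignal is a hypothetical helper: the same two goroutines as the
// racy example, but with a channel providing the happens-before edge.
func readAfterSignal() int {
	var x int
	done := make(chan struct{})

	go func() { // G1
		x = 42      // ordinary write
		close(done) // the close happens-before the receive below returns
	}()

	<-done   // G2: blocks until G1 has closed the channel
	return x // guaranteed to observe 42
}

func main() {
	fmt.Println(readAfterSignal()) // prints 42
}
```

The edge can now be named in one sentence: the write to x precedes close(done) in G1, and close(done) happens-before the receive in G2.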

Synchronisation primitives

The standard library gives you a small set of tools that each carry well-defined happens-before guarantees. Use them and you don't have to reason about the underlying hardware.

  • sync.Mutex and sync.RWMutex. Unlock acts as a release; the next Lock on the same mutex acts as an acquire. Everything written before the unlock is visible to whichever goroutine acquires the lock next.
  • sync.WaitGroup. A call to Done happens-before any Wait that returns because of it. The pattern of Add → goroutine → Done → Wait gives you a clean fence.
  • sync.Once. The first call to Do(f) runs f to completion before any other call to Do returns. This makes sync.Once safe for lazy initialisation without an explicit mutex.
  • Channels. The richest source of happens-before edges in Go. Covered in detail on the channels page.

Why this matters. A correct concurrent program is one where every shared read can be traced back through a happens-before edge to the write it observes. The primitives above are how you build those edges — they are not just locks, they are memory fences with a sociable API.
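
A small sketch of three of these primitives working together, assuming a made-up helper called sumSquares: the mutex orders the updates to total, sync.Once runs the initialiser exactly once, and the WaitGroup makes every write visible to the caller once Wait returns.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares is a hypothetical helper. It returns the sum of squares of
// 1..n and the number of times the one-time initialiser ran.
func sumSquares(n int) (total int, initCalls int) {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		once sync.Once
	)

	for i := 1; i <= n; i++ {
		wg.Add(1) // Add before the goroutine starts
		go func(v int) {
			defer wg.Done() // Done happens-before Wait returns

			// Only the first caller runs the function; every other
			// caller returns from Do after it has completed.
			once.Do(func() { initCalls++ })

			mu.Lock() // acquire: sees every write made before the previous Unlock
			total += v * v
			mu.Unlock() // release
		}(i)
	}

	wg.Wait() // after this, all writes by the workers are visible here
	return total, initCalls
}

func main() {
	total, inits := sumSquares(4)
	fmt.Println(total, inits) // prints 30 1
}
```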

sync/atomic

sync/atomic exposes hardware atomic operations as ordinary function calls. Since Go 1.19 the package also offers typed wrappers (atomic.Int64, atomic.Pointer[T], etc.) that are easier to use correctly than the older pointer-taking functions. All operations are sequentially consistent, the same semantics as a C++ std::atomic with the default memory_order_seq_cst; Go offers no weaker orderings to opt into.

Operation        Memory semantics
Load             Sequentially consistent acquire
Store            Sequentially consistent release
CompareAndSwap   Full barrier, both sides
Add / Swap       Full barrier, both sides
atomic.Value     CAS-free reads of arbitrary types via a typed pointer swap

Sequentially consistent means there is a single global order of atomic operations that every goroutine agrees on. It's the easiest model to reason about and usually the right default. The cost is extra ordering instructions on weakly-ordered architectures (ARM, POWER); on x86 atomic loads are ordinary loads because the hardware is already strongly ordered, while stores and read-modify-write operations still pay for a locked instruction.
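
As an illustration of the typed wrappers, here is a minimal sketch of a shared counter built on atomic.Int64 (Go 1.19 or later). The function name countHits and its parameters are invented for the example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countHits is a hypothetical helper: each worker increments a shared
// atomic counter perWorker times, and the final value is returned.
func countHits(workers, perWorker int) int64 {
	var hits atomic.Int64 // the zero value is ready to use
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				hits.Add(1) // a single atomic read-modify-write
			}
		}()
	}

	wg.Wait()
	return hits.Load()
}

func main() {
	fmt.Println(countHits(8, 1000)) // prints 8000
}
```

With a plain int64 and hits++ the same program would be a data race and would usually lose increments.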

What sync/atomic doesn't give you

Atomic operations are atomic, not magical. A few things they explicitly do not provide:

  • Atomicity across multiple operations. Reading an atomic.Int64, deciding what to do, and writing it back is three operations. Another goroutine can interleave between any two of them. CompareAndSwap in a loop is the usual fix.
  • Protection against TOCTOU. "Check this atomic flag, then do something based on it" has the same time-of-check / time-of-use hazard as any other check. The flag can change between the load and your next instruction.
  • A replacement for higher-level synchronisation. If you find yourself building a lock out of atomics, you almost certainly want sync.Mutex. The standard library implementations are already optimised, already correct under the memory model, and already understood by reviewers.

A good test: if you cannot describe in one sentence what the atomic is synchronising between which two goroutines, reach for a mutex.
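
The CompareAndSwap loop mentioned above can be sketched as follows. This is one common shape, assuming the goal is to record the largest value seen so far; storeMax and highWater are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// storeMax raises *peak to v if v is larger. The load, the comparison and
// the write are separate steps, so the write is a CompareAndSwap that only
// succeeds if nobody changed the value in between; otherwise we retry.
func storeMax(peak *atomic.Int64, v int64) {
	for {
		cur := peak.Load()
		if v <= cur {
			return // someone already stored a value at least as large
		}
		if peak.CompareAndSwap(cur, v) {
			return // our write won
		}
		// Another goroutine interleaved between Load and CAS: try again.
	}
}

// highWater feeds every sample to storeMax from its own goroutine.
// It assumes samples are non-negative, since the peak starts at zero.
func highWater(samples []int64) int64 {
	var peak atomic.Int64
	var wg sync.WaitGroup
	for _, s := range samples {
		wg.Add(1)
		go func(v int64) {
			defer wg.Done()
			storeMax(&peak, v)
		}(s)
	}
	wg.Wait()
	return peak.Load()
}

func main() {
	fmt.Println(highWater([]int64{3, 17, 5, 11})) // prints 17
}
```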

The race detector

Go ships with a race detector built on Google's ThreadSanitizer. It instruments every memory access to record a vector clock per goroutine, then flags any pair of accesses to the same location where at least one is a write and neither happens-before the other.

You enable it with -race on any of the usual subcommands: go test -race, go run -race, go build -race. There is no separate build mode to remember — the same flag works everywhere.

$ go test -race ./...
==================
WARNING: DATA RACE
Write at 0x00c00001a0e0 by goroutine 7:
    main.worker()
        /src/main.go:14 +0x44

Previous read at 0x00c00001a0e0 by goroutine 6:
    main.main()
        /src/main.go:21 +0x88
==================

The cost is substantial: the official documentation quotes roughly 2–20× execution time and 5–10× memory for a typical program, so it's a development and CI tool, not a production one. Two important caveats: the race detector finds races on code paths it actually exercises, so coverage matters; and "no race detected" does not mean "correct", it means "no race observed on these inputs in this run".

The 2022 revision

In 2022 Russ Cox led a revision of the memory model — see research.swtch.com/mm for the long version. The main change: sequentially consistent atomics became the documented default for sync/atomic, matching the de facto behaviour of the implementation since around Go 1.0 and aligning the language with C++ memory_order_seq_cst and Java's volatile.

The old document was deliberately minimal: it did not specify the behaviour of sync/atomic at all, leaving programmers to infer the guarantees from the implementation. In practice, almost no one wrote programs that relied on anything weaker than sequential consistency, and the mismatch with C++ and Java made it harder to port concurrent code between languages. The new text is more precise and more portable, at little runtime cost on modern hardware.

The practical upshot. If you're writing new code, treat sync/atomic as sequentially consistent and move on. If you're reading old Go code, any guarantee it relied on is at least as strong now as it was then, so the code is at worst no less correct than before.

Common pitfalls

  • The non-atomic flag. for !done {} with done a plain bool is the canonical bug. The compiler may hoist the read into a register and never see the write. Use atomic.Bool or, better, a channel.
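  • 64-bit tearing on 32-bit. A 64-bit value written non-atomically on a 32-bit platform can be observed half-updated. On 32-bit platforms (ARM, 386, 32-bit MIPS), 64-bit words passed to the pointer-taking sync/atomic functions must be 64-bit aligned; the package documentation says so, and a misaligned access panics at run time. The typed atomic.Int64 and atomic.Uint64 are aligned automatically.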
  • Assuming fmt.Println is a barrier. It isn't. The fact that adding a print statement makes a race "disappear" usually means the print took an internal mutex that incidentally serialised things. Remove the print and the race comes back.
  • False safety on x86. x86 has a strong memory model — almost everything is acquire-release for free. Code that "works on my laptop" can break on ARM (where stores can be reordered) or POWER (where almost anything can be reordered). The race detector is your friend here; it doesn't depend on the host's memory model.
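  • Map access without a lock. Concurrent reads of a Go map are fine; a concurrent read + write or write + write is a data race that the runtime detects on a best-effort basis and turns into an unrecoverable fatal error rather than a silently wrong result. Use sync.Map or wrap the map in a mutex.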
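
For the map case, wrapping the map in a mutex might look like the sketch below. The counter type and its methods are illustrative names, not a library API.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical map wrapper: every access to m goes through mu.
type counter struct {
	mu sync.Mutex
	m  map[string]int
}

func newCounter() *counter {
	return &counter{m: make(map[string]int)}
}

// inc increments the count for key under the lock.
func (c *counter) inc(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[key]++
}

// get reads the count for key under the same lock, so the read is ordered
// after every inc whose Unlock preceded this Lock.
func (c *counter) get(key string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.m[key]
}

// countConcurrently calls inc from n goroutines and returns the total.
func countConcurrently(n int) int {
	c := newCounter()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.inc("requests")
		}()
	}
	wg.Wait()
	return c.get("requests")
}

func main() {
	fmt.Println(countConcurrently(100)) // prints 100
}
```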

A production checklist

  • Race detector in CI. go test -race ./... on every PR. The slowdown is acceptable on a test runner and the bugs it catches are the kind that ruin a weekend.
  • Channels for ownership transfer. If a value is being handed from one goroutine to another, send it on a channel. The happens-before edge is automatic and the code reads as intent rather than mechanism.
  • Atomics only when you can describe the synchronisation in one sentence. "This counter is incremented by many goroutines and read by a metrics scraper" — fine. "This flag coordinates a complex handoff between three goroutines" — use a mutex or a channel.
  • sync.Mutex by default for shared state. It's simple, correct, and well-understood by reviewers. RWMutex is rarely worth the added complexity unless you have measured contention on reads.
  • Document unusual patterns. If you do reach for atomics or a custom synchronisation scheme, write a comment that names the happens-before edge. Future you will be grateful.
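
A short sketch of the ownership-transfer item, with a comment that names the edge as the last item suggests. The report type and buildReport function are invented for illustration.

```go
package main

import "fmt"

// report is a hypothetical value handed from a producer to a consumer.
type report struct {
	lines []string
}

// buildReport lets a producer goroutine fill in a report and then hands it
// over on a channel.
//
// Happens-before edge: the producer's writes to r precede the send in
// program order, and the send happens-before the receive completes, so the
// consumer sees a fully built report without any lock.
func buildReport() *report {
	ch := make(chan *report)

	go func() {
		r := &report{}
		r.lines = append(r.lines, "header", "body")
		ch <- r // ownership transferred; the producer must not touch r again
	}()

	r := <-ch // the consumer now owns r exclusively
	r.lines = append(r.lines, "footer")
	return r
}

func main() {
	fmt.Println(buildReport().lines) // prints [header body footer]
}
```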
