The Go memory model
What does it mean for two goroutines to see the same value at the same time? The Go memory model answers that question by defining a happens-before relation: a partial order over memory operations that the runtime, compiler, and hardware must collectively respect. Get the edges right and concurrent code is portable across architectures; get them wrong and it may work on x86, break on ARM, and be incorrect either way.
Why the memory model matters
Concurrent programs ask one awkward question over and over: when goroutine A writes to a variable and goroutine B reads it, what does B see? Without a memory model, the answer depends on the CPU, the compiler, the optimiser, and the phase of the moon. With one, there's a contract: under these conditions, B sees A's write; outside them, the program has a data race and nothing guarantees what B sees.
The Go memory model has had two main eras. The 2009 spec was deliberately weak: it told you about channels and mutexes but said little about racy programs beyond "don't". The 2022 revision aligned Go with the C/C++ and Java approach to synchronisation: sequentially consistent atomics by default and a more precise happens-before relation. On data races it sits closer to Java than to C/C++: a racy program is still an error, and an implementation may report the race and stop, but the revision bounds what a racy read of a word-sized value can observe instead of declaring the whole program undefined behaviour.
The happens-before relation
Happens-before is a partial order over memory operations. If event a happens-before event b, then b is guaranteed to observe the effects of a. If two operations access the same location, at least one of them is a write, and neither happens-before the other, you have a data race.
Four sources of happens-before edges cover most Go code:
- Program order within a goroutine. Statements that appear earlier in the source happen-before later ones, modulo the compiler's freedom to reorder anything that doesn't cross a synchronisation point.
- Channel operations. A send on a channel happens-before the corresponding receive completes. For unbuffered channels, the receive also happens-before the send returns.
- Mutex operations. For any sync.Mutex, the n-th Unlock happens-before the (n+1)-th Lock returns.
- Atomic operations. Since Go 1.19, atomics in sync/atomic are documented as sequentially consistent: they participate in a single total order that all goroutines agree on.
// Classic race. No happens-before edge between G1's write and G2's read.
var x int
var done bool
go func() { // G1
	x = 42
	done = true // racy write
}()
for !done { // G2: racy read in a hot loop
}
fmt.Println(x) // may print 0, may print 42, may never get here

The memory model classifies this as a data race and guarantees none of those outcomes. In practice, the compiler may hoist the read of done out of the loop and spin forever on any architecture; on ARM the hardware may also reorder the read of x before the read of done, so the program prints zero. Both behaviours are permitted because the program has no synchronisation.
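One way to repair the racy loop above is to replace the plain bool with a channel. The sketch below is a minimal illustration, not the only fix; the function name publish is made up for this example. Closing the channel happens-before the receive that observes the close, so the write to x is visible once the receive returns.

```go
package main

import "fmt"

// publish writes x in a new goroutine and signals completion by closing a
// channel. The close happens-before the receive that returns because of it,
// so the write to x is guaranteed to be visible after <-done.
func publish() int {
	var x int
	done := make(chan struct{})

	go func() {
		x = 42      // ordinary, non-atomic write
		close(done) // synchronises with the receive below
	}()

	<-done   // blocks until the channel is closed
	return x // always 42
}

func main() {
	fmt.Println(publish()) // prints 42
}
```

The receive both waits for the writer and creates the ordering, so no busy loop is needed.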
Synchronisation primitives
The standard library gives you a small set of tools that each carry well-defined happens-before guarantees. Use them and you don't have to reason about the underlying hardware.
- sync.Mutex and sync.RWMutex. Unlock acts as a release; the next Lock on the same mutex acts as an acquire. Everything written before the unlock is visible to anyone who observes the next lock.
- sync.WaitGroup. A call to Done happens-before any Wait that returns because of it. The pattern of Add → goroutine → Done → Wait gives you a clean fence.
- sync.Once. The first call to Do(f) runs f to completion before any other call to Do returns. This makes sync.Once safe for lazy initialisation without an explicit mutex.
- Channels. The richest source of happens-before edges in Go. Covered in detail on the channels page.
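The WaitGroup and Once guarantees can be sketched as follows. The names sumSquares, loadConfig, and initCalls are invented for this illustration, and the config contents are placeholder data.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares lets n goroutines each write one slot of a slice, then reads
// the slice only after Wait returns. Every Done happens-before that return,
// so the plain writes are visible without further synchronisation.
func sumSquares(n int) int {
	results := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = i * i // each goroutine owns exactly one element
		}(i)
	}
	wg.Wait()

	total := 0
	for _, v := range results {
		total += v
	}
	return total
}

var (
	once      sync.Once
	config    map[string]string
	initCalls int // counts how often the initialiser actually ran
)

// loadConfig initialises config lazily. Do runs the function once and
// completes it before any call to Do returns.
func loadConfig() map[string]string {
	once.Do(func() {
		initCalls++
		config = map[string]string{"mode": "demo"}
	})
	return config
}

func main() {
	fmt.Println(sumSquares(4)) // prints 14
	fmt.Println(loadConfig()["mode"], loadConfig()["mode"], initCalls)
}
```

Each goroutine in sumSquares writes a distinct element, so the only ordering the code needs is the one WaitGroup already provides.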
sync/atomic
sync/atomic exposes hardware atomic operations as ordinary function calls.
Since Go 1.19 the package also offers typed wrappers (atomic.Int64,
atomic.Pointer[T], etc.) that are easier to use correctly than the older
pointer-taking functions. All operations are sequentially consistent by default — the
same semantics as a C++ std::atomic with the default
memory_order_seq_cst.
| Operation | Memory semantics |
|---|---|
| Load | Sequentially consistent acquire |
| Store | Sequentially consistent release |
| CompareAndSwap | Full barrier, both sides |
| Add / Swap | Full barrier, both sides |
| atomic.Value | CAS-free reads of arbitrary types via a typed pointer swap |
Sequentially consistent means there is a single global order of atomic operations that every goroutine agrees on. It's the easiest model to reason about and usually the right default. The cost is extra memory barriers on weakly ordered architectures (ARM, POWER); on x86, atomic loads compile to ordinary loads, while stores and read-modify-write operations still pay for a locked instruction.
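As a small illustration of the typed wrappers, the sketch below increments a shared atomic.Int64 from several goroutines. The function name countHits and its parameters are assumptions made for this example; it needs Go 1.19 or later.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countHits increments one shared counter from `workers` goroutines,
// `perWorker` times each, and returns the final value.
func countHits(workers, perWorker int) int64 {
	var hits atomic.Int64 // zero value is ready to use
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				hits.Add(1) // atomic read-modify-write, no lost updates
			}
		}()
	}

	wg.Wait()
	return hits.Load()
}

func main() {
	fmt.Println(countHits(8, 1000)) // prints 8000
}
```

With a plain int64 and hits++ the same program would be a data race and could lose increments.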
What sync/atomic doesn't give you
Atomic operations are atomic, not magical. A few things they explicitly do not provide:
- Atomicity across multiple operations. Reading an atomic.Int64, deciding what to do, and writing it back is three operations. Another goroutine can interleave between any two of them. CompareAndSwap in a loop is the usual fix.
- Protection against TOCTOU. "Check this atomic flag, then do something based on it" has the same time-of-check / time-of-use hazard as any other check. The flag can change between the load and your next instruction.
- A replacement for higher-level synchronisation. If you find yourself building a lock out of atomics, you almost certainly want sync.Mutex. The standard library implementations are already optimised, already correct under the memory model, and already understood by reviewers.
A good test: if you cannot describe in one sentence what the atomic is synchronising between which two goroutines, reach for a mutex.
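The CompareAndSwap loop mentioned above might look like the following sketch, which keeps a running maximum. The helper names storeMax and highest are hypothetical, and the example assumes non-negative inputs because the counter starts at zero.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// storeMax raises *peak to v if v is larger. If another goroutine changes
// the value between the Load and the CompareAndSwap, the swap fails and the
// loop retries with the fresh value.
func storeMax(peak *atomic.Int64, v int64) {
	for {
		cur := peak.Load()
		if v <= cur {
			return // someone already stored a value at least as large
		}
		if peak.CompareAndSwap(cur, v) {
			return // our read-decide-write sequence took effect atomically
		}
	}
}

// highest feeds every value to storeMax from its own goroutine.
func highest(values []int64) int64 {
	var peak atomic.Int64 // starts at 0; assumes non-negative inputs
	var wg sync.WaitGroup
	for _, v := range values {
		wg.Add(1)
		go func(v int64) {
			defer wg.Done()
			storeMax(&peak, v)
		}(v)
	}
	wg.Wait()
	return peak.Load()
}

func main() {
	fmt.Println(highest([]int64{3, 17, 5, 11})) // prints 17
}
```

The loop turns the three separate steps (read, decide, write) into one that either succeeds as a unit or is retried.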
The race detector
Go ships with a race detector built on Google's ThreadSanitizer. It instruments every memory access to record a vector clock per goroutine, then flags any pair of accesses to the same location where at least one is a write and neither happens-before the other.
You enable it with -race on any of the usual subcommands:
go test -race, go run -race, go build -race. There
is no separate build mode to remember — the same flag works everywhere.
$ go test -race ./...
==================
WARNING: DATA RACE
Write at 0x00c00001a0e0 by goroutine 7:
main.worker()
/src/main.go:14 +0x44
Previous read at 0x00c00001a0e0 by goroutine 6:
main.main()
/src/main.go:21 +0x88
==================

The race detector typically increases execution time by 2–20× and memory use by 5–10×, so it's a development and CI tool, not a production one. Two important caveats: the race detector finds races on code paths it actually exercises, so coverage matters; and "no race detected" does not mean "correct", it means "no race observed on these inputs in this run".
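For contrast with the report above, here is a sketch of code that should run cleanly under -race. The Counter type and runWorkers function are hypothetical; the point is only that every access to n sits between Lock and Unlock on the same mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter is a hypothetical type used only for this illustration.
type Counter struct {
	mu sync.Mutex
	n  int
}

// Inc increments the counter while holding the mutex.
func (c *Counter) Inc() {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
}

// Value reads the counter while holding the same mutex.
func (c *Counter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// runWorkers calls Inc from several goroutines and returns the total.
func runWorkers(workers, perWorker int) int {
	var c Counter
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				c.Inc()
			}
		}()
	}
	wg.Wait()
	return c.Value()
}

func main() {
	fmt.Println(runWorkers(4, 250)) // prints 1000
}
```

Removing the Lock and Unlock calls from Inc would leave unsynchronised writes to n from several goroutines, which is the kind of access pair the detector reports.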
The 2022 revision
In 2022 Russ Cox led a revision of the memory model — see research.swtch.com/mm for the long version. The main change: sequentially consistent atomics
became the documented default for sync/atomic, matching the de facto behaviour
of the implementation since around Go 1.0 and aligning the language with C++
memory_order_seq_cst and Java's volatile.
The old model was deliberately weaker — it allowed implementations to optimise atomics aggressively, on the bet that careful programmers would use the right primitives anyway. In practice, almost no one wrote programs that relied on the weaker guarantees, and the mismatch with C++ and Java made it harder to port concurrent code between languages. The new spec is more conservative and more portable, at little runtime cost on modern hardware.
In practice: treat sync/atomic as sequentially consistent and move on. If you're reading old Go code, the guarantees it relied on are at least as strong now as they were then, so it's at worst no less correct than before.
Common pitfalls
- The non-atomic flag. for !done {} with done a plain bool is the canonical bug. The compiler may hoist the read into a register and never see the write. Use atomic.Bool or, better, a channel.
- Torn 64-bit values on 32-bit platforms. A 64-bit value written non-atomically on a 32-bit platform can be observed half-updated. 64-bit fields passed to the pointer-taking sync/atomic functions on 32-bit platforms (ARM, 386, 32-bit MIPS) must be 8-byte aligned; the sync/atomic documentation says so, and the call panics if you violate it. The typed atomic.Int64 and atomic.Uint64 are aligned automatically.
- Assuming fmt.Println is a barrier. It isn't. The fact that adding a print statement makes a race "disappear" usually means the print took an internal mutex that incidentally serialised things. Remove the print and the race comes back.
- False safety on x86. x86 has a strong memory model: almost everything is acquire-release for free. Code that "works on my laptop" can break on ARM (where stores can be reordered) or POWER (where almost anything can be reordered). The race detector is your friend here; it doesn't depend on the host's memory model.
- Map access without a lock. Concurrent reads of a Go map are fine; a concurrent read and write, or two concurrent writes, is a data race that the runtime detects on a best-effort basis and turns into an unrecoverable fatal error. Use sync.Map or wrap the map in a mutex.
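The first pitfall has a short fix. The sketch below swaps the plain bool for an atomic.Bool; the function name runUntilStopped is invented for this example, and the iteration count it returns depends on scheduling.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// runUntilStopped starts a worker that spins until the stop flag is set,
// then reports how many iterations it completed.
func runUntilStopped() int {
	var stop atomic.Bool
	result := make(chan int)

	go func() {
		n := 0
		// Each Load is an atomic read, so the compiler cannot hoist it out
		// of the loop and the worker is guaranteed to observe the Store.
		for !stop.Load() {
			n++
		}
		result <- n
	}()

	stop.Store(true) // sequentially consistent write of the flag
	return <-result  // channel receive hands the count back safely
}

func main() {
	fmt.Println("worker stopped after", runUntilStopped(), "iterations")
}
```

A channel or context cancellation is usually the more idiomatic stop signal; the atomic flag is shown because it is the smallest change to the buggy loop.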
A production checklist
- Race detector in CI. Run go test -race ./... on every PR. The slowdown is acceptable on a test runner and the bugs it catches are the kind that ruin a weekend.
- Channels for ownership transfer. If a value is being handed from one goroutine to another, send it on a channel. The happens-before edge is automatic and the code reads as intent rather than mechanism.
- Atomics only when you can describe the synchronisation in one sentence. "This counter is incremented by many goroutines and read by a metrics scraper" — fine. "This flag coordinates a complex handoff between three goroutines" — use a mutex or a channel.
- sync.Mutex by default for shared state. It's simple, correct, and well-understood by reviewers. RWMutex is rarely worth the added complexity unless you have measured contention on reads.
- Document unusual patterns. If you do reach for atomics or a custom synchronisation scheme, write a comment that names the happens-before edge. Future you will be grateful.
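The ownership-transfer item can be sketched like this. The Job type and produceAndConsume function are hypothetical; the convention being illustrated is that the producer stops touching a value once it has sent it.

```go
package main

import "fmt"

// Job is a hypothetical unit of work used only for this illustration.
type Job struct {
	ID      int
	Payload []byte
}

// produceAndConsume hands n jobs from a producer goroutine to the caller and
// returns the total payload size. Each send happens-before the matching
// receive completes, so the consumer sees a fully built Job.
func produceAndConsume(n int) int {
	jobs := make(chan *Job)

	go func() {
		defer close(jobs) // tells the consumer no more jobs are coming
		for i := 0; i < n; i++ {
			j := &Job{ID: i, Payload: make([]byte, i)}
			jobs <- j // ownership moves to the receiver; do not touch j again
		}
	}()

	total := 0
	for j := range jobs { // loop ends when the channel is closed
		total += len(j.Payload)
	}
	return total
}

func main() {
	fmt.Println(produceAndConsume(4)) // prints 6
}
```

No mutex is needed because at any moment exactly one goroutine holds each Job.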
Further reading
- go.dev/ref/mem — The Go Memory Model — the spec itself, post-2022 revision. Short, precise, worth a careful read.
- research!rsc — Memory Models — Russ Cox's three-part series leading up to the 2022 revision. Covers hardware, the C/C++ model, and the Go-specific design choices.
- Anthony Williams — C++ Concurrency in Action — for the cross-language treatment. Once you understand how C++'s memory orders fit together, Go's choice to default to seq_cst makes more sense.
- ThreadSanitizer algorithm — how the race detector actually works, vector clocks and all.
- Channels (this series) — channels in detail, including the happens-before edges they create.
- The scheduler (this series) — what actually runs the goroutines whose memory accesses you're trying to order.