Sync primitives

The sync package is small, but every type in it does something clever. sync.Mutex spins briefly before parking and switches to starvation mode once a waiter has been blocked for more than 1 ms. sync.Map keeps two maps to avoid locking the read path. sync.Once is a double-checked atomic flag backed by a mutex. The package compiles down to roughly what you'd write by hand, but only after you'd written it wrong three times.

sync.Mutex — fast path, slow path, starvation mode

A Mutex is a single 32-bit state word plus a semaphore. The low three bits of the state encode locked, woken, and starving; the remaining bits count waiters. The fast path is a single CAS that sets the locked bit. When the mutex is uncontended, that's the whole story and the cost is one atomic instruction (up to roughly 25 ns on modern x86).
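To make the fast path concrete, here is a minimal sketch of a lock built around the same single CAS. The toyLock type and countWith helper are hypothetical names for illustration. Unlike sync.Mutex, this sketch never parks a goroutine and has no starvation mode; it just yields and retries.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// toyLock is a hypothetical teaching lock, not sync.Mutex: it shares the
// CAS fast path but yields instead of parking on a semaphore.
type toyLock struct {
	state int32 // 0 = unlocked, 1 = locked
}

func (l *toyLock) Lock() {
	// Fast path: one compare-and-swap flips the locked bit.
	for !atomic.CompareAndSwapInt32(&l.state, 0, 1) {
		// Contended: sync.Mutex would spin briefly and then park.
		// This sketch only yields the processor and retries.
		runtime.Gosched()
	}
}

func (l *toyLock) Unlock() {
	atomic.StoreInt32(&l.state, 0)
}

// countWith runs workers goroutines that each increment a shared counter
// perWorker times under the toy lock, and returns the final count.
func countWith(workers, perWorker int) int {
	var (
		l     toyLock
		wg    sync.WaitGroup
		total int
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				l.Lock()
				total++
				l.Unlock()
			}
		}()
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(countWith(8, 1000)) // prints 8000
}
```

In real code, use sync.Mutex directly; the sketch only shows why the uncontended case is so cheap.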

Under contention, the slow path kicks in. The acquiring goroutine spins for a few iterations (gated by the runtime_canSpin check) on the chance the holder will release momentarily. If the mutex is still held after the spin budget, the goroutine parks on the runtime semaphore, the same sleep-and-wake primitive that RWMutex and WaitGroup use. A goroutine that has waited longer than 1 ms flips the mutex into starvation mode: from then on, each release hands the mutex directly to the head of the wait queue instead of letting a newly arrived goroutine barge in. Starvation mode ends when a waiter that receives the mutex finds that it is the last one in the queue or that it waited less than 1 ms.

Why this matters. Without starvation mode, a steady stream of new arrivals could starve a parked goroutine indefinitely. The 1 ms threshold trades a small amount of throughput in the contended case for a guarantee that no goroutine waits forever.

sync.RWMutex — readers and writers

RWMutex layers reader counting on top of a regular Mutex. Readers atomically increment a counter. A writer first acquires the underlying Mutex, which excludes other writers, and then signals "I'm waiting" by subtracting a large constant from the reader count so that it goes negative. New readers that see a negative count block until the writer is done, and the writer itself waits until the readers that were already active have left.
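The negative-count trick is easier to follow as arithmetic. The sketch below is a single-goroutine illustration, not the real RWMutex: the names maxReaders, readerArrives, writerAnnounces, and demo are assumptions made for this example, and the blocking that the real type does with semaphores is left out.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// maxReaders stands in for the large constant a writer subtracts.
// The name and the standalone counter are illustrative.
const maxReaders = 1 << 30

// readerArrives reports whether a newly arriving reader may proceed
// (true) or would have to block behind a pending writer (false).
func readerArrives(count *atomic.Int32) bool {
	return count.Add(1) >= 0
}

// writerAnnounces flips the count negative and returns how many readers
// were active at that moment; a real writer waits for that many to leave.
func writerAnnounces(count *atomic.Int32) int32 {
	return count.Add(-maxReaders) + maxReaders
}

// demo walks through two readers, one writer, and a late reader.
func demo() (first, second bool, active int32, late bool) {
	var count atomic.Int32
	first = readerArrives(&count)    // count is now 1
	second = readerArrives(&count)   // count is now 2
	active = writerAnnounces(&count) // count is now 2 - maxReaders
	late = readerArrives(&count)     // count is still negative
	return
}

func main() {
	fmt.Println(demo()) // prints true true 2 false
}
```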

RWMutex is faster than Mutex only when reads dominate and the critical section is long enough to amortise the extra atomic operations. For short critical sections (a few hundred ns or less), a plain Mutex typically wins in benchmarks because RWMutex's bookkeeping cost exceeds the gain from parallel reads. Reach for RWMutex when read times are in the microseconds, not nanoseconds, and measure on your own workload before committing.
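For reference, a read-mostly store guarded by an RWMutex might look like the sketch below. The settings type, its methods, and readAll are hypothetical; whether this beats a plain Mutex depends on how long Get actually holds the lock.

```go
package main

import (
	"fmt"
	"sync"
)

// settings is a hypothetical read-mostly store used for illustration.
type settings struct {
	mu     sync.RWMutex
	values map[string]string
}

func newSettings() *settings {
	return &settings{values: make(map[string]string)}
}

// Get takes the read lock, so any number of readers can run in parallel.
func (s *settings) Get(key string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.values[key]
	return v, ok
}

// Set takes the write lock, which excludes readers and other writers.
func (s *settings) Set(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.values[key] = value
}

// readAll starts readers goroutines that each look up key, and returns
// how many of them found it.
func readAll(s *settings, key string, readers int) int {
	var wg sync.WaitGroup
	found := make([]bool, readers)
	for i := 0; i < readers; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			_, found[i] = s.Get(key)
		}(i)
	}
	wg.Wait()
	n := 0
	for _, ok := range found {
		if ok {
			n++
		}
	}
	return n
}

func main() {
	s := newSettings()
	s.Set("region", "eu-west")
	fmt.Println(readAll(s, "region", 16)) // prints 16
}
```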

sync.Once — the double-checked atomic flag

Once.Do guarantees its function runs exactly once across any number of concurrent callers. The implementation is about a dozen lines, and reading them is a small lesson in Go's memory model.

// sync/once.go (simplified)
type Once struct {
    done uint32
    m    Mutex
}

func (o *Once) Do(f func()) {
    if atomic.LoadUint32(&o.done) == 0 {
        o.doSlow(f)
    }
}

func (o *Once) doSlow(f func()) {
    o.m.Lock()
    defer o.m.Unlock()
    if o.done == 0 {
        defer atomic.StoreUint32(&o.done, 1)
        f()
    }
}

Fast path: one atomic load. If done is set, return immediately. Slow path: grab the mutex, check again (the second check covers the case where another caller was in doSlow at the same time), run f, then atomically set done. The atomic store happens after f returns and is paired with the fast-path atomic load, which gives the happens-before relationship that makes subsequent callers see f's side effects.
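From the caller's side, the guarantee looks like this. The sketch below races several goroutines into the same Once and counts how often the function body runs; initOnce and the counter are hypothetical names for this example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// initOnce races callers goroutines into a single sync.Once and returns
// how many times the initialisation function actually ran.
func initOnce(callers int) int32 {
	var (
		once  sync.Once
		wg    sync.WaitGroup
		calls atomic.Int32
	)
	for i := 0; i < callers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Only the first caller to reach the slow path runs the
			// function; the others wait for it or take the fast path.
			once.Do(func() { calls.Add(1) })
		}()
	}
	wg.Wait()
	return calls.Load()
}

func main() {
	fmt.Println(initOnce(32)) // prints 1
}
```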

sync.Map — when a regular map + Mutex isn't enough

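sync.Map is built for two specific workloads: keys that are written once and read many times (caches that only grow), or goroutines that read, write, and overwrite disjoint sets of keys. In its classic implementation it keeps two maps: a read map served without locks, and a dirty map protected by a Mutex. (Go 1.24 replaced this design with a concurrent hash trie; the API is unchanged.)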

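Reads check the read map first; if the key is there, no lock is taken. Misses fall through to the dirty map under the lock, and each such miss is counted. Once the miss count reaches the size of the dirty map, the dirty map is promoted to become the new read map. Updates to keys already in the read map are done with an atomic swap on the entry; only new keys take the lock and land in the dirty map.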

Use it when the access pattern is read-heavy and the key set churns slowly. For everything else (write-heavy, range-heavy, or small maps), a regular map[K]V with a sync.RWMutex is usually faster and uses less memory.
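A grow-only cache is the workload sync.Map fits best. The sketch below has several goroutines race to insert the same keys with LoadOrStore and then counts the entries with Range; populate and the key list are made up for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// populate has workers goroutines race to insert the same keys and
// returns how many distinct entries the map holds afterwards.
func populate(workers int, keys []string) int {
	var (
		m  sync.Map
		wg sync.WaitGroup
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for _, k := range keys {
				// LoadOrStore keeps the first value stored for a key
				// and returns it to every later caller.
				m.LoadOrStore(k, len(k))
			}
		}()
	}
	wg.Wait()

	n := 0
	m.Range(func(_, _ any) bool {
		n++
		return true // keep iterating
	})
	return n
}

func main() {
	fmt.Println(populate(4, []string{"a", "bb", "ccc"})) // prints 3
}
```

Note that values come back as any, so a typed wrapper or a plain map with a lock is often more pleasant when the workload does not need sync.Map's read path.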

Common mistake. Treating sync.Map as a drop-in replacement for map + Mutex. It isn't — the read/dirty split adds overhead that's only worth paying for a specific access pattern. Benchmark before swapping in.

sync.WaitGroup — counter with a wakeup

WaitGroup is conceptually trivial: a counter, an atomic Add, and Wait that blocks until the counter hits zero. The implementation packs the counter and a waiter count into a single 64-bit word, so Add is one atomic add and Wait is a short CAS loop.

The one rule that catches people: Add must happen before Wait is called, not after. Calling wg.Add(1) inside the goroutine you just launched is a race — the parent's wg.Wait() may see a zero counter and return immediately. Always Add before go.
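The safe ordering looks like the following minimal sketch, where sumSquares is a hypothetical helper. Each Add runs in the parent before the corresponding go statement, so Wait cannot observe a zero counter while work is still pending.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares computes 0*0 + 1*1 + ... + (n-1)*(n-1) with one goroutine
// per term, writing each result to its own slice element.
func sumSquares(n int) int {
	var wg sync.WaitGroup
	results := make([]int, n)
	for i := 0; i < n; i++ {
		wg.Add(1) // in the parent, before the goroutine starts
		go func(i int) {
			defer wg.Done()
			results[i] = i * i
		}(i)
	}
	wg.Wait() // returns only after every Done has run

	sum := 0
	for _, r := range results {
		sum += r
	}
	return sum
}

func main() {
	fmt.Println(sumSquares(5)) // prints 30
}
```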

sync/atomic — when to skip the mutex entirely

Atomic operations cost one CAS or one memory barrier, roughly 5–25 ns on modern x86. An uncontended Mutex costs about the same per call, since Lock and Unlock are one atomic operation each; contended, it costs much more. For a single counter, a single flag, or a single pointer, sync/atomic is the right tool.

Anything more — two pieces of state that must update together, or a counter plus a value that depend on each other — needs a Mutex. The trap is "I'll do two atomic ops back to back"; without a lock, another goroutine can observe the intermediate state.
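The split can be sketched side by side: a lone counter uses an atomic, while a count and a sum that must stay consistent share a Mutex. The stats type and the record helper are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// stats holds two fields that must change together, so it uses a Mutex.
type stats struct {
	mu    sync.Mutex
	count int64
	sum   int64
}

// Observe updates both fields inside one critical section, so no reader
// can see the new count paired with the old sum.
func (s *stats) Observe(v int64) {
	s.mu.Lock()
	s.count++
	s.sum += v
	s.mu.Unlock()
}

// Snapshot returns a consistent pair.
func (s *stats) Snapshot() (count, sum int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.count, s.sum
}

// record runs workers goroutines that each bump a standalone atomic
// counter and call Observe(value) perWorker times.
func record(workers, perWorker int, value int64) (hits, count, sum int64) {
	var (
		h  atomic.Int64 // a single independent counter: atomic is enough
		st stats
		wg sync.WaitGroup
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				h.Add(1)
				st.Observe(value)
			}
		}()
	}
	wg.Wait()
	count, sum = st.Snapshot()
	return h.Load(), count, sum
}

func main() {
	fmt.Println(record(4, 100, 2)) // prints 400 400 800
}
```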
