CPU Cache Simulator: MESI coherence
Four cores, each with a tiny per-core L1, sharing a small L3 backed by DRAM. Every cache line carries a MESI state. Issue reads and writes manually, or run a scripted scenario — shared reads, false sharing, producer-consumer — and watch the protocol invalidate, fetch, or hit. The cost panel tallies cycles using realistic latency numbers.
The top board is four cores, each with four L1 cache lines, over one shared L3. Every line shows its slot, the address it holds, and a MESI letter: M (this core has the only copy, and it is dirty), E (this core has the only copy, and it is clean), S (a clean copy that other cores may also hold), I (invalid: the slot holds no usable copy). Issue a read or write from the manual controls, or play a scenario, and the cost panel tallies L1 hits, L3 hits, DRAM misses, and coherence messages using realistic cycle costs. The bus log narrates each operation's outcome.
Start by issuing the same address as a read from core 0, then core 1: the line goes E, then both flip to S. Now write that address from core 0 and watch it invalidate core 1 and jump to M. The scenario that should surprise you is false sharing — two cores writing different bytes of the same 64-byte line. Nothing is actually shared, yet the coherence-message counter climbs about one per write while the L1 hit rate stays near zero, because each store invalidates the other core's copy of the whole line. That invisible ping-pong is what padding variables onto separate lines exists to kill.
What is MESI cache coherence?
MESI is the baseline protocol CPUs use to keep multiple cores' caches consistent; production chips typically run extensions of it such as MOESI or MESIF. Each cache line carries a state: Modified (this core has the only, dirty copy), Exclusive (this core has the only, clean copy), Shared (clean copies may exist in several cores), Invalid (no valid copy here). When a core reads or writes, the protocol sends messages that update the states across all caches, so that at most one core can hold a line dirty and every read observes the most recent write to that line. The simulator above tracks every line's state on every operation.
The two states that surprise people most are Exclusive and Shared. Exclusive exists so that a write to a never-shared line needs no bus traffic at all (E silently upgrades to M). Shared means other cores may hold a copy, not that any of them currently does, since a sharer can drop a clean copy without announcing it. The writer still has to ask for ownership, but how much work that invalidation causes depends on what the other cores actually hold.
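The transitions described above can be sketched as a small state machine. This is a minimal model under stated assumptions, not the simulator's actual code: `MesiLine`, `read`, `write`, and the `messages` counter are hypothetical names, the model tracks a single line across four cores, and it counts one coherence message per bus request rather than modelling individual replies.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


class MesiLine:
    """One cache line's MESI state in each core (hypothetical, simplified model)."""

    def __init__(self, cores=4):
        self.states = [State.I] * cores
        self.messages = 0  # bus requests issued so far

    def _others(self, core):
        return [c for c in range(len(self.states)) if c != core]

    def read(self, core):
        if self.states[core] is not State.I:
            return  # M, E, or S: local hit, no bus traffic
        self.messages += 1  # read request goes out on the bus
        holders = [c for c in self._others(core) if self.states[c] is not State.I]
        for c in holders:
            self.states[c] = State.S  # an M or E holder downgrades to S
        self.states[core] = State.S if holders else State.E

    def write(self, core):
        state = self.states[core]
        if state is State.M:
            return  # already the sole dirty copy
        if state is State.E:
            self.states[core] = State.M  # silent upgrade, no bus traffic
            return
        # S or I: every other copy must be invalidated before the store
        self.messages += 1
        for c in self._others(core):
            self.states[c] = State.I
        self.states[core] = State.M


# Read from core 0, read from core 1, then write from core 0.
shared = MesiLine()
shared.read(0)   # core 0 -> E
shared.read(1)   # cores 0 and 1 -> S
shared.write(0)  # core 0 -> M, core 1 -> I

# A line nobody else touches: the write costs no extra message.
private = MesiLine()
private.read(0)   # core 0 -> E, one message for the fetch
private.write(0)  # E -> M silently
```

In this sketch the shared line ends with three messages (two fetches and one invalidation), while the private line ends with one, which is the saving the Exclusive state exists to provide.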
How false sharing wrecks throughput
MESI works at the granularity of a whole cache line, typically 64 bytes. Two threads on different cores writing different variables that happen to live on the same line will invalidate each other on every store. The simulator's false sharing scenario reproduces this: cores 0 and 1 alternately write addresses 0x200 and 0x208, both in the same 64-byte line. Watch the coherence-message counter climb roughly one per op while the L1 hit count stays stuck: every write turns into a write miss because the other core's previous write invalidated the line.
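To see why the counter climbs about once per write, here is a rough sketch of the scenario as a write-only workload. It is an illustration under assumptions, not the simulator's implementation: `count_invalidations` is a hypothetical helper that tracks only which core last wrote each line, and 0x240 is an assumed "padded" address chosen because it falls on the next 64-byte line.

```python
LINE_SIZE = 64  # bytes per cache line, as in the scenario above


def count_invalidations(writes, line_size=LINE_SIZE):
    """Count stores that hit a line last written by a different core.

    `writes` is a sequence of (core, address) pairs. Each line is modelled
    only by its current owner, which is enough for a write-only workload.
    """
    owner = {}  # line number -> core currently holding the line in M
    invalidations = 0
    for core, addr in writes:
        line = addr // line_size
        previous = owner.get(line)
        if previous is not None and previous != core:
            invalidations += 1  # the other core's copy must be invalidated
        owner[line] = core
    return invalidations


# Cores 0 and 1 alternate stores, 100 each.
same_line = [(i % 2, 0x200 if i % 2 == 0 else 0x208) for i in range(200)]
padded = [(i % 2, 0x200 if i % 2 == 0 else 0x240) for i in range(200)]

false_sharing_count = count_invalidations(same_line)  # 199: all but the first store
padded_count = count_invalidations(padded)            # 0: each core keeps its own line
```

The two workloads issue the same number of stores. The only difference is whether the second address lands on the same line as the first.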
In production code the fix is padding each per-thread variable onto its own cache line. The Linux kernel uses ____cacheline_aligned; Java has @Contended; Rust has crossbeam_utils::CachePadded; .NET has StructLayout with an explicit Size. On Apple silicon the cache line is 128 bytes, so a 64-byte pad is half the protection you think it is.
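The point about 128-byte lines can be checked with a little arithmetic. The helper below is a hypothetical sketch (`slots_sharing_a_line` is not part of any library) and assumes per-thread slots are laid out back to back from a line-aligned base, with slot i starting at byte offset i * stride.

```python
def slots_sharing_a_line(num_slots, stride, line_size):
    """Return pairs of slot indices whose starting bytes fall on the same line.

    Assumes slot i starts at offset i * stride from a line-aligned base.
    """
    pairs = []
    for i in range(num_slots):
        for j in range(i + 1, num_slots):
            if (i * stride) // line_size == (j * stride) // line_size:
                pairs.append((i, j))
    return pairs


unpadded = slots_sharing_a_line(4, stride=8, line_size=64)        # all six pairs collide
pad64_line64 = slots_sharing_a_line(4, stride=64, line_size=64)   # no collisions
pad64_line128 = slots_sharing_a_line(4, stride=64, line_size=128) # slots pair up again
```

With a 64-byte stride on 128-byte lines, slots 0 and 1 share one line and slots 2 and 3 share the next, so neighbouring threads still ping-pong.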
What the simulator simplifies
- One cache level per core. Real chips have L1 (split I/D) and L2 per core, plus a shared L3. Here, L1 stands in for the whole per-core hierarchy.
- Round-robin replacement. Real caches use pseudo-LRU or RRIP. The simpler policy makes the eviction visible in the four lines we have to work with.
- Tag-only addressing. The simulator treats each address as a line index. Real caches split addresses into tag, set index, and byte offset (covered in the deep dive).
- No prefetching. Real CPUs aggressively pull lines ahead of demand. Adding it would obscure the MESI mechanics, which are the point.
- No NUMA, no MOESI/MESIF. Plain MESI with one socket is enough to demonstrate the protocol; the deep dive covers the extensions.
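For contrast with the simulator's tag-only addressing, here is a minimal sketch of how a set-associative cache splits an address. The geometry is an assumption chosen for illustration (64-byte lines and 64 sets, which is what a 32 KiB, 8-way cache works out to), and `split_address` is a hypothetical helper, not something the simulator exposes.

```python
def split_address(addr, line_size=64, num_sets=64):
    """Split an address into (tag, set_index, byte_offset).

    Both line_size and num_sets are assumed to be powers of two.
    """
    offset_bits = line_size.bit_length() - 1  # log2(line_size)
    index_bits = num_sets.bit_length() - 1    # log2(num_sets)
    offset = addr & (line_size - 1)
    set_index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, set_index, offset


a = split_address(0x200)    # (0, 8, 0)
b = split_address(0x208)    # (0, 8, 8): same tag and set, different byte offset
c = split_address(0x12345)  # (18, 13, 5)
```

The two addresses from the false sharing scenario differ only in the byte offset, which is why a real cache treats them as the same line.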
Further reading
- Caches deep dive — the long-form explanation that this simulator illustrates.
- Drepper — What Every Programmer Should Know About Memory
- MESI protocol — Wikipedia state-transition table
- man perf-c2c — Linux tool for finding cache-line contention in production binaries.