Virtual Memory Simulator, end to end
Virtual memory gives each process its own flat address space and lets the CPU and kernel translate those virtual pages to physical frames on demand. Here a process has 16 virtual pages, the hardware has 6 physical frames and a 4-entry TLB, and the kernel has a tiny swap area. Click any virtual page. The translation either hits the TLB (1 cycle), walks the page table (50 ns), takes a minor fault (1 µs), or takes a major fault from swap (80 µs). All four happen on a real CPU, all the time.
| ppn | vpn | flags | lru |
|---|---|---|---|
| p0 | — | — | |
| p1 | — | — | |
| p2 | — | — | |
| p3 | — | — | |
| p4 | — | — | |
| p5 | — | — | |
The top grid is the process's 16 virtual pages; the two tables below are the hardware the kernel maps them onto — a 4-entry TLB and 6 physical frames — plus a small swap area. Click a page to read it, shift+click to write. Each touch runs the real translation path: a TLB hit (one cycle), a page-table walk (50 ns), a minor fault that zero-fills a fresh frame (1 µs), or a major fault that reads back from swap (80 µs). The log narrates every step and the stats line tallies hits, walks, faults, evictions, and copy-on-writes.
Click "preset: sequential 8" first. Six pages become resident, then the seventh and eighth force the LRU pages out, and the TLB warms up. Now click "preset: thrashing": ten pages cycle through six frames and the hit rate collapses toward zero — every access faults because the working set no longer fits. What should surprise you is how sharp that cliff is. Six frames cope; the moment the active set is seven, the machine spends its time shuffling pages instead of running your code. The fork() preset shows the other trick — every page goes copy-on-write, and only the page you write gets its own copy.
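Here is a minimal Python sketch of the same translation path, for readers who want to experiment outside the widget. It is not the widget's code. The class name `VMSim`, true LRU for both the TLB and the frame pool, and the rule that every evicted page goes to swap are all assumptions made for illustration. It counts the four outcomes the stats line tallies (TLB hit, walk, minor fault, major fault) plus evictions. Writes and copy-on-write are left out.

```python
from collections import OrderedDict

class VMSim:
    """Toy translation path: TLB -> page table -> minor or major fault.

    Assumptions (not the widget's source): true LRU for the TLB and for the
    frame pool, every evicted page goes to swap, reads only.
    """

    def __init__(self, tlb_entries: int = 4, num_frames: int = 6):
        self.tlb_entries = tlb_entries
        self.tlb: OrderedDict[int, int] = OrderedDict()       # vpn -> pfn, least recent first
        self.resident: OrderedDict[int, int] = OrderedDict()  # page table: vpn -> pfn, LRU order
        self.free_frames = list(range(num_frames))
        self.swap: set[int] = set()                            # vpns whose contents live in swap
        self.stats = {"tlb_hit": 0, "walk": 0, "minor": 0, "major": 0, "evict": 0}

    def _fill_tlb(self, vpn: int, pfn: int) -> None:
        if len(self.tlb) >= self.tlb_entries:
            self.tlb.popitem(last=False)       # drop the least recently used translation
        self.tlb[vpn] = pfn

    def _grab_frame(self) -> int:
        if self.free_frames:
            return self.free_frames.pop(0)
        victim, pfn = self.resident.popitem(last=False)  # least recently used resident page
        self.tlb.pop(victim, None)             # invalidate its now-stale translation
        self.swap.add(victim)
        self.stats["evict"] += 1
        return pfn

    def access(self, vpn: int) -> str:
        if vpn in self.tlb:
            kind = "tlb_hit"
            self.tlb.move_to_end(vpn)
        elif vpn in self.resident:
            kind = "walk"
            self._fill_tlb(vpn, self.resident[vpn])
        else:
            kind = "major" if vpn in self.swap else "minor"
            self.swap.discard(vpn)             # a major fault reads the page back in
            pfn = self._grab_frame()
            self.resident[vpn] = pfn
            self._fill_tlb(vpn, pfn)
        self.resident.move_to_end(vpn)         # every touch refreshes the page's LRU position
        self.stats[kind] += 1
        return kind

sim = VMSim()
for vpn in range(8):                           # like "preset: sequential 8"
    sim.access(vpn)
print(sim.stats)      # {'tlb_hit': 0, 'walk': 0, 'minor': 8, 'major': 0, 'evict': 2}
print(sim.access(0))  # major: page 0 was evicted to swap
```

Replaying `list(range(10)) * 3` through a fresh `VMSim` gives 10 minor faults on the first pass and a major fault on every access after that, which is the thrashing preset in miniature. Cycling six pages instead produces page walks but no faults, because the frames fit while the 4-entry TLB does not.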
Every pointer is a lie
The address in your variable is not the address where the bytes live.
Virtual memory exists because three problems all wanted solving at once. Without it, every process would have to know where it sat in physical RAM, arrange not to collide with its neighbours, and be relocated by the loader to fit wherever there was room. Every process would have to trust every other process not to overwrite it. And no program could ever use more memory than the machine physically had. Virtual memory is the kernel's promise to handle all three by lying to every process about where its bytes live.
The mechanism: every memory access goes through the MMU, which translates virtual addresses into physical ones using per-process page tables. On x86_64 those tables are four levels deep (PML4 → PDPT → PD → PT), and each level is itself a 4 KiB page of 512 eight-byte entries. Walking them costs one memory access per level, which adds up to 200+ cycles when none of those entries are in cache. The TLB exists to cache the finished translation, so the next access to the same page costs about one cycle instead of 200.
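Because each level indexes 512 = 2^9 entries and a page holds 4096 = 2^12 bytes, a 48-bit x86_64 virtual address splits into four 9-bit table indices plus a 12-bit offset. The sketch below shows that split. The function name `split_x86_64` is ours. It assumes 4 KiB pages (no huge pages) and ignores the upper 16 bits, which on real hardware must be a sign-extension of bit 47.

```python
PAGE_SHIFT = 12                      # 4 KiB pages
INDEX_BITS = 9                       # 512 entries per table level
LEVELS = ("pml4", "pdpt", "pd", "pt")

def split_x86_64(vaddr: int) -> dict[str, int]:
    """Split a 48-bit virtual address into 4-level page-table indices and a page offset."""
    parts = {}
    for depth, name in enumerate(LEVELS):
        # Shifts are 39, 30, 21, 12 for PML4, PDPT, PD, PT respectively.
        shift = PAGE_SHIFT + INDEX_BITS * (len(LEVELS) - 1 - depth)
        parts[name] = (vaddr >> shift) & ((1 << INDEX_BITS) - 1)
    parts["offset"] = vaddr & ((1 << PAGE_SHIFT) - 1)
    return parts

addr = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x5
print(split_x86_64(addr))  # {'pml4': 1, 'pdpt': 2, 'pd': 3, 'pt': 4, 'offset': 5}
```

Each index selects one of 512 entries in the table at that level. Those four lookups are the four memory accesses a TLB miss has to pay for.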
Three downstream consequences follow. Processes are isolated: your
page table has no entry for my pages, so your code cannot address them at all.
Sharing is opt-in: mmap(MAP_SHARED) asks the kernel to
install the same physical page in two address spaces. Overcommit is possible:
the kernel can promise more virtual address space than there is physical RAM, and as long as
processes don't all touch every page at once, demand paging and swap absorb the difference. The third is the
source of every "my server got OOM-killed" debugging story.
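The first two consequences are visible from user space with only Python's standard library. This sketch assumes a Unix-like system (Linux or macOS) where `os.fork` exists and an anonymous `mmap` created with `MAP_SHARED` stays shared across fork. The helper name `fork_and_write` is ours. The child writes into a shared mapping and into an ordinary `bytearray`. Afterwards the parent sees the first write but not the second, because the `bytearray` lives in private memory that the child only ever had a copy-on-write view of.

```python
import mmap
import os

PAGE = mmap.PAGESIZE

def fork_and_write() -> tuple[bytes, bytes]:
    """Child writes to shared and private memory; return what the parent then sees."""
    shared = mmap.mmap(-1, PAGE, flags=mmap.MAP_SHARED | mmap.MAP_ANONYMOUS)  # opt-in sharing
    private = bytearray(PAGE)          # ordinary process memory: private after fork
    pid = os.fork()
    if pid == 0:                       # child
        shared[:5] = b"hello"
        private[:5] = b"hello"
        os._exit(0)                    # leave without running the parent's cleanup code
    os.waitpid(pid, 0)                 # parent: wait until the child has written
    return bytes(shared[:5]), bytes(private[:5])

print(fork_and_write())  # (b'hello', b'\x00\x00\x00\x00\x00')
```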
Copy-on-write made fork() free
The reason your shell can spawn a thousand processes a second.
A naive fork() would have to copy the parent's entire address space: a
multi-gigabyte process forking would mean a multi-gigabyte allocation and copy on every
call. Early Unix did exactly this. BSD added vfork() as an escape hatch: it lent the
child the parent's memory and suspended the parent, so the child had to call exec() or
_exit() almost immediately and touch as little as possible in between.
Copy-on-write turned that cost from "as much as the address space" to "as much as the
pages either process writes to." After fork(), the kernel duplicates the page
tables but marks every writable private page read-only in both parent and child, flagged as
copy-on-write. The first write to such a page triggers a fault; the handler allocates a fresh
frame, copies the original page's contents, and remaps that page writable in the
writing process only. The other process still sees the original. A child that calls
exec() immediately throws its address space away and pays no page-copying cost at all;
the only work was duplicating the page tables.
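The fault-handler logic above is small enough to model directly. This is a toy, not kernel code. The `Frame`, `PTE` and `Process` classes, the `cow` flag, and the functions `cow_fork` and `cow_write` are hypothetical names for illustration. Real kernels track sharing through page reference counts and mapping permissions rather than a literal flag. The sketch also skips the copy when the writer holds the last reference, an optimization real kernels make in some form.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    data: bytearray
    refs: int = 1                 # page-table entries pointing at this frame

@dataclass
class PTE:
    frame: Frame
    writable: bool
    cow: bool = False             # write-protected only because the frame is shared

@dataclass
class Process:
    pages: dict[int, PTE] = field(default_factory=dict)
    copies: int = 0               # page copies this process has paid for

def cow_fork(parent: Process) -> Process:
    """Duplicate the page table, not the pages: share every frame, write-protect writable ones."""
    child = Process()
    for vpn, pte in parent.pages.items():
        if pte.writable:
            pte.writable, pte.cow = False, True
        pte.frame.refs += 1
        child.pages[vpn] = PTE(pte.frame, pte.writable, pte.cow)
    return child

def cow_write(proc: Process, vpn: int, offset: int, byte: int) -> None:
    """Store one byte, taking a copy-on-write fault first if the page is write-protected."""
    pte = proc.pages[vpn]
    if not pte.writable:
        if not pte.cow:
            raise PermissionError(f"page {vpn} is genuinely read-only")  # SIGSEGV on a real system
        if pte.frame.refs > 1:                 # still shared: take a private copy
            pte.frame.refs -= 1
            pte.frame = Frame(bytearray(pte.frame.data))
            proc.copies += 1
        pte.writable, pte.cow = True, False    # sole owner now: just re-enable writes
    pte.frame.data[offset] = byte

parent = Process({0: PTE(Frame(bytearray(b"aaaa")), True),
                  1: PTE(Frame(bytearray(b"bbbb")), True)})
child = cow_fork(parent)
cow_write(child, 0, 0, ord("X"))    # fault: page 0 is shared, so the child copies it
cow_write(parent, 0, 1, ord("Y"))   # fault: parent is now sole owner, no copy needed
print(parent.pages[0].frame.data, child.pages[0].frame.data)  # bytearray(b'aYaa') bytearray(b'Xaaa')
print(child.copies, parent.copies)                            # 1 0
print(parent.pages[1].frame is child.pages[1].frame)          # True
```

Page 1 was never written, so both processes still point at one frame. That is the whole saving: only touched pages ever get copied.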
This is the reason Redis's BGSAVE works: the parent process keeps serving while
the child snapshots its memory to disk, and the extra memory cost is only the pages the parent writes
during the snapshot (often a small fraction of the dataset, though a write-heavy workload can
approach a full copy). Containers ride on the same mechanism: the runtime behind docker run
starts the container's first process with clone() plus namespace flags, which gets the same
copy-on-write treatment as fork(), so creating it costs page tables rather than a memory copy.
When the working set won't fit, performance falls off a cliff
Not gracefully. Catastrophically.
A program's working set is the set of pages it actually touches over some short window of time. If the working set fits in physical memory, every access is cheap: a TLB hit or a page walk, both well under a microsecond. If the working set exceeds physical memory by even one page, the kernel has to evict a page every time a new one comes in, and if the workload cycles through its pages in order, LRU evicts exactly the page that will be needed next.
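The "even one page" claim is easy to check with a plain LRU model. In the sketch below the helper names `lru_faults` and `steady_fault_rate` are hypothetical. True LRU replacement and a strictly cyclic access pattern are assumptions; real kernels only approximate LRU. It cycles a working set of `wss` pages through 6 frames and reports the fault rate once each page's first, unavoidable fault is excluded.

```python
from collections import OrderedDict

def lru_faults(refs: list[int], num_frames: int) -> int:
    """Count page faults for a reference string under true LRU replacement."""
    frames: OrderedDict[int, None] = OrderedDict()   # least recently used first
    faults = 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)
            continue
        faults += 1
        if len(frames) == num_frames:
            frames.popitem(last=False)               # evict the LRU page
        frames[page] = None
    return faults

def steady_fault_rate(wss: int, num_frames: int = 6, rounds: int = 100) -> float:
    """Fault rate for cycling through `wss` pages, ignoring the first touch of each page."""
    refs = list(range(wss)) * rounds
    return (lru_faults(refs, num_frames) - wss) / (len(refs) - wss)

for wss in range(4, 11):
    print(wss, f"{steady_fault_rate(wss):.0%}")
# 4, 5 and 6 pages print 0%; 7, 8, 9 and 10 pages print 100%
```

Nothing degrades gradually here. The rate jumps from 0% to 100% between six and seven pages, which is the same cliff the thrashing preset shows.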
This is thrashing. The thrashing preset above triggers it deliberately: 10 pages cycled through 6 frames. Every access faults. Every fault evicts a page that will be needed within the next few accesses. Throughput collapses by 4-5 orders of magnitude because nanosecond-scale accesses have been replaced by swap reads that take tens of microseconds or more. The process spends almost all its time in the page-fault path, waiting on swap I/O instead of running. In production this presents as "my service was fine all day, then suddenly the latency went from 5 ms to 5 s and the CPU is at 100% but the kernel says 80% iowait": that's thrashing through swap.
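The "orders of magnitude" figure follows from the usual effective-access-time estimate. Write p for the fraction of accesses that take a major fault. The estimate assumes t_hit ≈ 1 ns for a TLB hit plus a cached load, a round figure chosen here and not taken from the simulator, and uses the simulator's 80 µs for a fault from swap:

```latex
\begin{aligned}
T_{\text{eff}}(p) &= (1 - p)\,t_{\text{hit}} + p\,t_{\text{fault}},
  \qquad t_{\text{hit}} \approx 1\ \text{ns},\quad t_{\text{fault}} \approx 80\ \mu\text{s} = 8\times10^{4}\ \text{ns} \\
T_{\text{eff}}(0.001) &\approx 0.999\ \text{ns} + 80\ \text{ns} \approx 81\ \text{ns} \\
T_{\text{eff}}(1) &\approx 8\times10^{4}\ \text{ns}, \qquad
  \frac{T_{\text{eff}}(1)}{T_{\text{eff}}(0)} \approx 8\times10^{4} \approx 10^{4.9}
\end{aligned}
```

Even one fault per thousand accesses makes memory roughly 80 times slower. At p = 1, which is what the thrashing preset produces, the gap is close to five orders of magnitude.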
The two ways out: reduce the working set (use a smaller dataset, stream rather than
load, or call madvise(MADV_DONTNEED) on mapped regions you're done with) or add memory (more
RAM, or fewer concurrent workloads on the box). The kernel's reclaim tuning and heuristics
(swappiness, the WSS estimator, MGLRU) can shift the cliff but cannot
abolish it. Treat physical memory as a hard architectural constraint, size the working set
to fit inside it, and the curve never bites.
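Dropping pages you are done with can be sketched with Python's `mmap` module. The sketch needs Python 3.8+ for `mmap.madvise`, and `mmap.MADV_DONTNEED` exists only where the platform defines it, e.g. Linux. The zero-fill result shown is what Linux documents for private anonymous mappings. After the call the kernel frees the backing frame, and the next touch gets a fresh zero-filled page through a minor fault instead of the old contents.

```python
import mmap

PAGE = mmap.PAGESIZE

# Private anonymous mapping, four pages long. MADV_DONTNEED on this kind of
# mapping frees the backing frames; the next access zero-fills them.
buf = mmap.mmap(-1, 4 * PAGE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
buf[:PAGE] = b"x" * PAGE                    # touch page 0 so it becomes resident
assert buf[:4] == b"xxxx"
buf.madvise(mmap.MADV_DONTNEED, 0, PAGE)    # "done with page 0": hand the frame back
print(buf[:4])                              # b'\x00\x00\x00\x00'
```

On a shared or file-backed mapping the same call only drops this process's mapping of the page, and the old contents come back on the next access. That is why the example uses `MAP_PRIVATE`.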