Internals
Fifteen deep dives, one per layer of the bare metal. Each page is anchored to real microarchitectures — Apple M4, Intel Raptor Lake, AMD Zen 5, RISC-V SiFive U74 — with real cycle counts, named systems, and the cache-and-coherence footnotes that turn textbook diagrams into production performance work.
All 15 live · Waves 1 and 2 (9 pages) plus the peripherals batch (6 pages).
Transistors, gates, and the ALU
NMOS to NAND to flip-flop to adder. The abstraction stack the rest of the path stands on.
Clocks, latches, flip-flops
Synchronous logic, setup and hold, why frequency stalled near 5 GHz.
The instruction cycle
Fetch, decode, execute, writeback. A RISC-V addi traced through the datapath.
Pipelining
The five-stage pipeline, and what changes when your laptop runs 14–20 stages.
Branch prediction and speculation
Two-bit counters to TAGE, the 20-cycle mispredict penalty, Spectre as the cost.
Out-of-order execution
Tomasulo, the reorder buffer, register renaming. ~512-entry ROB on Apple M4.
SIMD and vector throughput
SSE, AVX, AVX-512, AMX, SVE. License-down on Skylake-X. Auto-vec versus intrinsics.
The memory hierarchy
Eight orders of magnitude. Norvig’s table, modernised. Bandwidth versus latency.
Caches and MESI
Direct-mapped to set-associative, replacement, write-back, MESI/MOESI/MESIF, false sharing.
Virtual memory and the TLB
Every load is two loads. Page walks, TLB shootdowns, huge pages, Meltdown.
NUMA and memory bandwidth
Same-socket 80 ns, far-corner 400 ns. DDR5 channels, Linux NUMA balancing.
PCIe, DMA, interrupts
Lanes, generations, root complex. MSI/MSI-X. IOMMUs and GPU passthrough.
SSDs and the FTL
NAND cells, pages versus erase blocks, the Flash Translation Layer, NVMe queues.
GPUs and accelerators
SIMT versus SIMD, warps, HBM. TPUs and AMX as the post-Moore turn.
Power-on, firmware, boot
Reset to UEFI to bootloader to kernel. Microcode. Secure Boot, TPM, Intel ME.
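A taste of the bottom layer
As a small illustration of the "NAND to adder" step named in the first page above, here is a minimal Python sketch that builds XOR, a full adder, and a ripple-carry adder from a single NAND function. The function names (`nand`, `xor`, `full_adder`, `ripple_add`) and the 8-bit default width are assumptions made for this example, not taken from the pages themselves, and a software model like this says nothing about gate delay or real cycle counts.

```python
def nand(a: int, b: int) -> int:
    """The only primitive: output is 0 only when both inputs are 1."""
    return 1 - (a & b)


def xor(a: int, b: int) -> int:
    """XOR built from four NAND gates."""
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder; returns (sum_bit, carry_out)."""
    partial = xor(a, b)
    sum_bit = xor(partial, cin)
    # carry = (a AND b) OR (partial AND cin), expressed with NAND only
    carry_out = nand(nand(a, b), nand(partial, cin))
    return sum_bit, carry_out


def ripple_add(x: int, y: int, width: int = 8) -> tuple[int, int]:
    """Chain `width` full adders; returns (result modulo 2**width, final carry)."""
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry
```

Calling `ripple_add(3, 5)` yields `(8, 0)`, and `ripple_add(200, 100)` yields `(44, 1)`: the 8-bit result wraps and the final carry is set. The carry has to ripple through every bit position in turn, which is one reason hardware adders use faster carry schemes.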
Reading order
You can take these in any order — each page stands on its own — but the sequence that builds the most intuition is the one that follows the abstraction stack from bottom to top: gates, then state, then the instruction cycle, then the optimisations (pipelining, branch prediction, OOO, SIMD), then the memory hierarchy and its coherence and translation layers, then the peripherals (PCIe, SSD, GPU), then how the whole thing turns on (boot).
If you're here for one specific topic — almost everyone is, the first time — start with that page. The cross-links inside each page handle the rest.
Canonical sources this directory leans on
- Patterson & Hennessy — Computer Organization and Design (RISC-V Edition). The textbook spine. Concepts pinned to RISC-V so abstractions map to real instructions.
- Hennessy & Patterson — Computer Architecture: A Quantitative Approach. The graduate sequel, heavier on coherence, NUMA, and the post-Moore turn.
- Bryant & O'Hallaron — Computer Systems: A Programmer's Perspective. The most engineer-facing of the textbooks; the reference for how this study path is voiced.
- Sorin, Hill & Wood — A Primer on Memory Consistency and Cache Coherence. Free, definitive, the textbook on coherence protocols.
- Agner Fog's microarchitecture and optimization manuals, plus his instruction tables. Free, exhaustive, the practitioner's manual for actual chip behaviour.
- Chips and Cheese, Real World Tech, WikiChip. Where the latency and bandwidth numbers in these pages get cross-checked against fresh measurements.