Linked List Simulator: pointers, not slots.
Each node points to the next. Insertion and deletion are O(1) at the right place — no shifting. Random access is O(n). The trade is locality.
Each box is a node holding a value, with its index in brackets below. The arrows are the actual pointers: one arrowhead in singly mode (next only), two in doubly mode (next and prev), and the endcap reads NULL or TAIL depending on the variant. The buttons add or remove at the head, the tail, or the middle, and the Recent ops log records what each click did.
Try the middle buttons first. Click "+ middle" or "− middle" and watch which arrows change: only the two pointers on either side of the splice point get rewired, and every other node stays exactly where it is. That is the one thing a linked list does that an array can't do cheaply, since an array would have to shift everything after the gap. Then flip from singly to doubly and notice the second arrowhead appear on every link. What should surprise you is how little moves on any edit, and how that locality is precisely the property that costs you a cache miss on every node when you later walk the whole list in order.
What is a linked list?
When a sequence needs to grow without warning.
A linked list is a sequence of nodes where each node holds a value and a pointer to the next. The structure dates to IPL (1956) and LISP (1958) — Allen Newell and Cliff Shaw used it for symbol manipulation in early AI work. Linked lists offer O(1) insert and remove given a node reference; they pay for it with poor cache locality. The Linux kernel's list_head is a doubly-linked list; every functional language's list type is a singly-linked cons-cell list.
Imagine you are keeping a queue of jobs to run. New jobs arrive at unpredictable rates; some jobs cancel themselves halfway through; old jobs are dequeued from the front. Your first instinct is to use an array, because arrays are familiar. The job at index zero is next; you advance a pointer; you append to the end. It works fine for a few thousand jobs. Then you hit the awkward case: you need to remove a cancelled job from the middle of the queue, and the array forces you to shift every later element one slot toward the front. For a queue of a million jobs, that is roughly half a million memory writes per cancellation. You are spending most of your CPU time copying bytes that have nothing to do with your problem.
The other failure mode is the opposite. You start with a small array — sixteen slots, say — and run out of room. The standard fix is to allocate a bigger array, copy every element across, and free the old one. Most of the time it is fine. Sometimes the resize lands in the middle of a hot path, and a single push pays for copying ten million elements. Latency spikes; tail percentiles ruin your day. Whatever you have built on top of "an array that grows" inherits this hiccup at every doubling.
The linked list trades a different bargain. Instead of one contiguous block of memory, each element is its own little allocation containing a value plus a pointer to the next element. Insertion and removal are local: rewrite at most two pointers and you are done, no shifting, no resize, no doubling-cost spikes. You can splice an entire sublist out of one list and into another in two pointer writes, regardless of how many elements that sublist holds. The cost is locality: those nodes are scattered across the heap rather than packed together, so reading them in order is slower than walking an array of the same length. The simulator above lets you push and pop at the head, the tail, and the middle, in either the singly-linked or doubly-linked variant. Watch how each operation rewires only its immediate neighbours; the rest of the list stays exactly where it was.
Concretely: appending one item to a million-element std::vector in C++ costs one nanosecond on the steady state and around four milliseconds on the doubling step. Appending to a doubly-linked list costs roughly forty nanoseconds every time, no spikes. Removing one element from the middle of the array costs ~2 ms; removing it from the linked list, given a pointer to the node, costs ten nanoseconds. The right answer depends on which operation dominates your workload — and on whether you are walking the structure in order, where the array's cache-friendliness wins by hundreds of times.
Origins — IPL to LISP, the structure that invented list processing
From IPL to LISP, the structure that invented list processing.
The linked list is older than the high-level languages it's traditionally taught in. Allen Newell, Cliff Shaw, and Herbert Simon — working at the RAND Corporation and Carnegie Tech in 1956 — built the Information Processing Language (IPL) to support their Logic Theorist, the first computer program that proved mathematical theorems. IPL had pointers, recursion, and lists as first-class values; everything in the language was a node in a linked structure. The 1957 IPL-V variant was the first programming environment built around the cell-and-pointer abstraction, and it ran on the Johnniac.
John McCarthy generalised the idea two years later. Recursive Functions of Symbolic Expressions and Their Computation by Machine, published in CACM in April 1960, defined LISP — LISt Processor — on top of three primitives: cons (build a pair), car (head of pair), cdr (rest of pair). The funny names come from the IBM 704 hardware: contents of address register and contents of decrement register, the two halves of a 36-bit machine word that LISP used to store one cons cell. A LISP list is exactly a chain of cons cells, the rightmost cdr being the constant nil. Sixty-five years later, every Scheme, Clojure, Racket, and Common Lisp implementation still represents lists this way.
The structural elegance of the cons-list is that recursion and lists are dual: every list is either empty or a head followed by another list. Functions that consume lists pattern-match on those two shapes, and the recursive descent matches the structure of the data exactly. McCarthy's paper introduced not just lists but conditional expressions, recursive function definitions, and garbage collection — concepts so foundational that it's easy to forget they entered the canon together, in the same year, in service of a single data structure.
Donald Knuth's The Art of Computer Programming, Volume 1: Fundamental Algorithms (Addison-Wesley, 1968; third edition 1997) gave the linked list its formal treatment in section 2.2: singly-linked, doubly-linked, circular, and the implementation tricks that make them fast. By the time C arrived in 1972, linked lists were already standard fare; K&R's first edition (1978) used them as the canonical example of dynamic memory management with malloc and free.
Linked list variants — singly, doubly, XOR, unrolled, skip
Singly, doubly, XOR, unrolled, skip.
The textbook singly-linked list stores one pointer per node: next. Insertion at the head is O(1); insertion before a known node requires O(n) traversal to find the predecessor. The doubly-linked list stores two pointers per node, next and prev, and pays an extra 8 bytes per node on a 64-bit system in exchange for O(1) deletion given a node pointer. Most production code uses doubly-linked because deletion-by-pointer is the operation that distinguishes a linked list from anything else.
The XOR linked list (Sinha, 2004 popularised but the trick is older) stores prev XOR next in a single pointer slot. Knowing the previous node lets you compute the next node by XORing the slot. Cute, halves the per-node memory of a doubly-linked list, and entirely incompatible with garbage-collected runtimes — the GC can't see the pointers, so it can't trace reachability. Confined to embedded systems and academic curiosities.
The unrolled linked list stores not one element per node but k elements (typically 16–64) in a small contiguous array per node. The pointer chasing rate drops by a factor of k, the cache misses drop with it, and the asymptotic complexities are unchanged. Hans Boehm's cord (1995) used unrolled lists for ropes; Apple's CoreFoundation CFString uses a similar block structure for very long strings; some database systems use unrolled lists for in-memory indexes.
The skip list (William Pugh, Communications of the ACM, June 1990) is the most successful linked-list variant in production today. Each node carries multiple next pointers at randomly chosen levels; level 0 is the full list, level 1 skips half the nodes on expectation, level 2 skips three-quarters, and so on. Search descends from the highest level downward, taking O(log n) expected time without ever rebalancing. Redis uses skip lists for sorted sets (ZSET); LevelDB and the embedded HBase memstore use them for the in-memory write buffer. Pugh's original argument was that skip lists are easier to implement than red-black trees and faster than AVL trees in practice; thirty-five years of production use have borne him out.
Two more shapes worth naming. The circular list closes the loop — tail.next = head — and shows up in round-robin schedulers and clock-replacement page caches. The intrusive list embeds the link pointers in the payload struct itself rather than wrapping the payload in a separate node; the Linux kernel's list.h, Boost's Boost.Intrusive library, and FreeBSD's queue(3) macros are the canonical implementations.
Pugh's 1990 paper argued that probabilistic balancing was easier to implement than red-black rotations and faster than AVL in practice. Thirty-five years later, Redis sorted sets, the LevelDB memtable, and HBase's in-memory store still use skip lists for exactly that reason — short code, no rotation case work, comfortable concurrency story.
The asymptotics lie — modern CPUs love arrays
The asymptotics lie. Modern CPUs love arrays.
Big-O says inserting at the head of a linked list is O(1) and inserting at the head of an array is O(n). Both are true. Both are also misleading, because they say nothing about the hidden constant — and on modern hardware, the constant is dominated by cache behavior, not arithmetic. The textbook recommendation that “you should use a linked list when you do many inserts in the middle” is wrong on every CPU built since 2005.
The numbers are stark. An L1 cache hit on a 2024-era x86-64 CPU is about 4 cycles, roughly 1.0 ns. An L2 hit is around 12 cycles, 3 ns. An L3 hit on a multi-socket box is 40–60 cycles, 15–20 ns. A miss to DRAM is 200–400 cycles, 60–100 ns. Every node->next dereference in a long linked list is a fresh DRAM round trip, because the prefetcher has nothing to learn — the next address is only known after the current load completes. Walking a one-million-node linked list in random allocation order takes around 100 ms. Walking a one-million-element int array takes around 0.4 ms. The asymptotic complexity is identical; the wall-clock difference is 250×.
Bjarne Stroustrup — the designer of C++ — gave a 2014 keynote at GoingNative titled Why You Should Avoid Linked Lists. He benchmarked sorted insertion of n random integers using std::vector versus std::list, where vector needs O(n) memmove per insert and list needs O(n) search but only O(1) insert. The vector beat the list at every n he measured, from 100 elements to 100 000, and the gap widened with size. The takeaway: pointer chasing through scattered allocations is so expensive on modern hardware that it overwhelms the asymptotic savings. The talk is worth watching once just to internalise the curve.
Three categories of workload still favour linked lists. First, splicing: moving a sublist from one list to another in two pointer writes, with no copying. Second, stable iterators: a reference to a list node remains valid after insertions and deletions elsewhere; an iterator to a vector element is invalidated by any reallocation. Third, lock-free concurrency: the Michael–Scott queue, Treiber stack, and Harris's 2001 lock-free linked list (T. L. Harris, A Pragmatic Implementation of Non-Blocking Linked-Lists, DISC 2001) all rely on single-pointer CAS operations. Lock-free vector resize is much harder; production systems mostly don't bother.
| Container | Random access | Append | Insert mid | Cache |
|---|---|---|---|---|
| Array / vector | O(1) | O(1) amort. | O(n) memmove | excellent |
| Singly linked | O(n) | O(1) at head | O(1) given prev | poor |
| Doubly linked | O(n) | O(1) either end | O(1) given node | poor |
| Skip list | O(log n) exp. | O(log n) | O(log n) | moderate |
| Unrolled list | O(n / k) | O(1) amort. | O(k) | good |
Where linked lists live in kernels and allocators
Kernels, allocators, free lists, hash chains.
Despite the cache-locality argument, linked lists are everywhere in production code — once you know where to look. The Linux kernel's include/linux/list.h defines a doubly-linked, circular, intrusive list as the kernel's universal collection. Every struct task_struct (a process), every struct file (an open file descriptor), every struct page (a physical memory page) participates in multiple lists simultaneously: the runqueue, the process tree, the per-cgroup membership, the LRU page-reclaim list. A single struct typically embeds three or four struct list_head fields. With the intrusive convention, no allocation per node and no extra cache miss to reach the payload — the link is the payload's neighbour in memory.
The kernel's slab and slub allocators (Bonwick, USENIX Summer 1994; SLUB by Christoph Lameter, 2007) keep free lists per size class — a singly-linked list of free objects threaded through the unused memory itself. Allocation is a single pointer load from the head; freeing is a single pointer write at the head. Both are O(1) amortised and lock-free per CPU after Lameter's per-CPU partial-slab redesign. The same free-list trick is in glibc's malloc (tcache bins, fastbins, smallbins, largebins), in jemalloc (Facebook's allocator since 2012), in tcmalloc (Google), and in mimalloc (Microsoft Research, 2019). Every modern general-purpose allocator is a forest of free lists.
Hash tables typically resolve collisions with linked lists called chains. Java's HashMap, Python's dict (until 3.6 introduced compact dicts), and Go's map all use chaining when the load factor crosses a threshold. The chain length determines worst-case lookup; Java 8 (2014) added the optimisation of converting a chain to a balanced tree once it exceeds 8 entries, capping worst-case at O(log n) even under hash flooding attacks.
The LRU cache — least-recently-used eviction — is the most famous algorithm built on linked lists. A doubly-linked list ordered from most-recent to least-recent, plus a hash map from key to list node, gives O(1) lookup, O(1) insertion, and O(1) eviction-of-the-tail. Memcached, Redis (with the allkeys-lru policy), Varnish, the Linux page cache, and every CPU's L2/L3 cache controller use a structural variant. Without the linked list, the “move this to the front” operation would cost O(n) on every cache hit; with it, every cache hit is two pointer rewrites.
/* Linux kernel list.h, simplified */
struct list_head {
struct list_head *next, *prev;
};
#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define list_for_each(pos, head) \
for (pos = (head)->next; pos != (head); pos = pos->next)
static inline void list_add(struct list_head *new,
struct list_head *head)
{
new->next = head->next;
new->prev = head;
head->next->prev = new;
head->next = new;
}Concurrent linked lists — hand-over-hand, CAS, hazard pointers
Hand-over-hand locks, CAS chains, hazard pointers.
A linked list shared across threads is more delicate than it looks. The simplest correct strategy is a coarse mutex around the whole list: every operation takes the lock. Throughput is bounded by single-thread speed regardless of core count. The next step up is hand-over-hand locking (also called lock coupling): hold a lock on the current node, take a lock on the next node, release the previous. Multiple threads can traverse simultaneously as long as they don't overlap. Hand-over-hand is the standard technique for B-tree concurrency and for the per-bucket locks in some hash tables.
Lock-free linked lists are an entire research field. Tim Harris's 2001 paper A Pragmatic Implementation of Non-Blocking Linked-Lists (DISC, October 2001) gave the first practical algorithm: deletion is a two-step process — logically mark the node as deleted via a CAS on the next pointer with a tag bit, then physically remove it via a second CAS. Concurrent insertions cooperatively help complete pending deletions. Maged Michael's 2002 follow-up (High Performance Dynamic Lock-Free Hash Tables and List-Based Sets, SPAA) added safe memory reclamation via hazard pointers, solving the use-after-free problem that haunts every lock-free pointer-based structure.
The ABA problem is the sharpest hazard. Thread A reads pointer P = 0xABCD. The OS preempts A. Thread B pops the node at 0xABCD, frees it, allocates a new node that happens to land at the same address, and pushes it. A resumes, runs CAS expecting 0xABCD, and the CAS succeeds — even though the value at that address is now a completely different node. The result is corrupted state. Solutions: tagged pointers (pack a 32-bit version counter into the high bits of the pointer, so every modification bumps the tag and the CAS cannot succeed against a recycled value), 128-bit CAS (DCAS, available on x86-64 as cmpxchg16b), garbage collection (the recycled address can't be reused while a reference exists), or hazard pointers (each thread publishes the addresses it's currently dereferencing; reclamation waits until no thread is hazard-pointing the address).
Java's java.util.concurrent.ConcurrentLinkedDeque, Microsoft's .NET ConcurrentLinkedDeque, and Folly's UnboundedQueue are production realisations of these ideas. Each one cites the Michael, Scott, and Harris papers in source comments. Three decades of refinement; the algorithms haven't changed much since 2002, but the empirical wisdom about cache-line padding and back-off has.
Functional linked lists — cons cells, structural sharing, copy-on-write
Cons cells, structural sharing, copy-on-write.
The cons list is the original purely functional data structure. Because nothing is mutated — cons always allocates a new pair — multiple lists can share their tails. Prepending to a list does not affect anyone else holding the original list. Two stacks built by prepending different elements to a common base share that base for free; the structure is persistent in the sense of Driscoll, Sarnak, Sleator and Tarjan's seminal 1989 Journal of Computer and System Sciences paper Making Data Structures Persistent.
Chris Okasaki's 1996 PhD thesis at Carnegie Mellon, expanded into the 1998 Cambridge University Press book Purely Functional Data Structures, took persistence seriously across the whole catalogue: persistent queues with worst-case O(1) per operation, persistent random-access lists with O(log n), persistent priority queues with strong worst-case bounds. The trick across the book is scheduling: incrementalising lazy work so that any single operation pays only its share of the amortised cost, even under arbitrary persistence. The book is short, dense, and required reading for any compiler or PL implementer.
Generational garbage collectors lean heavily on linked lists. Each minor collection promotes survivors from the young generation to the old generation, but pointers from old to young must be tracked so the old generation can act as roots for the young collection. The bookkeeping is a remembered set: a per-region linked list (or hash set) of cards or pages dirtied by old-to-young writes. The HotSpot JVM, V8, .NET CLR, and OCaml runtime all use this pattern; the dirty-card list is updated by a write barrier on every reference store.
Copy-on-write linked lists are the persistence model of LMDB, BoltDB, and other B+tree-based key-value stores: a write produces a new path of nodes from the modified leaf to the root, sharing all sibling subtrees with the previous version. Old transactions can continue reading the previous root indefinitely; once they finish, the orphaned paths become reclaimable. The same mechanism underlies Git's object database — a commit's tree is a linked structure of subtrees and blobs, and unchanged subtrees are shared with the parent commit's tree by reference.
The cost of persistence is allocation pressure. Every modified path costs O(log n) fresh cells; long-running systems can churn millions of nodes per second. Generational garbage collectors handle this well because the new cells almost all die young, but reference-counted runtimes (CPython, Swift, Objective-C) pay a per-cell decrement cost that can dominate. Clojure, designed by Rich Hickey in 2007 around persistent data structures, sits on the JVM partly because the JVM's young-generation collector handles the allocation rate gracefully.
When NOT to use a linked list — reach for an array first
Reach for an array first.
The default container for new code on a modern CPU is a dynamic array: std::vector in C++, ArrayList in Java, []int in Go, Vec in Rust, list in Python (which is actually a dynamic array, despite the name), Array in JavaScript. Reach for a linked list only when the workload includes one of three things: O(1) splice, stable iterators across modifications, or lock-free concurrent access. Otherwise the array is faster on every measurable dimension — throughput, latency, memory footprint, allocator pressure.
Several patterns that look like linked-list problems aren't. A FIFO queue does not need a linked list; a ring buffer is faster. A stack does not need a linked list; vector::push_back and pop_back are O(1) amortised and cache-friendly. An LRU cache does need a linked list at the data-structure level — but the cache itself can be implemented with a contiguous slab plus a small linked list of indices, avoiding individual node allocations.
If your performance-critical workload does demand a linked list, profile the allocator. In Linux user space, malloc for tiny nodes can be the dominant cost. jemalloc and mimalloc are faster than glibc malloc on small allocations; pool allocators (a slab of pre-allocated nodes) are faster still. The kernel's kmem_cache_create is a slab allocator tuned for one struct size at a time. Boost.Pool, Loki's small-object allocator, and Andrei Alexandrescu's Modern C++ Design chapter on policy-based allocation are good starting points.
One last caution: linked-list intuition is dangerous in a managed language. A Java LinkedList is a java.util.LinkedList, a doubly-linked list of small objects each requiring its own header, GC bookkeeping, and indirection. ArrayList beats it on virtually every workload, and the JDK source comments explicitly say so. The same applies to .NET's LinkedList<T> versus List<T>. The garbage-collected pointer dereferences that look free aren't; they cost the same DRAM round trip as in C.
A worked rule of thumb: if your collection has fewer than a thousand elements and is rebuilt rather than mutated in place, use an array. If it has tens of thousands and you mutate by appending, use a dynamic array. If you need O(1) deletion in the middle and you have the node pointer (LRU caches, free lists, kernel runqueues), use a doubly-linked list. If you need fast ordered traversal with frequent insertion-by-key, use a skip list or a B-tree, not a sorted linked list. Reach for a singly-linked cons-list only inside a functional-programming context where structural sharing buys you more than it costs.
Further reading on linked lists
Primary sources, in order.
- Knuth · TAOCP Vol. 1The Art of Computer Programming, Section 2.2Singly, doubly, circular and the implementation tricks. The original definitive treatment, still the place to look up subtleties.
- Linux kernel · list.hIntrusive linked-list macrosThe reference implementation of an intrusive doubly-linked circular list. Twenty-five years of careful optimisation; worth reading once end to end.
- Pugh · 1990Skip Lists: A Probabilistic Alternative to Balanced TreesCACM, June 1990. Redis sorted sets and the LevelDB memtable both implement Pugh's algorithm essentially unchanged.
- Harris · 2001A Pragmatic Implementation of Non-Blocking Linked-ListsDISC 2001. The mark-and-help technique that underlies every lock-free linked list and ordered set since.
- Stroustrup · 2014Why You Should Avoid Linked ListsGoingNative 2014 keynote. Live-benchmarked sorted insertion: vector beats list at every n. The clearest single demonstration of cache-locality dominance.
- Okasaki · 1996Purely Functional Data StructuresCMU PhD thesis, later expanded into the 1998 Cambridge University Press book. Persistent stacks, queues, and priority queues with optimal bounds.
- Bonwick · 1994The Slab Allocator: An Object-Caching Kernel Memory AllocatorUSENIX Summer 1994. The free-list-based per-size-class allocator design used in every Unix kernel since Solaris.
- AlgorithmicaHPC Data Structures — Binary SearchA modern, cache-aware analysis of search structures. Why arrays-with-binary-search beat balanced trees in practice.
- CLRS · 4th ed., 2022Introduction to Algorithms, Chapter 10The textbook chapter on elementary data structures: stacks, queues, linked lists, rooted trees. Pseudocode and asymptotic analysis aimed at undergraduates.
- MIT 6.006 · OCWIntroduction to Algorithms — Lecture 2Erik Demaine on sequences and sets. Free MIT lecture videos covering exactly the array-vs-linked-list tradeoffs above.
- Semicolony simulatorQueues and stacksFour orderings — FIFO, LIFO, deque, priority — with the cell-and-pointer representation underneath.
- Semicolony guideMemory allocationHow malloc, jemalloc, tcmalloc, and the kernel slab allocator manage their free lists.