How does Dijkstra's algorithm work?

Dijkstra maintains a tentative distance to every vertex (∞ initially, except the source at 0) and a set of "settled" vertices. Repeatedly: pick the unsettled vertex with the smallest tentative distance; settle it; relax all its outgoing edges (update neighbour distances if the path through this vertex is shorter). With a binary heap as the priority queue, the complexity is O((V + E) log V).

What is the difference between Dijkstra and A*?

A* is Dijkstra plus a heuristic estimate of remaining distance to the goal. The heuristic guides the search toward the goal instead of expanding uniformly outward. With an admissible heuristic (one that never overestimates), A* finds the same optimal path but typically expands far fewer nodes. Dijkstra is the right choice without a goal (single-source-all-targets); A* is the right choice with a goal and a useful heuristic.

Dijkstra's Algorithm Visualizer: shortest path, expanding outward.

Dijkstra's algorithm finds the shortest path from a start node by always expanding the closest unvisited node first, which guarantees the path it commits to is optimal for non-negative edge weights. Drop walls, drag the endpoints, and run it. Watch the wavefront fill the grid until it reaches the goal.

Visited

Path

Click mode

Run

Grid

What you're looking at

The board is a grid with a green start cell, a purple end cell, and black walls you paint by clicking and dragging. The click-mode buttons switch between dropping walls and moving either endpoint. Hit Dijkstra and the algorithm floods outward from the start: the tinted cells are everything it had to visit, and the solid line through them is the shortest route it found once the end popped off its queue. The counters up top track how many cells were visited versus how many sit on the final path.

Run it on the preset walls first and watch the visited region spread as a rough disc, not a beeline. That is the point Dijkstra makes: with no idea where the goal is, it settles cells in cheapest-first order and expands evenly in every direction. The number to watch is Visited against Path. The path is short, but the algorithm touched far more cells than it kept, because it has no heuristic pulling it toward the target. Box the end cell in with walls and the flood fills almost the whole grid before reporting no route, which is exactly the wasted work A* exists to cut down.

What is Dijkstra's algorithm?

Cheapest route, no map.

Dijkstra's algorithm finds the shortest path from a source vertex to every other vertex in a weighted graph with non-negative edges. Edsger Dijkstra sketched it in 1956 "in about twenty minutes" while waiting for a tram in Amsterdam; he published it in 1959. The algorithm powers OSPF and IS-IS internet routing, GPS turn-by-turn directions, and every shortest-path engine since.

Imagine you're standing at one end of a city and need to reach the other. You have a map of the streets and you know how long each one takes — five minutes here, twelve there, two on a tiny side street. You want the route that gets you home fastest. Your first instinct is to pick the most direct-looking line and walk it. Sometimes that works. On a grid, sometimes the apparently scenic detour is faster because the “direct” route runs through six traffic lights and a school zone.

Try the same thing on a graph with a hundred nodes and a thousand edges. The “most direct” instinct is useless because there is no geometry — only a list of roads with costs. Now imagine a million nodes (every road intersection in a metropolitan area), or a hundred million (Europe). The brute-force approach — try every possible path and pick the cheapest — explodes combinatorially. A graph with a million nodes has on the order of 10^1,000,000 possible paths. Even a fast computer can't enumerate them.

The trick the algorithm uses is a different invariant. Start at the source with a tentative cost of zero. Every other node starts at infinity — we don't yet know how to reach it. Now repeatedly do one thing: pick the unvisited node with the smallest tentative cost (always the source on the first step), look at its neighbours, and update each neighbour's cost if going through this node is cheaper than whatever we knew before. Mark the picked node visited and never touch it again.

Why does that work? The key insight: when you pick the smallest-cost unvisited node, no future path can ever undercut it, because every alternative would have to pass through some other unvisited node whose cost is already at least as large. So the cost we have for the picked node is final. Repeat the process and you fan out from the source in expanding waves of equal cost — what's called a “frontier” — until you settle every reachable node, or until the destination pops out of the queue.

This is Dijkstra's algorithm. On a graph with V nodes and E edges it runs in O((V + E) log V) with a binary heap — about thirty million operations for a million-node graph, fast enough to finish in a fraction of a second on modern hardware. On the European road graph (200 million nodes, 250 million edges) the naive version takes 30–120 seconds; with the right preprocessing tricks Google Maps gets it under 30 ms. Edsger Dijkstra designed it in 1956 in twenty minutes while waiting for coffee, and the rest is history we'll cover in Part 03.

How Dijkstra's algorithm works — settle smallest, never look back

Settle smallest, never look back.

The data structure at the heart of Dijkstra is a priority queue: a collection of items, each with a number, that always lets you pop the one with the smallest number first. Implementations differ in their constants — a binary heap pops in O(log n), a Fibonacci heap in amortised O(1) — but the contract is the same. Push items in any order; they come out in priority order. Most of Dijkstra's running time is just push, pop, push, pop.

Each iteration does three things. First, pop the unvisited node with the smallest tentative cost. Second, mark it visited (its cost is now final). Third, look at every outgoing edge: for each neighbour, compute “cost to me + edge weight”, and if that's less than the neighbour's current tentative cost, update it and push the neighbour back into the queue with the new cost. This last step is called relaxation: each edge is “relaxed” once per direction over the course of the algorithm, hence the O(E) term in the complexity.

Two practical refinements: keep a predecessor map recording which neighbour gave each node its current best cost — at the end you reconstruct the actual path by walking from target to source, predecessor by predecessor, and reversing. And whenever you pop a node from the queue, double-check its current tentative cost: queue entries can become stale when a node's cost is lowered after the original push. The pattern is “pop, check if cost matches, if not skip; otherwise process.” This gives lazy decrease-key without needing the heap to support the operation natively.

The simulator above runs this loop on a 20×20 grid. Drag walls in the canvas to create obstacles. Watch the visited region (highlighted) grow as concentric rings of equal cost from the start. The path itself, once the target is reached, is a single line back through the predecessor chain. The algorithm pays no special attention to the goal — it just expands outward in cheapest-first order. That's why A* (Part 04) outperforms Dijkstra when you know the goal's location: A* biases the queue order toward the goal and ignores irrelevant directions.

One subtle property: the algorithm assumes non-negative edge weights. A cost can be zero, but never negative. With a negative edge, Dijkstra can settle a node prematurely — its tentative cost is the smallest available now, but a longer path through a not-yet-settled node could go negative and undercut. Production code that allows arbitrary user-supplied weights needs an explicit guard at the input. We'll meet the right algorithm for negative weights — Bellman–Ford — in Part 05.

Origins — Edsger Dijkstra, 1956 (in twenty minutes)

Edsger Dijkstra, in twenty minutes.

Dijkstra famously designed his shortest-path algorithm in 1956, in his head, while sitting in a café in Amsterdam waiting for a coffee. He needed a demo for the new ARMAC computer at the dedication of the Mathematical Centre's new building, and a network-of-cities shortest-path problem was both easy to explain to a non-technical audience and easy to draw on a blackboard. The whole design took roughly twenty minutes; he was, in his words, “using the back of an envelope as a working surface.”

He published the algorithm three years later in Numerische Mathematik, volume 1, issue 1, page 269, under the title “A Note on Two Problems in Connexion with Graphs.” The paper is three pages long. There is no formal complexity analysis — computer-science complexity theory was still a decade away — no pseudocode in any modern sense, and no proof of correctness in the modern style. Dijkstra simply describes the procedure and says it works. By 1972 he had won a Turing Award for, among other things, this algorithm.

The core insight: in a graph with non-negative edge weights, the shortest path to any node v cannot pass through a node u whose tentative distance is greater than v's. So once we settle on the smallest unvisited node, its distance is final and no future relaxation can shorten it. The whole algorithm reduces to repeating “extract the smallest unvisited node, mark it settled, relax its outgoing edges” until the goal pops or the priority queue empties. Everything else is bookkeeping — the priority queue choice, the predecessor map, the termination condition.

Dijkstra was unsentimental about his own work. He often pointed out that Bellman had published a related algorithm one year earlier (Bellman, 1958, “On a Routing Problem,” Quarterly of Applied Mathematics) and that Floyd's all-pairs algorithm followed in 1962 (Floyd, “Algorithm 97: Shortest Path,” Communications of the ACM). The 1959 paper's lasting fame, he said, was a matter of luck and clarity rather than priority. The algorithm's name stuck because it was small, fast, and provably optimal under reasonable assumptions — the rare thing that survives both undergraduate textbooks and production code paths unchanged for sixty-six years.

One detail of the 1959 paper that modern readers often miss: Dijkstra used a linear scan of the unvisited array, not a heap. Heaps had been described by Williams in 1964 — five years after the algorithm — and the priority-queue formulation we now teach is therefore retroactive. The original O(V²) form is closer to what Dijkstra had in mind, and it remains the right shape for dense graphs where the linear scan is cache-friendly. The lesson, as so often in algorithms history, is that the mathematical idea is older and simpler than the optimised implementation; the heap is a footnote on top of a clean kernel.

Complexity — from V² to (E + V) log V

From V-squared to E plus V log V.

The naive Dijkstra runs in O(V²). At each of V iterations it linearly scans the unvisited array to find the minimum — a perfectly reasonable choice for dense graphs where E approaches V² anyway. For sparse graphs (E in the order of V), the linear scan dominates and a priority queue brings the cost down to O((V + E) log V) with a binary heap, then to O(V log V + E) with a Fibonacci heap.

The Fibonacci heap, introduced by Fredman and Tarjan in 1987 (“Fibonacci heaps and their uses in improved network optimization algorithms,” Journal of the ACM), supports decrease-key in amortised O(1) rather than the binary heap's O(log V). For Dijkstra, decrease-key dominates the inner loop, so the asymptotic improvement is real. In practice, however, Fibonacci heaps lose to binary heaps and pairing heaps because of cache-unfriendly pointer chasing and high constant factors. Mikkel Thorup pushed the bound further in 1999 with a linear-time SSSP algorithm for undirected graphs with positive integer weights, but the algorithm is essentially never implemented — the constants are catastrophic.

Concrete numbers help. On a 1 million-node, 10 million-edge sparse graph (typical road-network shape), a binary-heap Dijkstra implementation in C++ runs in roughly 1 second on a single modern core. Switching to a 4-ary heap (where every node has four children, doubling cache-line utilisation) shaves another 20–30%. Switching to a Fibonacci heap typically increases wall time by 30–100% because of pointer indirection. The lesson: asymptotic complexity tells you the shape of the curve, but cache behaviour decides whose curve is lower in absolute terms.

For very sparse graphs (E in the order of V) and small integer weights, bucket-based Dijkstra (Dial's algorithm, 1969) runs in roughly linear time using an array of buckets indexed by tentative distance. It's the algorithm BGP-style routers use internally for their tiny topologies, where weights are 8-bit administrative metrics. For dense graphs (E in the order of V²), the V² naive version actually wins because the linear scan visits cache-coherent contiguous memory. Heap-based Dijkstra wins the middle.

# Dijkstra's algorithm with a binary-heap priority queue.
dist[v] = ∞ for v in V; dist[s] = 0
prev[v] = null for v in V
pq = MinHeap();  pq.push((0, s))
while not pq.empty():
    (d, u) = pq.pop()
    if d &gt; dist[u]: continue   # stale entry
    for (v, w) in adj[u]:        # relax outgoing edges
        if dist[u] + w &lt; dist[v]:
            dist[v] = dist[u] + w
            prev[v] = u
            pq.push((dist[v], v))

A* — Dijkstra with a heuristic hint

Dijkstra with a hint.

Dijkstra explores in concentric circles of equal cost — on a uniform grid, the visited region forms a literal disc that grows outward from the source. If you have a goal location and any reasonable estimate of how far it is from the current node, why not bias exploration toward the goal instead of expanding uniformly? That observation is A*, published by Peter Hart, Nils Nilsson, and Bertram Raphael in 1968 (“A Formal Basis for the Heuristic Determination of Minimum Cost Paths,” IEEE Transactions on Systems Science and Cybernetics), as part of the Stanford SHAKEY robot project.

A* is structurally identical to Dijkstra: a priority queue ordered by cost. The only difference is which cost. Dijkstra orders by g(n) — cumulative path cost from the source to n. A* orders by f(n) = g(n) + h(n) — cumulative cost plus a heuristic estimate of remaining cost. With h ≡ 0 the algorithm is exactly Dijkstra. With h equal to true remaining cost, A* expands no nodes outside the optimal path.

For A* to remain optimal, the heuristic must be admissible: never overestimate true remaining cost. For each-node-expanded-at-most-once behaviour, it must additionally be consistent (also called monotonic): for every edge (n, n') of cost c, h(n) ≤ c + h(n'). On a 4-connected grid, Manhattan distance is both admissible and consistent. On a Euclidean plane, straight-line distance is both. On a road network with speed limits, “straight-line distance divided by max speed” is admissible. The harder the heuristic estimates — closer to true remaining cost — the fewer nodes A* expands, and the faster it terminates.

A* is the canonical algorithm for game pathfinding. StarCraft II, Civilization, every Bethesda RPG, and most grid-based puzzlers ship some flavour of A* with optimisations like jump-point search (Harabor and Grastien, 2011) for uniform-cost grids. Sokoban and similar puzzles use IDA* (iterative-deepening A*) when memory budgets are tight. The pattern is forty years old and shows no sign of leaving.

One bias-free way to compare Dijkstra and A*: count expanded nodes. On a 1000×1000 uniform grid with no obstacles, Dijkstra expands all 1 million cells before terminating; A* with Manhattan-distance heuristic expands a corridor along the optimal path, typically 2,000–5,000 cells. With obstacles the corridor widens but rarely beyond a few percent of the total grid. The constant-factor cost per expansion is identical for the two algorithms; the win is purely in node count, which is to say in priority-queue operations. On road graphs with realistic obstacles, A* with a tight heuristic can be 10–100× faster than Dijkstra in absolute wall time.

When Dijkstra is wrong — negative edges, Bellman-Ford, Johnson

When Dijkstra is wrong.

The “settle smallest first” invariant assumes a settled node's distance is final. With negative-weight edges this assumption fails: a longer path through a not-yet-settled node can undercut a settled distance. Dijkstra produces silently wrong answers; no exception is raised. The classic test case is a triangle s → a (cost 5), s → b (cost 4), b → a (cost −3) — the true shortest path s → b → a costs 1, but Dijkstra settles a at cost 5 first and never updates.

The right algorithm for negative weights is Bellman–Ford, published by Richard Bellman in 1958 (“On a Routing Problem,” Quarterly of Applied Mathematics) and independently described by Lester Ford in 1956. It runs in O(V·E), relaxing every edge V−1 times. The Vth pass detects negative cycles: if any edge still relaxes, there is a negative cycle reachable from the source and no shortest path exists. Bellman–Ford is the basis of distance-vector routing protocols — RIP (RFC 2453), EIGRP, and parts of BGP's decision process — and of currency-arbitrage detection in finance, where negative cycles correspond to profitable round-trips.

Floyd–Warshall (Robert Floyd, 1962, “Algorithm 97: Shortest Path,” CACM, building on Roy 1959 and Warshall 1962) computes all-pairs shortest paths in O(V³) with handling for negative edges (but not negative cycles). It's the right tool for a dense graph of a few hundred nodes where you want every pairwise distance — metro topology planning, small social-network analysis. For sparse graphs with negatives, Johnson's algorithm (Donald Johnson, 1977) re-weights edges via a single Bellman–Ford pass to make them non-negative, then runs Dijkstra from each source: O(V² log V + V·E), faster than Floyd–Warshall on sparse inputs.

One more cousin: SPFA (Shortest Path Faster Algorithm), an empirical optimisation of Bellman–Ford that uses a queue of nodes whose distance just decreased. It runs in expected O(k·E) for small k on most graphs but degrades to Bellman–Ford's worst case on adversarial inputs. SPFA shows up in competitive programming and is occasionally the best practical choice for sparse graphs with negatives, despite its weaker formal guarantees.

Algorithm	Best for	Complexity	Negatives?
Dijkstra (heap)	SSSP, non-negative weights	O((V+E) log V)	no
Bellman–Ford	SSSP, detect negative cycles	O(V·E)	yes
Floyd–Warshall	all-pairs, dense graphs	O(V³)	yes (no −cycle)
A*	point-to-point with heuristic	O(b^d) practical	no
Bidirectional Dijkstra	point-to-point, symmetric	~half explored area	no
Contraction hierarchies	continental routing (CH/CRP)	~0.1ms query, hours preprocess	no

Real-world graphs — a planet of edges and corners

A planet of edges and corners.

OpenStreetMap's full road graph for Europe contains roughly 200 million nodes and 250 million edges. Naive Dijkstra from Lisbon to Helsinki on that graph takes 30–120 seconds on a single core, expanding tens of millions of nodes in the radial wave. Bidirectional Dijkstra — running two simultaneous searches, one forward from the source, one backward from the target, and meeting in the middle — cuts the explored area roughly in half: the explored region is two interlocking discs each half the radius, so the total area is one-half what one disc of full radius would cover.

For continental-scale routing nobody actually runs naive Dijkstra. The state of the art is contraction hierarchies, developed by Robert Geisberger, Peter Sanders, Dominik Schultes, and Daniel Delling at Karlsruhe in 2008 (“Contraction Hierarchies: Faster and Simpler Hierarchical Routing in Road Networks,” WEA 2008). Pre-process the graph by ranking nodes by “importance,” then add shortcut edges that bypass less-important nodes. Query time becomes a tiny bidirectional Dijkstra on the contracted graph: a continent-spanning route in 0.1–1 ms, after a one-time preprocessing of a few hours.

OSRM, the Open Source Routing Machine used by many mapping services, ships contraction hierarchies as one of its two backends. GraphHopper uses CH plus extensions. Microsoft's Bing Maps used a related technique called highway hierarchies (Sanders and Schultes, 2005) before migrating to CH. Google Maps uses an internally-developed system called CRP (Customizable Route Planning, Delling et al., 2017) that handles dynamic edge weights — live traffic — without rebuilding the hierarchy. The 2017 paper from Google describes 30 ms world-wide route queries on real-time-updated graphs.

Other techniques: transit node routing (Bast, Funke, Sanders, 2007) precomputes paths between a small set of “access nodes” per region; long routes look up source-to-access, access-to-access, and access-to-target. ALT (A* with Landmarks and Triangle inequality, Goldberg and Harrelson, 2005) places a few dozen landmark nodes, precomputes their distances to every node, and uses landmark distances as a tight A* heuristic. The algorithms together brought continental routing from minutes in 2000 to milliseconds today — a factor of 100,000.

Where Dijkstra runs in production — OSPF, IS-IS, OSRM

The Internet runs on shortest paths.

Every IP packet that crosses your network is steered by a routing protocol, and the dominant interior protocols — OSPF and IS-IS — both run Dijkstra on link-state graphs. OSPF (Open Shortest Path First, RFC 2328 from John Moy, 1998, with later revisions for IPv6 in RFC 5340) builds a link-state database by flooding link advertisements within an area, then each router independently runs Dijkstra to compute a shortest-path tree rooted at itself. Routes are chosen by the topology of advertised costs — a metric that operators set per-link, often inversely proportional to bandwidth.

IS-IS (Intermediate System to Intermediate System, RFC 1195 from 1990, originally an OSI protocol) runs structurally the same algorithm but uses a different packet format and different area structure. Major Internet backbones (NTT, Level 3, Telia, much of Comcast) run IS-IS rather than OSPF for historical and operational reasons; smaller enterprises overwhelmingly use OSPF. The choice of protocol is rarely the bottleneck.

For inter-autonomous-system routing, BGP (Border Gateway Protocol, RFC 4271) does not run Dijkstra on edge weights directly. BGP is a path-vector protocol: it advertises full paths through autonomous systems, applies policy filters (local preference, AS-path length, MED, IGP cost), and picks the “best” path according to a deterministic decision process. The IGP cost step inside BGP's tiebreaker uses the underlying OSPF/IS-IS Dijkstra cost — so BGP transitively depends on Dijkstra even when its top-level decision logic does not.

Inside the IP control plane, route convergence after a link failure is a Dijkstra recomputation: the link state changes, every router re-floods, every router re-runs Dijkstra. Convergence times measured in production networks are typically 50–200 ms for OSPF, 100–500 ms for IS-IS, and 30–180 seconds for BGP — the gap between IGP convergence and BGP convergence is often the dominant component of customer-visible failover time.

The Spanning Tree Protocol (Radia Perlman, 1985, formalised as IEEE 802.1D) is a simpler shortest-path-tree algorithm operating at Layer 2 between Ethernet bridges. It chooses a root and prunes redundant links to break loops. Modern data-centre fabrics replace STP with TRILL or shortest-path bridging (IEEE 802.1aq), both of which run real Dijkstra-style algorithms to use multiple paths in parallel. The Internet's lowest layer, like its highest, is shortest-path underneath, all the way down.

Where Dijkstra quietly misbehaves — ties, floating-point, scale

Where Dijkstra quietly misbehaves.

Negative weights are the textbook hazard, but the failure mode is silent: no error, no warning, just an answer that is wrong. Production codebases that allow user-supplied weights (gaming engines, scientific simulators) almost always need an explicit guard at the input boundary — reject any negative weight, or detect and route to Bellman–Ford. The number of subtle “pathfinding gives wrong answer occasionally” bugs traceable to a negative weight is unmeasurable but large.

Floating-point edge weights introduce a more subtle hazard: the equality check in “is this node already visited” can fail under accumulated rounding. Two paths whose true total cost differs by 1e−15 may settle in either order depending on summation sequence; the chosen predecessor map then depends on hash iteration order or thread scheduling, making the algorithm non-deterministic. Production graph libraries usually convert weights to fixed-point integers internally — meters times 1000, milliseconds, or whatever the natural unit is — specifically to avoid this.

Disconnected graphs: Dijkstra terminates with infinity-distance for nodes unreachable from the source. The check “dist[target] == infinity” is the standard way to detect “no path exists.” Production code occasionally forgets this and reconstructs an empty predecessor chain, returning an empty path that callers mistake for “already at target.” The fix is one if-statement; the bug is amazingly common.

Multi-source Dijkstra — pushing many starting nodes onto the priority queue at distance zero before the main loop — is a useful generalisation that nobody teaches. It computes, for every node, the distance to the nearest source. Used in nearest-hospital problems, fire-spread simulations, and the Voronoi-diagram-like partitioning of map regions to nearest service centres. The implementation is a one-line change to the initialisation; the conceptual leap is significant.

Finally, when not to use Dijkstra at all: when you only need some path rather than the shortest, BFS works in O(V + E) with no priority queue and no edge weights. When you need many shortest paths from many sources, run all-pairs (Floyd–Warshall or Johnson's). When the graph is huge but the sources and sinks are few, contraction hierarchies obliterate naive Dijkstra. When weights are tiny integers, bucket-based Dial's algorithm wins. Dijkstra is the default; it is rarely the optimal choice.

A silent bug

Dijkstra produces wrong answers on graphs with negative-weight edges and never raises an error. If user input can supply weights, validate them at the boundary or fall back to Bellman–Ford. The number of production pathfinding bugs traceable to a stray negative weight is unmeasurable but large.