network · bgp · routing

Anycast Simulator: one IP, many cities.

Anycast is the trick that makes 1.1.1.1 work everywhere. Many machines in many cities advertise the same IP prefix over BGP, from one autonomous system or from several; the routers between client and PoP pick the closest one, mostly by BGP AS-path length; failover happens when a PoP withdraws its advertisement, in BGP-convergence time: seconds, not DNS-TTL minutes. It is the placement scheme behind Cloudflare 1.1.1.1, Google 8.8.8.8, AWS Route 53, the thirteen root DNS server letters (A through M), and many modern CDN edges. The simulator below lets you push on the world; the long-form reference covers BGP path selection, Cloudflare's architecture, RFC 4786, and the operational pitfalls.

Click the map to place a client · click a PoP dot to fail / restore it

Legend: active PoP advertising 1.1.1.1 · withdrawn (failed) · client · active route

Active PoPs: 8/8 · Clients: 4 · Served: 4 · Avg AS-path: 2.5 · Mode: BGP · Last converge: n/a

Client routes

c1 Comcast (AS7922) → New York 4 AS hops BGP

c2 Deutsche Telekom (AS3320) → Frankfurt 2 AS hops BGP

c3 KT (AS4766) → Tokyo 2 AS hops BGP

c4 ACT (AS24309) → Mumbai 2 AS hops BGP

Event log

client c4 (ACT (AS24309)) → Mumbai (2 AS hops, ~13ms RTT)

client c3 (KT (AS4766)) → Tokyo (2 AS hops, ~16ms RTT)

client c2 (Deutsche Telekom (AS3320)) → Frankfurt (2 AS hops, ~7ms RTT)

client c1 (Comcast (AS7922)) → New York (4 AS hops, ~54ms RTT)

What you're looking at

A world map where eight glowing dots are PoPs, each in a different city and autonomous system, all advertising the same IP, 1.1.1.1. The black dots are clients; the dashed lines show which PoP each one currently routes to, picked by AS-path length with RTT as the tiebreaker. The stat strip tracks active PoPs, average AS-path, and the last convergence time. Click anywhere on land to drop a new client; click a PoP to fail or restore it; the Mode dropdown swaps between anycast and GeoDNS.

Click a PoP that has clients pointed at it and watch them snap to the next-nearest origin in BGP-convergence time. Then switch Mode to GeoDNS and fail that same PoP again: the affected clients keep routing to the dead PoP until their DNS TTL counts down to zero. That gap is the whole argument for anycast. The surprise is that restoring a PoP doesn't always win its old clients back: routing only changes when a shorter AS-path actually appears.
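The re-routing rule described above can be sketched in a few lines of Python. This is a minimal model, not the simulator's actual source: the PoP class, the Paths table, and the reroute function are hypothetical names, and every number except c1's 4-hop, ~54 ms path to New York is made up for illustration. It assumes a client keeps its current route unless that PoP is withdrawn or a strictly shorter AS path appears.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class PoP:
    name: str
    active: bool = True  # False once the PoP has withdrawn its announcement


# One client's view of the world: PoP name -> (AS hops, RTT in ms).
Paths = Dict[str, Tuple[int, float]]


def best_pop(paths: Paths, pops: Dict[str, PoP]) -> Optional[str]:
    """Active PoP with the fewest AS hops; RTT breaks ties."""
    candidates = [
        (hops, rtt, name)
        for name, (hops, rtt) in paths.items()
        if pops[name].active
    ]
    return min(candidates)[2] if candidates else None


def reroute(current: Optional[str], paths: Paths, pops: Dict[str, PoP]) -> Optional[str]:
    """Keep the current route unless it is gone or a strictly shorter path exists."""
    best = best_pop(paths, pops)
    if current is None or not pops[current].active:
        return best
    if best is not None and paths[best][0] < paths[current][0]:
        return best
    return current


# Illustrative data: only the New York entry comes from the event log above.
pops = {name: PoP(name) for name in ("New York", "Frankfurt", "Tokyo")}
c1_paths: Paths = {"New York": (4, 54.0), "Frankfurt": (4, 95.0), "Tokyo": (5, 160.0)}

route = reroute(None, c1_paths, pops)            # initial placement
pops["New York"].active = False                  # fail the PoP
after_failure = reroute(route, c1_paths, pops)
pops["New York"].active = True                   # restore it
after_restore = reroute(after_failure, c1_paths, pops)
print(route, after_failure, after_restore)
```

In this run the client moves from New York to Frankfurt on failure and stays on Frankfurt after the restore, because New York's path is equally long rather than shorter.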

What anycast actually is

Anycast is the practice of advertising the same IP address from many physical locations and letting the routing fabric pick one for each user. The trick is that IP routing is destination-based and AS-path-aware: a router looking at a packet for 1.1.1.1 consults its BGP table, sees that the covering prefix 1.1.1.0/24 is reachable through several announcements (a /24 is the longest IPv4 prefix most networks accept on the public Internet; a /32 is usually filtered), picks the announcement with the shortest AS path, and forwards the packet in that direction. There's no central authority "choosing the PoP"; every BGP-speaking router along the way makes its own local decision, and the packet just falls toward the nearest origin.

AS-path length is not the only criterion. The BGP decision process compares candidate routes in this order: local preference (operator policy, higher wins), AS-path length (shorter wins), origin type, MED (the multi-exit discriminator set by the neighbour, lower wins), eBGP over iBGP, then IGP cost to the next hop. Local preference can override everything else inside one network, but across the Internet as a whole AS-path length is what dominates global behaviour. The result, from the user's perspective: they run dig @1.1.1.1 example.com, and they hit the Cloudflare PoP closest to them in network distance, usually on the same continent and often in the same city.
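The comparison order can be written as a sort key. The sketch below is a simplification under stated assumptions: the Route class and its field names are hypothetical, the AS numbers are private-range placeholders, and MED is compared across all candidates even though real routers normally compare it only between routes learned from the same neighbouring AS.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Lower rank is preferred: IGP < EGP < incomplete.
ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass(frozen=True)
class Route:
    via: str                      # label for the neighbour that sent the route
    as_path: Tuple[int, ...]      # leftmost AS is the nearest one
    local_pref: int = 100         # operator policy, higher wins
    origin: str = "igp"
    med: int = 0                  # lower wins
    is_ebgp: bool = True
    igp_cost: int = 0             # cost to reach the next hop


def preference_key(route: Route) -> tuple:
    """Smaller tuples are better; fields are compared left to right."""
    return (
        -route.local_pref,
        len(route.as_path),
        ORIGIN_RANK[route.origin],
        route.med,
        0 if route.is_ebgp else 1,
        route.igp_cost,
    )


def best_route(routes: Iterable[Route]) -> Route:
    return min(routes, key=preference_key)


candidates = [
    Route(via="transit-a", as_path=(64600, 64700, 64500)),
    Route(via="peer-b", as_path=(64610, 64500)),
    Route(via="customer-c", as_path=(64620, 64630, 64640, 64500), local_pref=200),
]
print(best_route(candidates).via)      # local preference beats path length
print(best_route(candidates[:2]).via)  # equal local preference: shorter path wins
```

With all three candidates the longest path wins because its local preference is higher; with only the first two, the two-hop path through peer-b is chosen.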

Why anycast works at all

At first read it sounds like it shouldn't. Two machines with the same IP would seem to clash. The reason it doesn't: IP forwarding is destination-based and stateless at the router. A router does not need (or want) to know who advertised the destination — it just needs the next-hop. If many neighbours offer routes to the same destination, the router picks one and forwards. The packet eventually reaches some instance of the destination; that's all that's required.

The constraint is at the application layer. Stateless protocols — DNS queries, CDN GETs, HTTP API calls that don't depend on session affinity — are fine: the response goes back to the source IP, the source has no idea it talked to the Singapore copy vs the Tokyo copy, and nobody cares. Stateful protocols — long-running TCP, sticky web sessions, websockets — need the connection to stay pinned to one endpoint. In practice TCP-over-anycast works because BGP doesn't flap within a typical connection lifetime; but the design has to assume that flow stickiness is best-effort, not contractual.

Where anycast actually runs

Everywhere you'd think, and a few places you wouldn't. Root DNS is the canonical example: a.root-servers.net through m.root-servers.net are each anycast clouds of dozens to hundreds of physical servers; the thirteen letters are just thirteen service identities, each with one IPv4 and one IPv6 address. The root server system handles a few hundred billion queries per day; anycast is how it scales without a single root datacenter melting. Cloudflare 1.1.1.1 and Google 8.8.8.8 are anycast public resolvers: same IP from hundreds of PoPs. AWS Route 53 nameservers and AWS Shield / Global Accelerator sit on anycast. Modern CDNs such as Cloudflare, Fastly, Akamai, and CloudFront use anycast for some or all of their edge and DNS; where the edge itself is anycast, the IP you resolve for cdn.example.com is the same in every region. Even NTP is increasingly served this way by some public time services.

Convergence — how fast is failover

When a PoP fails, BGP withdraws the route. The withdrawal propagates through neighbouring ASes, each of which updates its own routing table and re-advertises. In a well-tuned global mesh, convergence is on the order of 5–60 seconds; in pathological cases (route flap damping, slow iBGP propagation, broken peering), it can stretch to minutes. The point: it's faster than DNS. There's no TTL to wait for; once your local router has updated its FIB, your next packet goes to a new PoP.

Real-world numbers: Cloudflare publishes that their internal failover (within their own AS) is sub-second using BGP add-path and BFD; cross-AS convergence sits in the 5–30s range depending on how aggressive your peers' BGP timers are. RIPE Atlas measurements of public anycast services show typical re-routing of 5–20 seconds globally after a withdrawal, roughly an order of magnitude better than the 60–300 second DNS-TTL-bounded recovery of GeoDNS.
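The gap between those two ranges is simple arithmetic. The sketch below uses hypothetical function names and a deliberately crude model: the anycast outage equals BGP convergence time, and the GeoDNS worst case is a client that cached the dead PoP's address just before the record was pulled, so it waits out failure detection plus one full TTL.

```python
def anycast_outage_s(bgp_convergence_s: float) -> float:
    """Clients are re-routed as soon as the withdrawal has propagated."""
    return bgp_convergence_s


def geodns_worst_case_outage_s(ttl_s: float, detection_s: float = 0.0) -> float:
    """Worst case: the answer was cached right before the record changed."""
    return detection_s + ttl_s


# Low and high ends of the ranges quoted above (5-20 s vs 60-300 s).
for convergence_s, ttl_s in [(5, 60), (20, 300)]:
    ratio = geodns_worst_case_outage_s(ttl_s) / anycast_outage_s(convergence_s)
    print(f"convergence {convergence_s}s, TTL {ttl_s}s: GeoDNS is {ratio:.0f}x slower")
```

Resolver caching beyond the TTL, which the comparison table mentions, would widen the gap further and is not modelled here.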

Anycast vs GeoDNS

Property	Anycast	GeoDNS
Layer	Network (BGP / IP)	Application (DNS)
Routing decision by	BGP path length per router hop	DNS resolver's source IP / EDNS Client Subnet
Failover speed	Seconds (BGP convergence)	TTL-bounded (often 60–300s + resolver caching)
Per-client routing accuracy	Determined by routing fabric, not always shortest geographically	Can be precise per-prefix with ECS, but resolvers may not propagate
Stateful traffic	Best-effort sticky during a flow	Strict — once DNS resolves, the client talks to one IP
Operations	Requires BGP peering at every PoP, /24 PI space	Just authoritative DNS; no peering needed
Capacity per PoP	Hard to control — traffic flows where BGP says	Easy to control — DNS can hand out specific IPs
Sweet spot	Stateless services at huge scale (DNS resolvers, CDN GETs)	Application-aware routing (premium-tier users → premium PoPs)

Most large operators use both, layered. The DNS hostname resolves via GeoDNS to a regional anycast IP, then BGP fans out to the nearest PoP within that region. Cloudflare does roughly this: regional anycast for the data plane, GeoDNS to bias resolver routing within each region.

Anycast and TCP — does the session survive?

A long-running TCP connection on an anycast IP stays pinned to one instance only as long as BGP does not change mid-flow. Verisign's engineers wrote about this in "The Anycast TCP Saga": they ran TCP over an anycast DNS service and measured how often a path change broke a connection. The answer was very rarely. Internet routing is sticky in the short term; flaps happen at the seconds-to-minutes timescale, and typical TCP connections are short enough that the probability of a mid-flow flap is low.

Where it does break: very long-lived connections (websockets, gRPC streams, HTTP/2 sessions held open for minutes), or sessions whose path crosses an unstable peering link. The defence is either (a) treat anycast as best-effort and reconnect cleanly if the session drops, or (b) terminate TLS at the anycast edge, then use a unicast back-end IP for the actual long-lived session. Most CDNs do (b) — anycast edge does TLS, then the edge proxies to a unicast origin behind it.
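Option (a) amounts to a retry loop around the session. The following is a minimal sketch, assuming the session is wrapped in a callable that raises ConnectionError (which includes ConnectionResetError) when the path changes underneath it; call_with_reconnect and its parameters are hypothetical names, not part of any library.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_reconnect(
    session: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run session(); on a dropped connection, back off and start it again."""
    for attempt in range(max_attempts):
        try:
            return session()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Exponential backoff with full jitter, so that clients moved by
            # the same route change do not all reconnect at the same instant.
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise ValueError("max_attempts must be at least 1")


# Usage with a stand-in session that is reset twice before succeeding.
attempts = []


def flaky_session() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionResetError("connection reset by peer")
    return "stream finished"


result = call_with_reconnect(flaky_session, sleep=lambda seconds: None)
print(result, "after", len(attempts), "attempts")
```

The sleep parameter is injected only so the example runs instantly; real code would leave the default.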

Stateful services on anycast — the hard problem

A stateful service on anycast either coordinates externally or accepts non-determinism. If a user's write hits the Singapore PoP and their next read hits the Tokyo PoP, you need some mechanism to either get the write to Tokyo (async replication, multi-region database, conflict-free replicated data) or to send the read back to Singapore (session pinning at the application layer, e.g. a cookie that encodes the chosen PoP).
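The cookie-based pinning mentioned above might look like the sketch below. Everything in it is an assumption made for illustration: the PoP codes, the secret key, and the function names are hypothetical. The cookie carries the chosen PoP plus an HMAC so a client cannot steer itself to an arbitrary PoP by editing the value.

```python
import hashlib
import hmac
from typing import Optional

SECRET_KEY = b"demo-only-secret"  # hypothetical; a real key would come from a secret store


def _sign(pop: str) -> str:
    return hmac.new(SECRET_KEY, pop.encode(), hashlib.sha256).hexdigest()


def make_pin_cookie(pop: str) -> str:
    """Cookie value that records which PoP took the user's write."""
    return f"{pop}.{_sign(pop)}"


def read_pin_cookie(cookie: Optional[str]) -> Optional[str]:
    """Return the pinned PoP, or None if the cookie is missing or tampered with."""
    if not cookie:
        return None
    pop, separator, signature = cookie.rpartition(".")
    if not separator or not pop:
        return None
    if hmac.compare_digest(signature.encode(), _sign(pop).encode()):
        return pop
    return None


def serving_pop(local_pop: str, cookie: Optional[str]) -> str:
    """PoP that should answer: the pinned one if valid, else wherever BGP landed us."""
    return read_pin_cookie(cookie) or local_pop


cookie = make_pin_cookie("sin")      # the write landed in Singapore
print(serving_pop("nrt", cookie))    # the next read lands in Tokyo but is sent back
print(serving_pop("nrt", None))      # no pin: serve locally
```

The receiving PoP still has to forward the request to the pinned PoP over a unicast path; the sketch only covers the decision, not the forwarding.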

The cleanest pattern: stateless reads on anycast, writes through a single-leader region behind a separate hostname. Or: anycast everywhere, with a strongly-consistent datastore (Spanner, FoundationDB, Yugabyte) underneath that pretends to be one database. Anycast is not a database replication strategy; it's a request entry-point.

Operational pain — debugging anycast

"Where did my packet go?" is harder than it looks. traceroute from your laptop shows you the path to whichever instance the routing fabric picked for you right now; that may not be the path another user near you is taking. The standard tools — RIPE Atlas, NLNOG RING, Verfploeter, custom probe networks — exist precisely because you can't answer "which PoP is this user hitting?" without measuring from a probe near them.

Capacity sizing is per-PoP, but the routing decisions are not in your control. If a major peering link in São Paulo goes down, traffic that used to land in São Paulo suddenly hits Miami, and Miami's CPU graph spikes for reasons that have nothing to do with Miami. Every PoP needs headroom for the load it normally serves, plus the load it would inherit if the next-closest PoP failed. Monitoring is also per-PoP — there is no "global" dashboard that captures user experience; you need per-PoP latency, error rate, and capacity metrics.
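The headroom rule can be turned into a small calculation. This sketch assumes single failures only and that all of a failed PoP's traffic lands on one other PoP; the function name, the failover map, and the load figures are made up for illustration (only the São Paulo to Miami shift comes from the text).

```python
from typing import Dict


def required_capacity(
    steady_load: Dict[str, float],
    failover_to: Dict[str, str],
) -> Dict[str, float]:
    """Own steady-state load plus the largest load inherited from any one failed PoP."""
    worst_inherited = {pop: 0.0 for pop in steady_load}
    for failed_pop, target in failover_to.items():
        worst_inherited[target] = max(worst_inherited[target], steady_load[failed_pop])
    return {pop: load + worst_inherited[pop] for pop, load in steady_load.items()}


# Hypothetical steady-state load per PoP, in thousands of queries per second.
steady_load = {"São Paulo": 40.0, "Miami": 100.0, "New York": 120.0}
# Hypothetical map: where each PoP's traffic lands if that PoP fails.
failover_to = {"São Paulo": "Miami", "Miami": "New York", "New York": "Miami"}

for pop, capacity in required_capacity(steady_load, failover_to).items():
    print(f"{pop}: provision for {capacity:.0f}k qps")
```

In practice the failover map has to be measured, for example by withdrawing a PoP during a maintenance window and watching where its traffic goes, since BGP decides it rather than the operator.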

Cloudflare 1.1.1.1 — how it actually works

Cloudflare announced 1.1.1.1 on April 1, 2018 in partnership with APNIC, which holds the 1.1.1.0/24 prefix. The same prefix is announced from every Cloudflare PoP — 300+ cities at last count — via Cloudflare's AS13335. A query from Frankfurt hits the Frankfurt PoP; a query from Singapore hits the Singapore PoP; a query from a small ISP without a Cloudflare peering nearby falls back to whichever transit AS happens to give the shortest path.

Cloudflare's published latency reports show median resolver latency under 15ms globally — that is only achievable through anycast at this PoP count. The service handles tens of billions of queries per day; the infrastructure is fundamentally a DNS daemon, a recursive resolver, and a BGP speaker, replicated identically at every PoP. There is no central state, no leader, no replication; each PoP is fully independent.

Anycast vs CDN — they're not the same layer

A CDN uses anycast at the IP layer (so the user reaches the nearest edge) plus application-layer routing on top (consistent hashing for cache keys, request steering to specific origins, geo-locked content). Anycast gets the packet to the right city; the CDN's edge proxy decides which back-end to fetch from.

This is why "use anycast" is not by itself a CDN strategy. Anycast is the entry point; the actual content delivery — cache hierarchy, TTL math, origin shielding — is software running at every PoP. Cloudflare, Fastly, CloudFront, Akamai all share this two-layer structure; the differences are in the application layer, not the anycast.
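As one example of that application layer, consistent hashing for cache keys can be sketched as follows. This is a generic hash ring, not any particular CDN's implementation; the HashRing class, the virtual-node count, and the cache node names are assumptions for illustration.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash(key: str) -> int:
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes: Iterable[str], vnodes: int = 100) -> None:
        # Each node gets many points on the ring so load spreads evenly.
        self._ring: List[Tuple[int, str]] = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        """First node clockwise from the key's position."""
        if not self._ring:
            raise ValueError("ring has no nodes")
        index = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]


full = HashRing(["cache-a", "cache-b", "cache-c"])
degraded = HashRing(["cache-a", "cache-b"])  # cache-c taken out of service

keys = [f"/assets/{i}.js" for i in range(1000)]
moved = sum(1 for key in keys if full.node_for(key) != degraded.node_for(key))
print(f"{moved} of {len(keys)} keys changed node after removing cache-c")
```

Only keys that lived on the removed node move; everything else keeps its cache node, which is the property that makes the scheme useful inside a PoP.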

Common pitfalls

Long-lived connections during BGP flap. A websocket open for 20 minutes during an upstream peering reset breaks. Application code should reconnect on RST without surfacing the error to users.
Non-deterministic routing for border users. A user in Istanbul might bounce between the Frankfurt and Mumbai PoPs across requests if BGP path lengths are tied. Solve with explicit AS-path prepending or selective de-aggregation if it matters.
Geo-compliance violations. If your contract says EU user data stays in the EU, you cannot rely on anycast — BGP doesn't know about GDPR. You need either GeoDNS / GeoIP filtering at the application layer, or separate anycast prefixes per region.
One PoP's load is not your traffic share. A peering change at someone else's AS hundreds of miles away can shift 30% of your traffic from one PoP to another overnight. Capacity planning has to assume each PoP can carry 1.5–2× its steady-state load.
Asymmetric routing. The packet goes one way through one PoP; the reply might be expected back through another (this rarely happens in practice for connection-oriented protocols but does for UDP). Sticky-aware load balancers in front of the anycast layer help.
Forgetting BGP advertisement on PoP restart. A new PoP that boots without re-announcing its prefix to the upstream router gets no traffic. Monitor the BGP session state, not just the application health.
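The tied-path pitfall in the list above can be made concrete. The sketch below models AS paths as tuples; the function names and AS numbers are hypothetical (private-range placeholders), and it only compares path length, ignoring every other BGP attribute.

```python
from typing import Tuple

AsPath = Tuple[int, ...]


def originate(origin_as: int, prepends: int = 0) -> AsPath:
    """Path as it leaves a PoP: the origin AS plus optional extra copies of itself."""
    return (origin_as,) * (1 + prepends)


def through(transit_as: int, path: AsPath) -> AsPath:
    """Each AS that passes the route on adds its own number at the front."""
    return (transit_as,) + path


ORIGIN_AS = 64500

# What a hypothetical Istanbul ISP sees before any tuning: a tie.
via_frankfurt = through(64700, originate(ORIGIN_AS))
via_mumbai = through(64710, originate(ORIGIN_AS))
print(len(via_frankfurt), len(via_mumbai))

# The Mumbai PoP prepends its own AS twice, so Frankfurt wins on length.
via_mumbai_prepended = through(64710, originate(ORIGIN_AS, prepends=2))
print(via_mumbai_prepended)
```

Prepending is a blunt tool: it makes the Mumbai announcement less attractive to every network that hears it through that upstream, not just to the one user whose routes were tied.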

Building it — what the BGP config looks like

Anycast doesn't require special protocols — just plain BGP with the same prefix announced from multiple ASes (or from one AS at multiple locations). A minimal sketch with two PoPs and the same /24:

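# /etc/bird/bird.conf at PoP NYC (BIRD 2.x syntax)
router id 198.51.100.1;

# Learn the anycast prefix from the loopback interface.
# Assumes an address such as 192.0.2.1/24 is configured on lo.
protocol direct {
  ipv4;                  # BIRD 2.x needs an explicit channel here
  interface "lo";
}

protocol bgp upstream_a {
  local as 64500;
  neighbor 203.0.113.1 as 64600;
  ipv4 {
    export filter {
      # Announce the anycast /24 and nothing else.
      if net = 192.0.2.0/24 then accept;
      reject;
    };
  };
}

# /etc/bird/bird.conf at PoP TOKYO (BIRD 2.x syntax)
router id 198.51.100.2;

# Same loopback setup as NYC: the prefix is identical at every PoP.
protocol direct {
  ipv4;
  interface "lo";
}

protocol bgp upstream_b {
  local as 64500;        # same origin AS, different upstream
  neighbor 203.0.113.65 as 64601;
  ipv4 {
    export filter {
      if net = 192.0.2.0/24 then accept;
      reject;
    };
  };
}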

Both PoPs advertise 192.0.2.0/24. The upstream routers see two paths; route selection on every router hop along the way decides which traffic lands where. Failover happens by withdrawing the announcement (for example by running birdc disable upstream_a, or by letting BFD timeouts kill the session). For real production, you also want graceful restart, route aggregation policies, and BGP communities to influence upstream routing.
