10 min read · Guide · Network · Routing

BGP, how the internet learns where to send a packet.

The internet is about 75,000 networks that don't trust each other. Border Gateway Protocol is the polite, paranoid handshake that gets a packet from Sydney to Stockholm.

Parts 01–08 · Interactive: AS-path picker · Prereq: IP / CIDR

What is BGP?

Autonomous systems talking to autonomous systems.

BGP (Border Gateway Protocol) is the routing protocol that glues the internet together — every autonomous system (AS) uses BGP to tell its neighbours which IP prefixes it can reach. Standardised in RFC 4271 (2006, originally 1989), BGP is also famously fragile; the Facebook 2021 outage, Google's 2019 leak, and Cloudflare's 2020 incident were all BGP-driven.

The public internet is divided into Autonomous Systems — about 75,000 of them. An AS is one organisation's network with its own routing policy: an ISP, a cloud provider, a university, a Tier-1 carrier. Inside an AS, anything goes — OSPF, IS-IS, static routes, mesh. Between ASes, exactly one protocol exists: BGP, defined in RFC 4271.

BGP's job is letting each AS announce which IP prefixes it can reach, and learning which prefixes its neighbours can reach. Each router builds a routing table by combining what its neighbours tell it, applying a long list of policy preferences, and picking a single best path per destination prefix. Then it forwards traffic that way until something changes.


How BGP finds a path through the AS graph

Each network only knows its direct neighbours.

Below: a tiny internet — three Tier-1 carriers, two cloud providers, three end-user ISPs. Pick an origin AS (where the prefix lives) and a viewer AS (where the lookup is being done). The path BGP would settle on is highlighted.

Interactive diagram: choose an origin (announcing the prefix) and a viewer (receiving).
Example best AS-path: AS65003 → AS13335 → AS1299 → AS16509 → AS65001 (5 hops)
Networks in the diagram:
AS174: tier 1, Cogent
AS1299: tier 1, Telia
AS3356: tier 1, Lumen
AS16509: tier 2, AWS
AS13335: tier 2, Cloudflare
AS65001: tier 3, ISP-A
AS65002: tier 3, ISP-B
AS65003: tier 3, ISP-C
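
As a rough sketch of what the picker computes, the snippet below runs a breadth-first search over a small AS graph and returns the path with the fewest ASes. The link list in EDGES is an assumption made up for illustration (the diagram's real links are not given here), and real BGP never sees the whole graph: each router only hears paths from its neighbours and applies policy first. Treat this as an approximation of the shortest-AS_PATH outcome, not as the protocol itself.

```python
from collections import deque

# Hypothetical links between the ASes named in the diagram (assumed, not from the guide).
EDGES = [
    ("AS174", "AS1299"), ("AS174", "AS3356"), ("AS1299", "AS3356"),
    ("AS1299", "AS16509"), ("AS1299", "AS13335"),
    ("AS3356", "AS65002"),
    ("AS16509", "AS65001"), ("AS13335", "AS65003"),
]


def build_neighbours(edges):
    """Turn an undirected edge list into a neighbour map."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return neighbours


def shortest_as_path(edges, start, goal):
    """Breadth-first search: return the path with the fewest ASes, or None."""
    neighbours = build_neighbours(edges)
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        # Sorted so that ties are broken the same way on every run.
        for nxt in sorted(neighbours.get(current, ())):
            if nxt not in previous:
                previous[nxt] = current
                queue.append(nxt)
    return None


print(" → ".join(shortest_as_path(EDGES, "AS65003", "AS65001")))
```

With the assumed links, the search between AS65003 and AS65001 has only one possible route, the same five ASes shown in the example path above.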

How AS_PATH records the route a packet takes

Each network prepends itself to the list.

Each BGP announcement carries an AS_PATH: the sequence of ASes the announcement has crossed. When AS-X passes a prefix announcement on to a neighbouring AS, it adds its own ASN, X, to the front of the path. The last (rightmost) ASN is therefore the origin, and the first (leftmost) ASN is the neighbour you heard the announcement from.

Two rules fall out. Loop detection: if your own ASN appears in a received path, drop it — you'd be creating a cycle. Path length: shorter paths beat longer ones in the BGP best-path tiebreak. The AS_PATH is both the routing data and the routing distance.
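
A minimal sketch of both rules, assuming AS_PATH is modelled as a plain Python list of ASNs. The function names advertise and accept are made up for this example. In RFC 4271 terms, each AS prepends its own ASN when it advertises a route to an external neighbour, so the origin ends up at the right-hand end of the list.

```python
def advertise(local_asn, as_path):
    """Return the AS_PATH that local_asn sends on to an external neighbour."""
    return [local_asn] + list(as_path)


def accept(local_asn, as_path):
    """Loop detection: reject any path that already contains our own ASN."""
    return local_asn not in as_path


# A prefix originated by AS65001, then passed on by AS16509 and AS1299.
path = advertise(65001, [])
path = advertise(16509, path)
path = advertise(1299, path)

print(path)                  # leftmost = most recent AS, rightmost = origin
print(accept(13335, path))   # AS13335 is not on the path, so it may accept
print(accept(16509, path))   # AS16509 sees itself and drops the route
print(len(path))             # the "distance" used by the path-length rule
```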


How BGP picks the best path

A dozen tiebreakers, evaluated in order.

For any prefix, a router may know multiple paths. BGP runs a deterministic tiebreak — about a dozen criteria, evaluated in order until one path wins. The first few are the only ones that matter day to day:

  1. Highest local preference.

    An operator-set value, for example "I prefer routes that go through my paid transit provider over the cheap free peer." Set per neighbour.

  2. Shortest AS_PATH.

    If local pref is equal, the path through fewer ASes wins. This is the algorithm Part 03's simulator approximates.

  3. Lowest origin code.

    IGP < EGP < Incomplete. Mostly historical; rarely decides today.

  4. Lowest MED.

    Multi-Exit Discriminator: a hint from a neighbour AS about which entry point to prefer when multiple links connect them.
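
The four tiebreakers above can be sketched as a single sort key. This is a simplification under stated assumptions: the Route class and its field names are invented for the example, the later tiebreakers are left out, and MED is compared across all routes, whereas routers by default only compare it between routes learned from the same neighbouring AS.

```python
from dataclasses import dataclass

# Lower rank wins: IGP < EGP < Incomplete.
ORIGIN_RANK = {"IGP": 0, "EGP": 1, "INCOMPLETE": 2}


@dataclass(frozen=True)
class Route:
    """Hypothetical container for the attributes the first four steps look at."""
    next_hop: str
    local_pref: int
    as_path: tuple
    origin: str = "IGP"
    med: int = 0


def preference_key(route):
    """Smaller tuples are better; local_pref is negated because higher wins."""
    return (
        -route.local_pref,
        len(route.as_path),
        ORIGIN_RANK[route.origin],
        route.med,
    )


def best_path(routes):
    """Pick the single best route for one prefix."""
    return min(routes, key=preference_key)


candidates = [
    Route("10.0.0.1", 100, (1299, 16509, 65001)),
    Route("10.0.0.2", 200, (174, 3356, 1299, 16509, 65001)),
    Route("10.0.0.3", 100, (13335, 65001)),
]
print(best_path(candidates).next_hop)
```

Here the route via 10.0.0.2 wins despite having the longest AS_PATH, because local preference is checked before path length.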


BGP security: filter every route you accept

One bad policy can leak globally in minutes.

The selection above is the protocol mechanic. The actual route a router uses is filtered through layers of policy first. Each AS has policy on what it will announce and what it will accept: customers vs peers vs transit, prefix-list filters, RPKI validation, max-prefix limits, dampening for flapping routes.
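
To make two of those layers concrete, here is a hedged sketch of an import filter that combines a prefix list with a max-prefix limit. The function make_import_filter and its matching rule (accept anything inside a listed prefix) are assumptions chosen for illustration; real router prefix lists have their own matching syntax, and a real max-prefix limit usually tears the session down rather than silently rejecting.

```python
import ipaddress


def make_import_filter(allowed_prefixes, max_prefixes):
    """Build an accept(prefix) function for one neighbour (illustrative only)."""
    allowed = [ipaddress.ip_network(p) for p in allowed_prefixes]
    accepted = set()

    def accept(prefix):
        net = ipaddress.ip_network(prefix)
        # Prefix-list check: the announcement must sit inside a listed prefix.
        in_list = any(
            net.version == a.version and net.subnet_of(a) for a in allowed
        )
        if not in_list:
            return False
        # Max-prefix check: refuse new prefixes once the limit is reached.
        if net not in accepted and len(accepted) >= max_prefixes:
            return False
        accepted.add(net)
        return True

    return accept


import_filter = make_import_filter(["203.0.113.0/24"], max_prefixes=2)
print(import_filter("203.0.113.0/25"))    # inside the allowed prefix
print(import_filter("198.51.100.0/24"))   # not on the list
```

The addresses are documentation prefixes, not anyone's real allocation.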

This is where BGP gets dangerous. A single misconfigured policy can leak a route incorrectly — and because every neighbour propagates what they hear, one mistake becomes a global incident in minutes. Pakistan's 2008 YouTube hijack, AS7007 in 1997, the 2021 Facebook self-disconnection — all policy-failure stories.


Anycast: one IP announced from many places

One IP, many origins.

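BGP doesn't require a prefix to be announced from exactly one place: the same prefix can be announced from many sites, or even by several ASes. For each receiver, routing converges on the announcement that is nearest in BGP terms (policy first, then AS_PATH length), which usually but not always means geographically close. This is anycast: the foundation of most modern CDN and DNS networks.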

1.1.1.1, 8.8.8.8, and many CDN edge IPs are all anycast. A user in Sydney and a user in Stockholm both type the same hostname, get the same IP, and end up at completely different machines a few milliseconds away; anycast acts as a coarse load balancer at the network layer. See the CDN guide.
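
A toy illustration of that effect, assuming hop count is the only criterion: a multi-source breadth-first search that assigns every network to whichever anycast site it reaches in the fewest hops. The node names and links are invented for the example, and real catchments also depend on each network's policy.

```python
from collections import deque


def nearest_site(edges, sites):
    """Map each node to (anycast site reached, hop count) via multi-source BFS."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    # Every site announces the same prefix, so the search starts from all of them.
    chosen = {site: (site, 0) for site in sites}
    queue = deque(sorted(sites))
    while queue:
        current = queue.popleft()
        site, hops = chosen[current]
        for nxt in sorted(neighbours.get(current, ())):
            if nxt not in chosen:
                chosen[nxt] = (site, hops + 1)
                queue.append(nxt)
    return chosen


# Hypothetical topology: two sites joined through one transit network.
edges = [
    ("isp-sydney", "site-sydney"),
    ("site-sydney", "transit"),
    ("transit", "site-stockholm"),
    ("site-stockholm", "isp-stockholm"),
]
catchment = nearest_site(edges, ["site-sydney", "site-stockholm"])
print(catchment["isp-sydney"])
print(catchment["isp-stockholm"])
```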


RPKI: verifying who actually owns a prefix

The fix for a protocol that trusts everyone.

BGP was designed in 1989 in a small, trusting world. Anyone can announce any prefix; nothing in the protocol verifies you actually own it. RPKI (Resource Public Key Infrastructure) is the modern fix — a chain-of-trust system where prefix holders publish signed authorisations; routers validate received announcements against the published authorisation.
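
The validation step can be sketched as follows, loosely following the three outcomes described in RFC 6811 (valid, invalid, not found). The Roa class and validate_origin function are names invented here, the prefixes and ASNs are documentation values, and a real validator works from cryptographically verified data rather than a hand-written list.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Roa:
    """A simplified route origin authorisation: who may announce what."""
    prefix: str
    max_length: int
    asn: int


def validate_origin(roas, prefix, origin_asn):
    """Classify one announcement against the published authorisations."""
    announced = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        if announced.version != roa_net.version:
            continue
        if not announced.subnet_of(roa_net):
            continue
        covered = True
        # Valid only if the origin matches and the prefix is not too specific.
        if roa.asn == origin_asn and announced.prefixlen <= roa.max_length:
            return "valid"
    return "invalid" if covered else "not-found"


roas = [Roa("203.0.113.0/24", 24, 64500)]
print(validate_origin(roas, "203.0.113.0/24", 64500))   # authorised origin
print(validate_origin(roas, "203.0.113.0/24", 64501))   # wrong origin AS
print(validate_origin(roas, "203.0.113.0/25", 64500))   # more specific than allowed
print(validate_origin(roas, "198.51.100.0/24", 64500))  # no authorisation published
```

The third case matters for hijacks: a more-specific announcement is rejected even when it claims the right origin, because it exceeds the authorised maximum length.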

Adoption stalled for years; it's now widespread among major networks. The MANRS initiative pushes the rest. BGP without RPKI is a system that works because most operators are competent — not because the protocol is.


When BGP took the internet down

Three outages worth learning from.

2008 · YT

YouTube hijack.

Pakistan Telecom announced a more-specific prefix for YouTube to block it inside Pakistan; the announcement leaked globally; YouTube was inaccessible worldwide for ~2 hours.

2021 · FB

Facebook self-disconnect.

A maintenance script withdrew Facebook's BGP announcements. The internet had no route to facebook.com for about 6 hours. Facebook's own DNS was unreachable, locking out engineers.

1997 · 7007

The first famous route leak.

In 1997 AS7007 leaked the entire internet routing table back to its peers. Most of the global internet stopped working for hours. The textbook case.
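
Why a more-specific prefix, as in the YouTube hijack, wins everywhere it is accepted: forwarding uses longest-prefix match, so the narrowest covering route is chosen before any BGP tiebreak between the two is even considered. The sketch below uses hypothetical documentation prefixes and labels, not the real ones from the incident.

```python
import ipaddress


def longest_prefix_match(table, address):
    """Return the entry whose prefix covers address with the longest mask."""
    addr = ipaddress.ip_address(address)
    matches = [
        (ipaddress.ip_network(prefix), origin)
        for prefix, origin in table.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# Hypothetical routing table: the second entry is a leaked more-specific.
table = {
    "203.0.113.0/24": "legitimate origin",
    "203.0.113.128/25": "hijacker",
}
print(longest_prefix_match(table, "203.0.113.200"))  # covered by both, /25 wins
print(longest_prefix_match(table, "203.0.113.5"))    # only the /24 covers it
```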

How cloud providers and CDNs run BGP

Where the protocol actually lives.

Cloudflare
~330 PoPs, anycast on the same IP across all of them. ~1 trillion requests/day handled by BGP-driven routing. Custom BGP daemon (BIRD-derived); peers with thousands of ASes globally.
AWS Direct Connect / Google Cloud Interconnect
Private BGP sessions between customer routers and cloud edge — lets you announce on-prem routes into your cloud VPC and vice versa. The standard pattern for hybrid-cloud connectivity.
Datacenter fabric
Modern hyperscale datacenters (Meta, Google, Microsoft) run BGP inside the datacenter as well — every Top-of-Rack switch is a tiny AS, BGP exchanges routes between them. Replaces older OSPF/IS-IS designs. Meta's published Fabric Aggregator design is the canonical reference.
Looking glasses
Public BGP visibility via projects like RIPE's RIS, RouteViews, and BGPlay. When something goes wrong on the global internet (route leak, DDoS, hijack), these are the tools researchers use to reconstruct what happened to the path. Cloudflare publishes its own at radar.cloudflare.com.

The 2021 Facebook outage (Oct 4, 2021) is the canonical BGP cautionary tale. Facebook's edge routers withdrew the prefixes for facebook.com, instagram.com, and whatsapp.com from BGP after a misconfigured maintenance command — taking the whole property offline globally for ~6 hours. The recovery was complicated by the fact that Facebook's internal tools and even the badge readers in the office depended on the same DNS/BGP infrastructure. The published postmortem became a teaching example for change management.



A closing note

BGP holds the public internet together with a 1989 protocol and a globally-distributed configuration database that no one centrally validates. It mostly works. The fact that it works is one of the more remarkable engineering achievements still running. The fact that it occasionally fails spectacularly — taking out TLS handshakes and cache refills alike — is the cost.
