Multi-region.

Most cloud architectures aren't multi-region, and most of them don't need to be. Multi-AZ inside one region covers ~99% of failure modes for a small fraction of the cost and complexity. Multi-region exists for two reasons: you have users on more than one continent and want the latency win, or you have a regulatory or business need for resilience to a whole region going down. Pick which one applies before you draw the architecture.

1 · The two reasons you'd actually do it

Latency. Users in Europe shouldn't talk to a US-east origin for every page view. The round trip is typically 70–120 ms depending on where in Europe the user sits, and the speed of light in fibre sets a floor that no amount of tuning removes. Putting compute and data in a region near the user cuts that to single-digit or low double-digit ms, plus whatever the routing decision adds.
Resilience to a whole-region outage. AWS us-east-1 has had a handful of well-known multi-hour incidents. If your business breaks when that happens, you need a story for serving from somewhere else. For most products this isn't worth the cost; for financial services, healthcare, and anything regulator-watched, it is.
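For the latency case, a back-of-envelope floor on the transatlantic round trip can be worked out from fibre physics alone. This is a rough sketch: the 2/3 slowdown factor and the ~5,600 km distance (roughly New York to London, great circle) are assumptions, and real cable routes are longer.

```python
# Lower bound on round-trip time over a straight fibre run.
SPEED_OF_LIGHT_KM_S = 299_792   # speed of light in vacuum
FIBRE_FACTOR = 2 / 3            # assumption: light in glass travels at ~2/3 c


def min_rtt_ms(distance_km: float) -> float:
    """Theoretical minimum round-trip time in ms for a fibre path of distance_km."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBRE_FACTOR)
    return 2 * one_way_s * 1000


# Assumption: ~5,600 km, roughly New York to London as the crow flies.
floor_ms = min_rtt_ms(5_600)
print(f"physical floor: {floor_ms:.0f} ms")  # physical floor: 56 ms
```

Observed round trips sit well above this floor because of indirect cable routes, switching, and the last mile, which is why moving the origin closer is the only real fix.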

What's not a good reason: vague "future-proofing." At the active-active end, multi-region is a 2–3× cost multiplier and a 3–5× operational multiplier. If your concrete use case is "we might want it someday," you almost certainly don't need it. Multi-AZ inside one region is the right starting point.

2 · Three multi-region shapes

Shape	RTO / RPO	Cost multiplier	When to pick it
Pilot light	RTO: hours · RPO: minutes	~1.2×	Cold DR region, data replicated async, compute scaled to zero. Cheapest. You're betting whole-region outages are rare enough to absorb a slow recovery.
Warm standby	RTO: minutes · RPO: seconds	~1.5×	Reduced-capacity DR region, ready to scale on failover. Reasonable middle ground for serious B2B.
Active-active	RTO: ~0 · RPO: seconds	~2–3×	Both (or more) regions serving traffic. Single-region failure is invisible to users. The model for consumer services at scale.

RTO = Recovery Time Objective (how long you accept being down). RPO = Recovery Point Objective (how much data you accept losing). Both numbers should be in the architecture doc, not in someone's head.
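One way to make the table actionable is to encode it and pick the cheapest shape that meets the stated targets. The sketch below is illustrative only: the numeric stand-ins for "hours", "minutes", "seconds" and "~0", and the 2.5× midpoint for active-active, are assumptions, not figures from any provider.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Shape:
    name: str
    rto_seconds: float       # recovery time this shape can plausibly meet
    rpo_seconds: float       # data-loss window this shape can plausibly meet
    cost_multiplier: float   # relative to single-region


# Assumed numeric stand-ins for the qualitative values in the table.
SHAPES = [
    Shape("Pilot light", rto_seconds=4 * 3600, rpo_seconds=300, cost_multiplier=1.2),
    Shape("Warm standby", rto_seconds=600, rpo_seconds=5, cost_multiplier=1.5),
    Shape("Active-active", rto_seconds=1, rpo_seconds=5, cost_multiplier=2.5),
]


def cheapest_shape(rto_target_s: float, rpo_target_s: float) -> Optional[Shape]:
    """Return the cheapest shape meeting both targets, or None if none does."""
    candidates = [
        s for s in SHAPES
        if s.rto_seconds <= rto_target_s and s.rpo_seconds <= rpo_target_s
    ]
    return min(candidates, key=lambda s: s.cost_multiplier, default=None)


choice = cheapest_shape(rto_target_s=15 * 60, rpo_target_s=60)
print(choice.name if choice else "no shape meets these targets")  # Warm standby
```

A result of None is useful in itself: it means the targets need a zero-RPO design that none of the three shapes offers with asynchronous replication.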

3 · The AWS canonical version

Layer	Service	What it does
Traffic routing	Route 53 (latency / weighted / failover)	Latency routing sends users to the closest region. Failover routing flips traffic on health-check failure. Weighted routing for blue/green.
Edge acceleration	Global Accelerator	Anycast IPs that route over AWS backbone to the nearest region. Lower latency variance than DNS-based routing.
Relational DB	Aurora Global Database	One writer region, ≤5 reader regions, <1 second cross-region lag. Promotes a reader on failover.
Relational DB (active-active)	Aurora DSQL (preview)	Distributed serverless SQL with active-active writes.
NoSQL	DynamoDB Global Tables	Multi-region multi-active. Last-writer-wins conflict resolution.
Object storage	S3 Cross-Region Replication	Async per-bucket replication to another region. Often paired with CloudFront so the user never knows which origin is alive.
Event bus	EventBridge cross-region	Replicate events from one bus to another region.
Caching	ElastiCache Global Datastore (Redis)	Cross-region Redis replication. Eventual consistency.
Network	Cloud WAN / Transit Gateway peering	Cross-region VPC connectivity.
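The "last-writer-wins" entry for DynamoDB Global Tables deserves a concrete picture, because it means concurrent writes in two regions do not merge: one of them is dropped. The toy model below is only a sketch of the idea, not how DynamoDB implements it; the `Write` type, the timestamps, and the region-name tie-break are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    region: str
    timestamp_ms: int
    value: str


def resolve_last_writer_wins(writes: list[Write]) -> Write:
    """Pick the surviving write: latest timestamp wins.

    Ties are broken by region name, which is an assumption made only so the
    toy model is deterministic.
    """
    return max(writes, key=lambda w: (w.timestamp_ms, w.region))


# Two regions update the same item 5 ms apart, before replication catches up.
us_write = Write("us-east-1", timestamp_ms=1_000, value="status=shipped")
eu_write = Write("eu-west-1", timestamp_ms=1_005, value="status=cancelled")

winner = resolve_last_writer_wins([us_write, eu_write])
print(winner.value)  # status=cancelled
```

The earlier write is not rejected with an error; it simply disappears after replication. If that is unacceptable for an item, route all writes for it to one region or make the updates commutative.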

4 · GCP and Azure equivalents

Concept	AWS	GCP	Azure
Traffic routing	Route 53 + Global Accelerator	Global LB (anycast)	Front Door / Traffic Manager
Globally strong SQL	Aurora DSQL (preview)	Spanner (mature)	Cosmos DB SQL API with strong consistency
Multi-region NoSQL	DynamoDB Global Tables	Firestore / Bigtable replication	Cosmos DB multi-region writes
Object cross-region	S3 CRR	GCS dual-region / multi-region buckets	Blob Storage GRS / RA-GRS
Network	TGW peering / Cloud WAN	Global VPC (single VPC spans regions)	Virtual WAN

Spanner is the standout here: globally linearisable SQL with single-digit-ms latency for in-region reads and ~100 ms for cross-region writes. It is the most mature managed database in this tier. If your workload needs strong consistency across regions, this is one of the few times GCP is the obvious pick even in an AWS shop.

5 · Failover drills

The architecture isn't multi-region until you've failed over for real, at least once. Things that get caught in drills, in order of frequency:

Stale DNS at the edge. TTLs you set to 300 seconds get cached for 30 minutes by some resolver in the wild. Plan failover with the longest TTL anyone could be holding.
Region-specific config. Hardcoded ARNs, region-pinned bucket names, secrets that exist only in the active region. The drill exposes them; the runbook documents them.
Workload imbalance. The standby region runs at 50% capacity. Failover hits it with 100% of the traffic. It melts. Either size both regions for full load or shed traffic during failover.
Replication lag. Aurora Global lag spikes during failover. Reads from the promoted region miss the last few seconds of writes. Document the RPO honestly.
Cross-region IAM. IAM roles themselves are global, but KMS keys and secrets are region-scoped by default, and role policies often reference region-specific ARNs. The "decrypt this in the DR region" step gets forgotten.
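Some of these can be caught before the drill. As one example, region-pinned ARNs in config files can be found with a simple scan. The sketch below assumes the standard ARN layout `arn:partition:service:region:account-id:resource`; the function name and the sample config are hypothetical, and the regex is a heuristic that will miss ARNs built at runtime.

```python
import re

# Matches ARNs that carry a region field, e.g. arn:aws:sqs:us-east-1:123456789012:orders.
# Global ARNs (IAM, S3 buckets) have an empty region field and are not matched.
REGIONAL_ARN = re.compile(
    r"arn:aws[a-z-]*:[a-z0-9-]+:(?P<region>[a-z]{2}(?:-[a-z]+)+-\d):\d{12}:\S+"
)


def find_region_pinned_arns(config_text: str, active_region: str) -> list[tuple[int, str]]:
    """Return (line number, ARN) pairs for ARNs pinned to the active region."""
    hits = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        for match in REGIONAL_ARN.finditer(line):
            if match.group("region") == active_region:
                hits.append((lineno, match.group(0)))
    return hits


# Hypothetical config file contents.
sample = """\
QUEUE_ARN=arn:aws:sqs:us-east-1:123456789012:orders
ROLE_ARN=arn:aws:iam::123456789012:role/app
KEY_ARN=arn:aws:kms:eu-west-1:123456789012:key/abcd-1234
"""

for lineno, arn in find_region_pinned_arns(sample, "us-east-1"):
    print(f"line {lineno}: {arn}")  # line 1: arn:aws:sqs:us-east-1:123456789012:orders
```

Every hit is something that either needs a DR-region counterpart or needs to be looked up by region at runtime, and each one belongs in the runbook.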

The drill cadence at most serious shops is quarterly. The first one is full of surprises; by the fourth or fifth, the runbook handles itself.

6 · Cost note

Cross-region data transfer. AWS charges $0.02/GB for cross-region replication. A 10 TB/day replication is ~$6K/month just in transfer.
Idle DR capacity. Pilot light is cheap; warm standby costs you 20–50% of primary; active-active doubles the compute bill.
Multi-region managed DB premiums. Aurora Global adds ~20% to RDS cost per replica region. DynamoDB Global Tables is 1× per region (so 3 regions = 3×). Spanner cost scales with node count and region count.
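The transfer figure above is simple arithmetic and worth rerunning with your own volumes. A minimal sketch, assuming decimal terabytes (1 TB = 1,000 GB), a 30-day month, and the $0.02/GB rate quoted above; check current pricing for your region pair before relying on the result.

```python
def monthly_transfer_cost_usd(
    tb_per_day: float,
    usd_per_gb: float = 0.02,   # rate quoted in the text; varies by region pair
    days: int = 30,             # assumption: 30-day month
) -> float:
    """Estimate monthly cross-region transfer cost in USD."""
    gb_per_month = tb_per_day * 1000 * days   # assumption: decimal TB
    return gb_per_month * usd_per_gb


cost = monthly_transfer_cost_usd(10)
print(f"${cost:,.0f} per month")  # $6,000 per month
```

Replicating to a second DR region roughly doubles this line item, since each destination is billed separately.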

A reasonable model for a serious B2B product: active-active across 2 regions, each sized to absorb the other's traffic; quarterly failover drills; total infra cost roughly 2–2.5× single-region. Worth it if the business case is real. Not worth it as a "good practice" without one.
