8 topics · AWS-first

Cloud Codex

Cloud engineering, the mental model.

Every cloud is the same five or six primitives wearing different names. Compute. Storage. Networking. Identity. A database tier. A way to see what's happening. Once you can put any AWS, GCP, or Azure service into one of those buckets, the rest is detail. This Codex walks the eight topics that matter — AWS-first, because that's what most readers ship on, with the GCP and Azure equivalents called out per topic. The aim isn't to list every service from memory; it's the shape you reach for when you're designing a system or staring at a bill that grew faster than the traffic did. Pair it with the system-design path and the scale-to-millions walkthrough.

The eight topics

The primitives, one page each.

Roughly 15–25 minutes per page. Read top to bottom for the full mental model, or jump to whichever primitive is currently on fire.

The page shape

Every topic follows the same six beats.

So you can scan quickly when you already know the territory and slow down only where it's new.

The premise: What this primitive does in plain English. What problem it solves that wasn't already solved by something simpler.
The AWS-canonical version: The default services, configured the way most teams actually run them. Not exam-prep depth — the mental model.
GCP and Azure equivalents: A short table that maps the AWS names to the others. Useful when you switch clouds or argue with someone who has.
How to pick: A small decision tree for "which one do I reach for here." The interesting part is the trade-offs — cost, lock-in, ops burden.
What breaks: The corners where the primitive bites you. The bugs you'll learn the hard way unless someone tells you first.
Cost note: The line item that adds up — egress, cross-AZ traffic, idle instances, the surprise on the bill that took a meeting to explain.

Per-cloud deep dives

AWS Codex — 16 service-shaped pages

The eight topics above are cloud-agnostic. When you need to actually ship on a specific cloud, the service-shaped deep dives are where the muscle memory lives — instance family naming, VPC endpoint reflexes, KMS envelope encryption, CloudFront, API Gateway, Step Functions, and the corners that take a quarter to internalise. Every page ends with a "build it yourself" CLI lab.

The other two clouds

GCP and Azure, same shape.

GCP · 9 deep dives

GCP Codex

The project model, the global VPC, Compute Engine, GKE, Cloud Run, Cloud Storage, BigQuery, Spanner, Pub/Sub. Same shape as the AWS codex, each page ending in a gcloud lab.

Azure · 9 deep dives

Azure Codex

Tenants and resource groups, Entra ID, VMs, VNets, AKS, Functions, Blob Storage, Cosmos DB, Service Bus & Event Hubs. Same shape, each page ending in an az lab.

Numbers worth knowing

The figures that end up shaping the bill.

What	Value
S3 standard durability	11 nines (99.999999999%)
S3 standard availability SLA	99.9%
EC2 / single-instance SLA	99.5%
Multi-AZ deployment SLA	99.99%
Cross-AZ data transfer	~$0.01/GB each way
Internet egress (first tiers)	~$0.09/GB
NAT Gateway processing	~$0.045/GB + hourly
Spot vs on-demand discount	up to ~90% off
Reserved / savings plan discount	up to ~72% (3yr)
Lambda billing granularity	per 1 ms

These are AWS-flavoured order-of-magnitude figures and they drift over time — check the pricing page before you put a number in a doc. Treat them as a sanity check on whether a design is in the right neighbourhood, not as quotes.
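
To see how the transfer figures combine, here is a minimal Python sketch that turns the approximate per-GB rates from the table into a monthly estimate. The rates are the rough values above. The NAT hourly rate, the 730-hour month, the traffic volumes, and the function name are illustrative assumptions, not quotes.

```python
# Approximate per-GB rates taken from the table above (order of magnitude only).
CROSS_AZ_PER_GB_EACH_WAY = 0.01
INTERNET_EGRESS_PER_GB = 0.09
NAT_PROCESSING_PER_GB = 0.045

# Assumptions for illustration; they are not stated in the table.
NAT_HOURLY = 0.045       # assumed hourly charge per NAT gateway
HOURS_PER_MONTH = 730    # assumed average month length


def monthly_transfer_cost(cross_az_gb, egress_gb, nat_gb, nat_gateways=1):
    """Return a rough monthly data-transfer estimate in dollars, by line item."""
    items = {
        # Cross-AZ traffic is metered in both directions, hence the factor of 2.
        "cross_az": cross_az_gb * CROSS_AZ_PER_GB_EACH_WAY * 2,
        "internet_egress": egress_gb * INTERNET_EGRESS_PER_GB,
        "nat": nat_gb * NAT_PROCESSING_PER_GB
        + nat_gateways * NAT_HOURLY * HOURS_PER_MONTH,
    }
    items["total"] = sum(items.values())
    return items


# Hypothetical monthly volumes in GB.
estimate = monthly_transfer_cost(cross_az_gb=50_000, egress_gb=10_000, nat_gb=20_000)
for name, dollars in estimate.items():
    print(f"{name:16s} ${dollars:,.2f}")
```

With these made-up volumes, cross-AZ traffic costs more than internet egress, which is the kind of surprise the sanity check is meant to catch.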

Common traps

Five that cost real money or real sleep.

Letting cross-AZ traffic become the architecture.

Spreading services across availability zones is the right call for resilience, but every byte that crosses an AZ boundary is metered both ways. A chatty service mesh, a database read replica in another AZ, a load balancer fanning out blindly — each one is a line item that grows with traffic, not with footprint. Pin the chatty paths within an AZ and accept the cross-AZ cost only where it buys real failure isolation.
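
As a rough illustration of why blind fan-out adds up: if requests are spread evenly across N zones with no zone affinity, about (N-1)/N of calls leave the caller's zone. The sketch below assumes uniform routing and reuses the approximate $0.01/GB each-way figure. The request rate, payload size, and 30-day month are hypothetical.

```python
CROSS_AZ_PER_GB_EACH_WAY = 0.01      # approximate figure; check current pricing
SECONDS_PER_MONTH = 30 * 24 * 3600   # assumes a 30-day month


def cross_az_fraction(zones):
    """Share of calls that leave the caller's zone under uniform, zone-blind routing."""
    if zones < 1:
        raise ValueError("zones must be >= 1")
    return (zones - 1) / zones


def monthly_cross_az_cost(requests_per_second, kb_per_request, zones):
    """Rough monthly cross-AZ charge for one service-to-service path.

    kb_per_request is the combined request and response size in kilobytes.
    """
    gb_per_month = requests_per_second * kb_per_request * SECONDS_PER_MONTH / 1_000_000
    crossing_gb = gb_per_month * cross_az_fraction(zones)
    # Metered on both sides of the zone boundary, hence the factor of 2.
    return crossing_gb * CROSS_AZ_PER_GB_EACH_WAY * 2


# Hypothetical chatty path: 2,000 requests/s at 50 KB each, across 3 zones.
print(f"${monthly_cross_az_cost(2000, 50, 3):,.2f} per month")
print(f"${monthly_cross_az_cost(2000, 50, 1):,.2f} per month if pinned to one zone")
```

The second call shows the other extreme: a path kept entirely within one zone pays nothing for cross-AZ transfer, at the price of no zone-level failure isolation for that path.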

Treating IAM as something to tighten later.

Identity is the one topic where a small mistake has the largest blast radius. A wildcard policy attached "just to unblock the demo" tends to outlive the demo by years. Start from least privilege and widen on evidence — it is far cheaper than auditing your way back from a role that can do everything. Read the identity page before the others if security is on you.
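
One cheap way to keep a demo wildcard from outliving the demo is to flag it mechanically. The following is a minimal sketch, not a full policy analyser: it assumes the standard IAM policy JSON shape (`Statement`, `Effect`, `Action`, `Resource`), only inspects `Allow` statements, and ignores conditions, `NotAction`, and `NotResource`. The function names and the example policy are hypothetical.

```python
import json


def _as_list(value):
    """IAM accepts a single value or a list in these fields; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_wildcard_statements(policy_json):
    """Return (statement_index, reason) pairs for Allow statements with broad wildcards."""
    policy = json.loads(policy_json)
    findings = []
    for i, stmt in enumerate(_as_list(policy.get("Statement"))):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action"))
        resources = _as_list(stmt.get("Resource"))
        # "*" grants everything; "service:*" grants every action in one service.
        if any(a == "*" or a.endswith(":*") for a in actions):
            findings.append((i, "wildcard action"))
        if "*" in resources:
            findings.append((i, "wildcard resource"))
    return findings


# Hypothetical policy: statement 0 is the "unblock the demo" grant.
EXAMPLE_POLICY = """
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": "*", "Resource": "*"},
    {"Effect": "Allow", "Action": ["s3:GetObject"],
     "Resource": "arn:aws:s3:::example-bucket/*"}
  ]
}
"""

for index, reason in find_wildcard_statements(EXAMPLE_POLICY):
    print(f"statement {index}: {reason}")
```

A check like this fits naturally in CI, so widening a policy becomes a reviewed decision rather than a leftover.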

Reaching for multi-region before single-region is solid.

Multi-region is the most expensive resilience you can buy, in dollars and in operational complexity. It brings replication lag, split-brain risk, failover drills nobody runs, and a bill that roughly doubles. Most outages can be ridden out with multi-AZ and a good runbook. Earn multi-region with a real availability requirement, not a hypothetical one.
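
One way to test whether an availability requirement is real is to convert it into a downtime budget and ask whether the business would notice. This is a small sketch of that arithmetic, assuming a 30-day month. The percentages are the SLA figures quoted earlier on this page, and the function name is illustrative.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumes a 30-day month


def downtime_minutes_per_month(availability_percent):
    """Minutes of downtime per 30-day month permitted at a given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return MINUTES_PER_MONTH * (1 - availability_percent / 100)


for label, percent in [
    ("single instance", 99.5),
    ("multi-AZ", 99.99),
]:
    minutes = downtime_minutes_per_month(percent)
    print(f"{label:16s} {percent}%  ~{minutes:.1f} min/month")
```

Under the 30-day assumption, 99.5% allows about 216 minutes of downtime a month and 99.99% about 4.3 minutes. If the stated requirement is already met by the multi-AZ figure, multi-region is buying something other than availability.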

Assuming managed means hands-off.

RDS still needs you to pick instance sizes, plan for failover, and watch storage autoscaling. A managed database hides the operating system, not the capacity math. The provider runs the daemon; you still own the schema, the connection pool, and the decision about when Postgres stops being the right answer.
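
The connection pool is a simple instance of that capacity math. The sketch below checks whether a fleet of application instances can exhaust a database's connection limit. Every number and name here is a hypothetical placeholder; the real limit depends on the engine and instance size you pick.

```python
def connection_headroom(max_connections, app_instances, pool_size_per_instance, reserved=0):
    """Connections left over once every app instance fills its pool.

    A negative result means the fleet can exhaust the database's connection limit.
    `reserved` covers connections held back for admin and monitoring sessions.
    """
    return max_connections - reserved - app_instances * pool_size_per_instance


# Hypothetical fleet: 40 instances with a pool of 20 against a limit of 500.
headroom = connection_headroom(
    max_connections=500, app_instances=40, pool_size_per_instance=20, reserved=10
)
if headroom < 0:
    print(f"over budget by {-headroom} connections")
else:
    print(f"{headroom} connections of headroom")
```

The point of running it is that autoscaling the application tier changes `app_instances` without anyone touching the database, so the check is worth repeating at the scaling ceiling rather than at today's fleet size.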

Designing for the cloud you read about, not the one you bill on.

AWS, GCP, and Azure share a mental model but differ in the details that cost money — default MTUs, free-tier boundaries, which traffic is charged, how their IAM scoping nests. The textbook mental model gets you most of the way; the provider's pricing page and quota docs are the rest. Verify the cost assumptions on the cloud you are actually running.

A note on framing

Why AWS-first.

AWS still holds the largest market share by a comfortable margin, its docs and community are the deepest, and most cloud roles assume you can read AWS first. The patterns transfer cleanly: once you understand an AWS VPC, Google's VPC is mostly familiar, even though it is global rather than regional. Once you understand IAM, Azure's RBAC reads much the same way with different nouns. The clouds compete on price and on managed services, not on the underlying mental model.

What this Codex is not: a certification prep guide. The AWS SA, GCP PCA, and Azure AZ-104 each demand a different kind of memorisation. If that's what you need, follow the official study paths — those exams are well-served by their providers. This page is for the senior engineer who needs to design and run things, not list every cloud service from memory.

Where this connects

The paths that sit next to this one.

The cloud abstractions hide a lot of computer science. Each path below pulls a layer back.

Above the primitives

Designing and running systems

What you build once the primitives are in hand — the growth story and the orchestration layer.

Beneath the abstractions

The layers the cloud hides

VPC makes more sense once you've seen TCP/IP; managed databases make more sense once you've seen the algorithms underneath them.

Ship on a cloud

Service-shaped, hands-on

The AWS deep dives where the day-to-day reflexes live, page by page.

Start here

Compute — VMs, containers, serverless →

The first decision in any cloud build: how is your code going to run? EC2, ECS, EKS, Lambda: four services in three buckets (VMs, containers, serverless), with the trade-offs and the GCP and Azure equivalents.



The primitives, one page each.

Compute

Storage

Networking

Identity & IAM

Managed databases

Multi-region

Observability

Cost engineering