Cloud engineering, the mental model.
Every cloud is the same five or six primitives wearing different names. Compute. Storage. Networking. Identity. A database tier. A way to see what's happening. Once you can put any AWS, GCP, or Azure service into one of those buckets, the rest is detail. This Codex walks the eight topics that matter — AWS-first, because that's what most readers ship on, with the GCP and Azure equivalents called out per topic. The aim isn't to list every service from memory; it's the shape you reach for when you're designing a system or staring at a bill that grew faster than the traffic did. Pair it with the system-design path and the scale-to-millions walkthrough.
The primitives, one page each.
Roughly 15–25 minutes per page. Read top to bottom for the full mental model, or jump to whichever primitive is currently on fire.
- 01
Compute
VMs, containers, serverless. When to pick EC2 over ECS over Lambda — and the GCP/Azure equivalents that map onto the same three buckets.
- 02
Storage
Object, block, file. S3 vs EBS vs EFS. Cost tiering, durability promises, and the corners where each one bites you.
- 03
Networking
VPCs, subnets, NAT, peering, transit. The mental model that survives any cloud — and the AWS-specific bits that take three months to internalise.
- 04
Identity & IAM
Roles, policies, federation, least privilege. The single security topic where small mistakes have the largest blast radius.
- 05
Managed databases
RDS, Aurora, DynamoDB, and the picking-a-database matrix. When managed Postgres is the right answer and when it isn't.
- 06
Multi-region
Route 53, Aurora Global, replication shapes, failover drills. The cost and complexity tax — and when it actually earns its keep.
- 07
Observability
Logs, metrics, traces. CloudWatch versus Datadog versus rolling your own — and what "observable" means when the bill is part of the design.
- 08
Cost engineering
Reserved instances, spot, savings plans, AZ-aware traffic. The line items that quietly run the bill — and the FinOps habits that keep them in check.
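The compute page's central question, EC2 versus ECS versus Lambda, can be sketched as a tiny decision function. This is a minimal illustration under stated assumptions. The `Workload` fields, `pick_compute`, and the rule order are hypothetical rules of thumb invented for this example, not an AWS-published decision tree. The one hard number is Lambda's documented 15-minute (900-second) maximum timeout. "ECS" stands in for the whole container bucket, EKS included.

```python
from dataclasses import dataclass

# Documented Lambda limit: maximum function timeout is 15 minutes (900 s).
LAMBDA_MAX_TIMEOUT_S = 900


@dataclass
class Workload:
    # Hypothetical fields chosen for this sketch.
    longest_run_s: int        # longest single request or job, in seconds
    needs_host_control: bool  # GPUs, custom kernels, host-level agents
    spiky_traffic: bool       # long idle gaps between bursts
    containerised: bool       # already ships as a container image


def pick_compute(w: Workload) -> str:
    """Illustrative rules of thumb, not an official AWS decision tree."""
    if w.needs_host_control:
        return "EC2"      # you need the whole VM
    if w.longest_run_s <= LAMBDA_MAX_TIMEOUT_S and w.spiky_traffic:
        return "Lambda"   # short and bursty: pay per request, no charge while idle
    # Long-running or steady work: containers if you have them, else VMs.
    return "ECS" if w.containerised else "EC2"


print(pick_compute(Workload(30, False, True, False)))    # → Lambda
print(pick_compute(Workload(3600, False, True, True)))   # → ECS
print(pick_compute(Workload(10, True, False, True)))     # → EC2
```

The checks are ordered on purpose. Hard constraints, such as host control and Lambda's timeout, come before cost shape. That way a workload that needs the host never gets routed to serverless just because its traffic is spiky.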
Every topic follows the same six beats.
So you can scan quickly when you already know the territory and slow down only where it's new.
- The premise
  - What this primitive does in plain English, and what problem it solves that wasn't already solved by something simpler.
- The AWS-canonical version
  - The default services, configured the way most teams actually run them. Not exam-prep depth; the mental model.
- GCP and Azure equivalents
  - A short table that maps the AWS names to the others. Useful when you switch clouds or argue with someone who has.
- How to pick
  - A small decision tree for "which one do I reach for here." The interesting part is the trade-offs: cost, lock-in, ops burden.
- What breaks
  - The corners where the primitive bites you: the bugs you'll learn the hard way unless someone tells you first.
- Cost note
  - The line item that adds up: egress, cross-AZ traffic, idle instances, the surprise on the bill that took a meeting to explain.
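Underneath, the "GCP and Azure equivalents" beat is a lookup table keyed by primitive. Here is a minimal sketch that assumes the common service pairings below, with names as of writing. `EQUIVALENTS`, `translate`, and `primitive_of` are hypothetical helpers. The pairings are rough analogues, not feature-for-feature matches. DynamoDB and Cosmos DB, for example, differ in consistency options and pricing models.

```python
from typing import Dict, Optional, Tuple

# primitive -> (AWS, GCP, Azure). Common rough analogues as of writing,
# not feature-for-feature matches.
EQUIVALENTS: Dict[str, Tuple[str, str, str]] = {
    "compute/vm": ("EC2", "Compute Engine", "Azure Virtual Machines"),
    "compute/kubernetes": ("EKS", "GKE", "AKS"),
    # "Cloud Run functions" was formerly named Cloud Functions.
    "compute/serverless": ("Lambda", "Cloud Run functions", "Azure Functions"),
    "storage/object": ("S3", "Cloud Storage", "Blob Storage"),
    "storage/block": ("EBS", "Persistent Disk", "Azure Managed Disks"),
    "storage/file": ("EFS", "Filestore", "Azure Files"),
    "network/private": ("VPC", "VPC", "VNet"),
    "identity": ("IAM", "Cloud IAM", "Entra ID + Azure RBAC"),
    "database/relational": ("RDS", "Cloud SQL", "Azure Database for PostgreSQL"),
    "database/nosql": ("DynamoDB", "Firestore", "Cosmos DB"),
    "observability": ("CloudWatch", "Cloud Monitoring", "Azure Monitor"),
}
_COLUMN = {"aws": 0, "gcp": 1, "azure": 2}


def translate(aws_name: str, target: str) -> Optional[str]:
    """Map an AWS service name to its rough GCP or Azure analogue."""
    col = _COLUMN[target]  # raises KeyError for an unknown provider
    for names in EQUIVALENTS.values():
        if names[0] == aws_name:
            return names[col]
    return None


def primitive_of(service: str) -> Optional[str]:
    """Put a service from any of the three clouds into its bucket."""
    for primitive, names in EQUIVALENTS.items():
        if service in names:
            return primitive
    return None


print(translate("S3", "azure"))     # → Blob Storage
print(primitive_of("Cosmos DB"))    # → database/nosql
```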
GCP and Azure, same shape.
GCP Codex
The project model, the global VPC, Compute Engine, GKE, Cloud Run, Cloud Storage, BigQuery, Spanner, Pub/Sub. Same shape as the AWS Codex, each page ending in a gcloud lab.
Azure Codex (9 deep dives)
Tenants and resource groups, Entra ID, VMs, VNets, AKS, Functions, Blob Storage, Cosmos DB, Service Bus & Event Hubs. Same shape, each page ending in an az lab.
The figures that end up shaping the bill.
| What | Value |
|---|---|
| S3 standard durability | 11 nines (99.999999999%) |
| S3 standard availability SLA | 99.9% |
| EC2 single-instance SLA | 99.5% |
| EC2 across 2+ AZs (region-level SLA) | 99.99% |
| Cross-AZ data transfer | ~$0.01/GB each way |
| Internet egress (first tiers) | ~$0.09/GB |
| NAT Gateway processing | ~$0.045/GB + hourly |
| Spot vs on-demand discount | up to ~70–90% off |
| Reserved / savings plan discount | up to ~72% (3yr) |
| Lambda billing granularity | per 1 ms |
These are AWS-flavoured order-of-magnitude figures and they drift over time — check the pricing page before you put a number in a doc. Treat them as a sanity check on whether a design is in the right neighbourhood, not as quotes.
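To use the table as a sanity check, multiply it out. The sketch below assumes the first-tier rates above and uses a hypothetical `monthly_transfer_cost` helper. It prices three data-transfer line items for a month and ignores NAT hourly charges and volume discounts. Cross-AZ bytes are billed once leaving one AZ and once entering the other, so each GB moved costs about twice the per-direction rate.

```python
# Order-of-magnitude AWS rates from the table above, in USD. They drift;
# check the current pricing page before quoting them.
CROSS_AZ_PER_GB_EACH_WAY = 0.01
NAT_PROCESSING_PER_GB = 0.045
INTERNET_EGRESS_PER_GB = 0.09  # first tier only; larger volumes get cheaper


def monthly_transfer_cost(cross_az_gb: float, nat_gb: float, egress_gb: float) -> dict:
    """Rough monthly USD for three data-transfer line items.

    Assumptions: every cross-AZ GB is billed once out of the source AZ and
    once into the destination AZ; NAT hourly charges and volume tiers are
    ignored.
    """
    costs = {
        "cross_az": cross_az_gb * CROSS_AZ_PER_GB_EACH_WAY * 2,
        "nat_processing": nat_gb * NAT_PROCESSING_PER_GB,
        "internet_egress": egress_gb * INTERNET_EGRESS_PER_GB,
    }
    costs["total"] = sum(costs.values())
    return costs


# A chatty service moving 10 TB/month across AZs, 5 TB through NAT, 2 TB out.
bill = monthly_transfer_cost(cross_az_gb=10_000, nat_gb=5_000, egress_gb=2_000)
print({item: round(usd) for item, usd in bill.items()})
# → {'cross_az': 200, 'nat_processing': 225, 'internet_egress': 180, 'total': 605}
```

Traffic that leaves through a NAT gateway to the internet pays both the NAT processing fee and egress. The same gigabytes can therefore legitimately appear in two arguments.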
Five that cost real money or real sleep.
- Letting cross-AZ traffic become the architecture. Spreading services across availability zones is the right call for resilience, but every byte that crosses an AZ boundary is metered in both directions. A chatty service mesh, a database read replica in another AZ, a load balancer fanning out blindly: each one is a line item that grows with traffic, not with footprint. Pin the chatty paths within an AZ and accept the cross-AZ cost only where it buys real failure isolation.
- Treating IAM as something to tighten later. Identity is the one topic where a small mistake has the largest blast radius. A wildcard policy attached "just to unblock the demo" tends to outlive the demo by years. Start from least privilege and widen on evidence; it is far cheaper than auditing your way back from a role that can do everything. Read the identity page before the others if security is on you.
- Reaching for multi-region before single-region is solid. Multi-region is the most expensive resilience you can buy, in dollars and in operational complexity: replication lag, split-brain risk, failover drills nobody runs, and a bill that roughly doubles. Most outages you will actually hit are survivable with multi-AZ and a good runbook. Earn multi-region with a real availability requirement, not a hypothetical one.
- Assuming managed means hands-off. RDS still needs you to pick instance sizes, plan for failover, and watch storage autoscaling. A managed database hides the operating system, not the capacity math. The provider runs the daemon; you still own the schema, the connection pool, and the decision about when Postgres stops being the right answer.
- Designing for the cloud you read about, not the one you bill on. AWS, GCP, and Azure share a mental model but differ in the details that cost money or break things: default MTUs, free-tier boundaries, which traffic is charged, how their IAM scoping nests. The textbook mental model gets you most of the way; the provider's pricing page and quota docs cover the rest. Verify the cost assumptions on the cloud you are actually running.
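The wildcard policy attached "just to unblock the demo" is easy to catch mechanically. This sketch assumes AWS's JSON policy grammar, where `Statement`, `Action`, and `Resource` may each be a single value or a list. `wildcard_findings` is a hypothetical helper. It flags only `Allow` statements with a bare `*` action, a service-wide `service:*` action, or a `*` resource. It ignores conditions, `NotAction`, and partial wildcards like `s3:Get*`. Treat it as a pre-commit tripwire, not a replacement for AWS IAM Access Analyzer's policy validation.

```python
from typing import Any, List


def _as_list(value: Any) -> list:
    # IAM lets Statement, Action, and Resource be a single value or a list.
    return value if isinstance(value, list) else [value]


def wildcard_findings(policy: dict) -> List[str]:
    """Flag Allow statements that grant every action or every resource."""
    findings = []
    for i, stmt in enumerate(_as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue  # Deny statements narrow access; skip them
        for action in _as_list(stmt.get("Action", [])):
            if action == "*" or action.endswith(":*"):
                findings.append(f"statement {i}: broad action {action!r}")
        for resource in _as_list(stmt.get("Resource", [])):
            if resource == "*":
                findings.append(f"statement {i}: any resource")
    return findings


demo_policy = {  # the "just to unblock the demo" policy
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}],
}
scoped_policy = {
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::example-bucket/*",  # hypothetical bucket
    },
}
print(wildcard_findings(demo_policy))
# → ["statement 0: broad action '*'", 'statement 0: any resource']
print(wildcard_findings(scoped_policy))  # → []
```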
Why AWS-first.
AWS still owns the largest share by a comfortable margin, its docs and community are the deepest, and most cloud roles assume AWS as the baseline. The patterns transfer cleanly: once you understand VPC, Google's VPC is mostly the same idea, with the twist that it is global rather than regional. Once you understand IAM, Azure's RBAC reads the same way with different nouns. The clouds compete on price and on managed services, not on the underlying mental model.
What this Codex is not: a certification prep guide. The AWS SA, GCP PCA, and Azure AZ-104 each demand a different kind of memorisation. If that's what you need, follow the official study paths — those exams are well-served by their providers. This page is for the senior engineer who needs to design and run things, not list every cloud service from memory.
The paths that sit next to this one.
The cloud abstractions hide a lot of computer science. Each path below pulls a layer back.
What you build once the primitives are in hand — the growth story and the orchestration layer.
VPC makes more sense once you've seen TCP/IP; managed databases make sense once you've seen the algorithm.
The AWS deep dives where the day-to-day reflexes live, page by page.
Compute: VMs, containers, serverless
The first decision in any cloud build: how is your code going to run? EC2, ECS, EKS, Lambda — the three buckets, the trade-offs, the GCP and Azure equivalents.