8 topics · AWS-first
Cloud Codex

Cloud engineering, the mental model.

Every cloud is the same five or six primitives wearing different names. Compute. Storage. Networking. Identity. A database tier. A way to see what's happening. Once you can put any AWS, GCP, or Azure service into one of those buckets, the rest is detail. This Codex walks the eight topics that matter — AWS-first, because that's what most readers ship on, with the GCP and Azure equivalents called out per topic. The aim isn't to list every service from memory; it's the shape you reach for when you're designing a system or staring at a bill that grew faster than the traffic did. Pair it with the system-design path and the scale-to-millions walkthrough.


The eight topics

The primitives, one page each.

Roughly 15–25 minutes per page. Read top to bottom for the full mental model, or jump to whichever primitive is currently on fire.

  1. Compute

    VMs, containers, serverless. When to pick EC2 over ECS over Lambda — and the GCP/Azure equivalents that map onto the same three buckets.

  2. Storage

    Object, block, file. S3 vs EBS vs EFS. Cost tiering, durability promises, and the corners where each one bites you.

  3. Networking

    VPCs, subnets, NAT, peering, transit. The mental model that survives any cloud — and the AWS-specific bits that take three months to internalise.

  4. Identity & IAM

    Roles, policies, federation, least privilege. The single security topic where small mistakes have the largest blast radius.

  5. Managed databases

    RDS, Aurora, DynamoDB, and the picking-a-database matrix. When managed Postgres is the right answer and when it isn't.

  6. Multi-region

    Route 53, Aurora Global, replication shapes, failover drills. The cost and complexity tax — and when it actually earns its keep.

  7. Observability

    Logs, metrics, traces. CloudWatch versus Datadog versus rolling your own — and what "observable" means when the bill is part of the design.

  8. Cost engineering

    Reserved instances, spot, savings plans, AZ-aware traffic. The line items that quietly run the bill — and the FinOps habits that keep them in check.

The page shape

Every topic follows the same six beats.

So you can scan quickly when you already know the territory and slow down only where it's new.

The premise
What this primitive does in plain English. What problem it solves that wasn't already solved by something simpler.
The AWS-canonical version
The default services, configured the way most teams actually run them. Not exam-prep depth — the mental model.
GCP and Azure equivalents
A short table that maps the AWS names to the others. Useful when you switch clouds or argue with someone who has.
How to pick
A small decision tree for "which one do I reach for here." The interesting part is the trade-offs — cost, lock-in, ops burden.
What breaks
The corners where the primitive bites you. The bugs you'll learn the hard way unless someone tells you first.
Cost note
The line item that adds up — egress, cross-AZ traffic, idle instances, the surprise on the bill that took a meeting to explain.
The other two clouds

GCP and Azure, same shape.

Numbers worth knowing

The figures that end up shaping the bill.

What                              Value
S3 standard durability            11 nines (99.999999999%)
S3 standard availability SLA      99.9%
EC2 / single-instance SLA         99.5%
Multi-AZ deployment SLA           99.99%
Cross-AZ data transfer            ~$0.01/GB each way
Internet egress (first tiers)     ~$0.09/GB
NAT Gateway processing            ~$0.045/GB + hourly
Spot vs on-demand discount        up to ~70–90% off
Reserved / savings plan discount  up to ~72% (3yr)
Lambda billing granularity        per 1 ms

These are AWS-flavoured order-of-magnitude figures and they drift over time — check the pricing page before you put a number in a doc. Treat them as a sanity check on whether a design is in the right neighbourhood, not as quotes.
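As a sanity-check aid, the figures above can be turned into napkin arithmetic. The sketch below is illustrative only: the function names are hypothetical, the rates are the approximate values from the table (not current quotes), the month is assumed to be 30 days, and the NAT Gateway hourly charge is left out.

```python
# Order-of-magnitude figures copied from the table above. They drift over
# time, so treat them as placeholders and check the current pricing page.
CROSS_AZ_PER_GB_EACH_WAY = 0.01   # USD, charged on both sides of the hop
INTERNET_EGRESS_PER_GB = 0.09     # USD, first pricing tiers
NAT_PROCESSING_PER_GB = 0.045     # USD, hourly NAT charge not included

MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # assumption: a 30-day month


def allowed_downtime_minutes(sla_percent: float) -> float:
    """Minutes per 30-day month that an availability target does not cover."""
    return MINUTES_PER_30_DAY_MONTH * (1 - sla_percent / 100)


def monthly_transfer_cost(cross_az_gb: float, nat_gb: float, egress_gb: float) -> float:
    """Rough monthly data-transfer cost in USD for the three metered paths."""
    cross_az = cross_az_gb * CROSS_AZ_PER_GB_EACH_WAY * 2  # metered out and in
    nat = nat_gb * NAT_PROCESSING_PER_GB
    egress = egress_gb * INTERNET_EGRESS_PER_GB
    return cross_az + nat + egress


if __name__ == "__main__":
    for sla in (99.5, 99.9, 99.99):
        print(f"{sla}% leaves about {allowed_downtime_minutes(sla):.1f} min/month")
    print(f"~${monthly_transfer_cost(10_000, 5_000, 2_000):.0f}/month in transfer")
```

Under these assumptions, 10,000 GB of cross-AZ traffic alone comes to roughly $200 a month, and a 99.5% target leaves about 216 minutes of allowed downtime in a 30-day month against about 4.3 minutes at 99.99%.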

Common traps

Five that cost real money or real sleep.

  1. Letting cross-AZ traffic become the architecture.
    Spreading services across availability zones is the right call for resilience, but every byte that crosses an AZ boundary is metered both ways. A chatty service mesh, a database read replica in another AZ, a load balancer fanning out blindly — each one is a line item that grows with traffic, not with footprint. Pin the chatty paths within an AZ and accept the cross-AZ cost only where it buys real failure isolation.
  2. Treating IAM as something to tighten later.
    Identity is the one topic where a small mistake has the largest blast radius. A wildcard policy attached "just to unblock the demo" tends to outlive the demo by years. Start from least privilege and widen on evidence — it is far cheaper than auditing your way back from a role that can do everything. Read the identity page before the others if security is on you.
  3. Reaching for multi-region before single-region is solid.
    Multi-region is the most expensive resilience you can buy, in dollars and in operational complexity. It brings replication lag, split-brain risk, failover drills nobody runs, and a bill that roughly doubles. Most outages can be ridden out with multi-AZ and a good runbook. Earn multi-region with a real availability requirement, not a hypothetical one.
  4. Assuming managed means hands-off.
    RDS still needs you to pick instance sizes, plan for failover, and watch storage autoscaling. A managed database hides the operating system, not the capacity math. The provider runs the daemon; you still own the schema, the connection pool, and the decision about when Postgres stops being the right answer.
  5. Designing for the cloud you read about, not the one you bill on.
    AWS, GCP, and Azure share a mental model but differ in the details that cost money — default MTUs, free-tier boundaries, which traffic is charged, how their IAM scoping nests. The textbook mental model gets you most of the way; the provider's pricing page and quota docs are the rest. Verify the cost assumptions on the cloud you are actually running.
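For the second trap, one way to catch a wildcard policy before it outlives the demo is a small check over the policy document. This is a minimal sketch, not a replacement for a real policy analyser: `find_wildcard_statements` and `demo_policy` are hypothetical names, the bucket ARN is a made-up example, and the check only flags a bare `"*"` in `Action` or `Resource` (it does not catch broader patterns such as `s3:*`). It relies only on the standard IAM JSON policy layout of `Statement`, `Effect`, `Action`, and `Resource`.

```python
from typing import Any, List


def _as_list(value: Any) -> list:
    """IAM accepts either a single string or a list for Action and Resource."""
    return value if isinstance(value, list) else [value]


def find_wildcard_statements(policy: dict) -> List[int]:
    """Return indexes of Allow statements whose Action or Resource is a bare "*"."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may be a single object
        statements = [statements]
    flagged = []
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue  # Deny statements narrow access, so they are skipped here
        actions = _as_list(statement.get("Action", []))
        resources = _as_list(statement.get("Resource", []))
        if "*" in actions or "*" in resources:
            flagged.append(index)
    return flagged


# Hypothetical policy: the first statement is the "unblock the demo" wildcard,
# the second is scoped to one action on one example bucket.
demo_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "*", "Resource": "*"},
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
    ],
}

if __name__ == "__main__":
    print(find_wildcard_statements(demo_policy))
```

Run against `demo_policy`, the check flags only the first statement. A check like this could sit in a pre-merge step so that widening a role becomes a deliberate decision rather than a leftover.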
A note on framing

Why AWS-first.

AWS still holds the largest market share by a comfortable margin, its docs and community are the deepest, and most cloud roles assume you can read AWS first. The patterns transfer cleanly: once you understand VPC, you understand Google's VPC. Once you understand IAM, Azure's RBAC reads the same way with different nouns. The clouds compete on price and on managed services, not on the underlying mental model.

What this Codex is not: a certification prep guide. The AWS Solutions Architect (SA), GCP Professional Cloud Architect (PCA), and Azure AZ-104 exams each demand a different kind of memorisation. If that's what you need, follow the official study paths; those exams are well served by their providers. This page is for the senior engineer who needs to design and run things, not recite every cloud service from memory.

Where this connects

The paths that sit next to this one.

The cloud abstractions hide a lot of computer science. Each path below pulls a layer back.

Above the primitives
Designing and running systems

What you build once the primitives are in hand — the growth story and the orchestration layer.

  1. Scale to millions on AWS
  2. Kubernetes — the workload runtime
  3. Napkin math — cloud sizing
Beneath the abstractions
The layers the cloud hides

VPC makes more sense once you've seen TCP/IP; managed databases make sense once you've seen the algorithm.

  1. Networking — the protocols under VPC
  2. Distributed systems — the hidden algorithms
  3. System design — the full path
Ship on a cloud
Service-shaped, hands-on

The AWS deep dives where the day-to-day reflexes live, page by page.

  1. AWS Codex — all 16 pages
  2. CloudFront — the edge
  3. KMS & secrets — envelope encryption
Start here

Compute — VMs, containers, serverless →

The first decision in any cloud build: how is your code going to run? EC2 for VMs, ECS and EKS for containers, Lambda for serverless. The three buckets, the trade-offs, and the GCP and Azure equivalents.

Read Compute