Cloud Codex / 01

Compute — three buckets, pick one.

Every cloud compute service drops into one of three buckets: a virtual machine you treat like a server, a container scheduler that runs your image, or a function-as-a-service that runs a snippet when something happens. Picking the right bucket is the first decision in any cloud build. Once you've picked, the second decision is which managed flavour — and that's where the AWS / GCP / Azure trios all start to look the same.


1 · The three buckets

  • Virtual machine. You get a Linux or Windows box. Log in over SSH or RDP, install whatever, run it. Long-running, mutable, full control. Default for legacy lifts, GPU workloads, anything that needs a specific kernel module or persistent local disk.
  • Container scheduler. You hand the platform an image plus a count, and it runs N copies somewhere. You stop thinking about hosts. Default for stateless web tiers, modern microservices, anything that benefits from immutable deploys and per-deploy isolation.
  • Function-as-a-service. You upload code, the platform wires it to an event (HTTP, queue, schedule, file landing in storage). It runs only when triggered. Default for event-glue, sporadic workloads, anything you'd otherwise build as a tiny always-on service.

2 · The AWS canonical version

| Bucket | AWS service | What it actually is |
|---|---|---|
| VM | EC2 | A virtual machine. Family + size (e.g. m7i.large) picks CPU/RAM/network. Bring your own AMI or use one Amazon ships. |
| Container (orchestrated for you) | ECS on Fargate | Tasks run on AWS-managed infrastructure. You never see the host. Good default if you don't need Kubernetes. |
| Container (you run K8s) | EKS | Kubernetes control plane managed by AWS; you bring the worker nodes (or use Fargate-backed pods). For teams already living in K8s. |
| Container (you run K8s, harder) | EC2 + self-managed K8s | kubeadm. Almost nobody does this on purpose any more. |
| Serverless functions | Lambda | Zip or container image, triggered by API Gateway, EventBridge, SQS, S3 events, etc. Up to 15 min per invocation. |
| Long-running container, no orchestration | App Runner | Push a container, get an HTTPS endpoint. The "Heroku on AWS" play. |
| Batch jobs | AWS Batch | Submit jobs, AWS provisions EC2 (often Spot), runs to completion, tears down. Built for HPC-shaped workloads. |

At the senior level you'll be using EC2, ECS-on-Fargate, EKS, and Lambda for ~95% of decisions. The rest are real, but you reach for them less often.

3 · GCP and Azure equivalents

| Bucket | AWS | GCP | Azure |
|---|---|---|---|
| VM | EC2 | Compute Engine (GCE) | Azure Virtual Machines |
| Managed K8s | EKS | GKE | AKS |
| Container, no K8s | ECS / Fargate | Cloud Run | Container Apps |
| Serverless functions | Lambda | Cloud Functions (2nd gen, runs on Cloud Run under the hood) | Azure Functions |
| "Push a container, get a URL" | App Runner | Cloud Run | Container Apps / App Service |
| Batch | AWS Batch | Cloud Batch / Dataproc | Azure Batch |

Cloud Run is the cleanest of the bunch: push a container, get autoscaling from zero to thousands of instances, and pay per request. AWS App Runner is the closest equivalent and a few years behind. Azure Container Apps is the third option. If you're starting fresh and don't already live in Kubernetes, Cloud Run-shaped services are usually the right default.

4 · How to pick — a decision diagram

Five questions, four outcomes. The default in 2026: reach for a Fargate / Cloud Run-shaped service (question 2) for new builds unless one of the other questions answers yes.

  1. Event-driven and bursty? Lambda / Cloud Functions / Azure Functions. The pay-per-invocation model wins.
  2. Long-running web service, no K8s? ECS-on-Fargate, Cloud Run, Container Apps. The sweet spot.
  3. Already on Kubernetes? EKS / GKE / AKS. The managed control plane is worth the money; running your own K8s in 2026 is a hobby.
  4. Specific kernel, persistent local disk, GPU? EC2 / GCE / Azure VMs. The escape hatch.
  5. Lifting a legacy stack as-is? VMs. Refactor after, not during.
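
The five questions above can be sketched as a small lookup. This is a minimal illustration, not a definitive rule engine: the `Workload` fields, the `pick_bucket` and `pick_service` names, and the check order (hard constraints first when several answers are yes) are all assumptions made for the example. The service names come from the equivalents table.

```python
from dataclasses import dataclass

# Per-cloud service names for each outcome, copied from the equivalents table.
SERVICES = {
    "function": {"aws": "Lambda", "gcp": "Cloud Functions", "azure": "Azure Functions"},
    "container": {"aws": "ECS on Fargate", "gcp": "Cloud Run", "azure": "Container Apps"},
    "kubernetes": {"aws": "EKS", "gcp": "GKE", "azure": "AKS"},
    "vm": {"aws": "EC2", "gcp": "Compute Engine", "azure": "Azure Virtual Machines"},
}


@dataclass
class Workload:
    """Hypothetical yes/no answers to the decision questions."""
    event_driven_and_bursty: bool = False   # question 1
    already_on_kubernetes: bool = False     # question 3
    needs_kernel_disk_or_gpu: bool = False  # question 4
    legacy_lift: bool = False               # question 5


def pick_bucket(w: Workload) -> str:
    """Return one of the four outcomes.

    Assumption: when several answers are yes, hard constraints win,
    so the VM questions are checked before the others.
    """
    if w.needs_kernel_disk_or_gpu or w.legacy_lift:
        return "vm"
    if w.already_on_kubernetes:
        return "kubernetes"
    if w.event_driven_and_bursty:
        return "function"
    # Question 2: long-running service, no K8s. The default for new builds.
    return "container"


def pick_service(w: Workload, cloud: str) -> str:
    """Map the outcome to a concrete service on 'aws', 'gcp' or 'azure'."""
    return SERVICES[pick_bucket(w)][cloud]
```

For instance, `pick_service(Workload(), "gcp")` falls through every question and lands on the container default, while setting `legacy_lift=True` routes the same call to a VM regardless of the other answers.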

5 · Six real workloads, six picks

| Workload | Right pick | Why |
|---|---|---|
| Public web app, ~5K QPS, single region | ECS-on-Fargate behind ALB | Stateless, auto-scaling without K8s overhead. ~$2K/month at this scale. |
| S3-event glue (resize image on upload) | Lambda | Sporadic, ≤15 min, IAM-driven trigger. Pennies per million events. |
| Microservices fleet, 30+ services | EKS / GKE | K8s pays for itself past ~15 services. CRDs, namespaces, service mesh become useful. |
| Nightly batch ETL, 6 hours, 200-node burst | AWS Batch on Spot | Embarrassingly parallel, fault-tolerant per task. 80% cheaper than on-demand. |
| GPU inference (model serving) | EC2 with G/P-family + Inferentia | Need specific accelerators not all containers expose. SageMaker if you'd rather not manage. |
| Lifted legacy monolith (.NET on Windows) | EC2 Windows + RDS for SQL Server | Don't refactor while migrating. App Modernization is a separate workstream. |

The mental shortcut. Match the workload to the smallest service that runs it. If a Lambda is enough, don't reach for Fargate. If Fargate is enough, don't reach for EKS. Each step up the chain doubles operational complexity without doubling capability.
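
To make the "a Lambda is enough" case concrete, here is a minimal sketch of the S3-event glue workload from the table. The `Records[].s3.bucket.name` and `Records[].s3.object.key` fields follow the S3 event notification format, in which object keys arrive URL-encoded. `resize_image` is a hypothetical placeholder; a real function would fetch and rewrite the object with an SDK client.

```python
from urllib.parse import unquote_plus


def objects_from_s3_event(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Keys are URL-encoded in the event (a space arrives as '+').
        key = unquote_plus(s3["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def resize_image(bucket: str, key: str) -> str:
    """Hypothetical placeholder for the actual resize work."""
    return f"resized s3://{bucket}/{key}"


def handler(event, context):
    """Lambda entry point: one resize per uploaded object in the event."""
    return [resize_image(b, k) for b, k in objects_from_s3_event(event)]
```

The whole service is one function and a trigger, which is the point of the shortcut: nothing here would justify a Fargate service or a cluster.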

6 · What breaks

  • Cold starts on functions. First invocation after idle is slow — anywhere from 50 ms (Node, simple) to several seconds (Java with a fat init). Mitigations: provisioned concurrency, Lambda SnapStart, or just keep the function warm with a scheduled ping.
  • 15-minute Lambda cap. Long jobs that look serverless-friendly until they don't fit. Move to Step Functions, Fargate, or a Batch job.
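  • Container cold starts. Cloud Run scales to zero by default, so the first request after idle waits for a container to start. Usually fine, but a P99 killer if you serve a tiny amount of traffic. ECS-on-Fargate services don't scale to zero on their own (they hold their desired task count), but each new task added during scale-out still has to pull its image and boot before it takes traffic.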
  • EKS upgrade pain. Kubernetes ships a new minor version roughly three times a year. Control-plane upgrades are mostly painless; node upgrades are not. Plan for a day per cluster, twice a year.
  • VM drift. Long-lived EC2s accumulate manual changes nobody documented. The cure is immutable AMIs and replace-don't-patch — easier said than done.
  • Spot interruptions. AWS Spot, GCP Spot/Preemptible and Azure Spot VMs vanish on short notice: about two minutes on AWS, as little as 30 seconds on GCP and Azure. Fine for batch; not fine for stateful workloads unless you've designed for it.
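
One way to live with the 15-minute Lambda cap is to stop cleanly before the deadline and hand the leftover work to something else. The sketch below assumes a Python Lambda, whose context object exposes `get_remaining_time_in_millis()`. The `process_until_deadline` name, the `handle` callback and the 30-second safety margin are illustrative choices, not a prescribed pattern.

```python
def process_until_deadline(items, context, handle, safety_margin_ms=30_000):
    """Process items until the invocation is close to its time limit.

    Returns (results, leftover). The caller decides what to do with the
    leftover items, for example re-queueing them or starting a Step
    Functions execution.
    """
    results = []
    leftover = list(items)
    while leftover:
        # Ask the runtime how much time remains before the hard timeout.
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            break
        results.append(handle(leftover.pop(0)))
    return results, leftover
```

If the leftover list is regularly non-empty, that is the signal from the bullet above: the job no longer fits a function and belongs on Step Functions, Fargate, or Batch.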

7 · Cost note

Three line items quietly run the bill:

  • Idle VMs. Anything left on overnight or over the weekend is pure waste. Auto-stop on schedule, or move to a service that scales to zero.
  • Reserved capacity vs on-demand. Steady-state workloads left at on-demand prices forgo a discount of roughly 30–60%. Reserved Instances, Savings Plans, GCP Committed Use, Azure Reservations: they all give roughly the same discount in exchange for a 1- or 3-year commitment.
  • Spot for stateless / batch. Same hardware, ~70–90% cheaper, can vanish. Stateless web tiers and batch jobs are perfect candidates. Combine with on-demand baseline so an interruption doesn't take you down.
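
The idle-VM item is easy to put a number on. A rough sketch, assuming a VM that is only needed during working hours; the 10 hours a day, 5 days a week schedule is an illustrative assumption, and attached disks typically keep billing while an instance is stopped, so this covers compute hours only.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def idle_fraction(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of the week a VM sits unused on the given schedule."""
    used = hours_per_day * days_per_week
    if not 0 <= used <= HOURS_PER_WEEK:
        raise ValueError("schedule must fit inside one week")
    return 1 - used / HOURS_PER_WEEK


# Assumed schedule: needed 10 h/day on weekdays only.
wasted = idle_fraction(hours_per_day=10, days_per_week=5)
print(f"{wasted:.0%} of always-on compute hours are idle")
```

On that assumed schedule about 70% of the always-on hours are idle, which is why auto-stop or scale-to-zero tends to be the cheapest fix on the list.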

A reasonable mix at scale: ~60% on Savings Plans / RIs (your baseline), ~30% on Spot (for the elastic and batch portion), ~10% on-demand (to absorb surprises). Cuts the compute bill by roughly half versus all-on-demand.
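
The "roughly half" figure can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the shares are treated as shares of usage, and the discounts are midpoints of the ranges quoted above (45% for commitments, 80% for Spot), not quoted prices.

```python
def blended_cost_factor(mix: dict[str, float], discounts: dict[str, float]) -> float:
    """Cost relative to running everything on-demand (1.0 means no saving)."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix shares must sum to 1")
    return sum(share * (1 - discounts.get(name, 0.0)) for name, share in mix.items())


# Shares from the text; discounts are assumed midpoints of the quoted ranges.
MIX = {"committed": 0.60, "spot": 0.30, "on_demand": 0.10}
DISCOUNTS = {"committed": 0.45, "spot": 0.80, "on_demand": 0.0}

factor = blended_cost_factor(MIX, DISCOUNTS)
print(f"bill is {factor:.0%} of all-on-demand")
```

With those assumed discounts the blended bill comes out at about 49% of the all-on-demand figure (0.60 × 0.55 + 0.30 × 0.20 + 0.10), consistent with the "roughly half" claim. Shift the discounts toward the low end of each range and the saving shrinks accordingly.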

Further reading

  • AWS Well-Architected — Compute pillar. The official framework for compute choice, with the cost dimension front and centre.
  • "Serverless in the Wild" (Microsoft Azure / Microsoft Research). Long-form data on how functions are actually used at scale, drawn from Azure Functions; the cold-start mitigation work in particular.
  • Adjacent: Kubernetes Codex. The orchestration layer above container compute.
  • Adjacent: Cost engineering. RIs, Savings Plans, Spot, and the FinOps practices behind them.
  • Adjacent: Scale to millions on AWS. The compute choices in context — when each stage's growth makes the next compute primitive worth its complexity.