Compute — three buckets, pick one.
Every cloud compute service drops into one of three buckets: a virtual machine you treat like a server, a container scheduler that runs your image, or a function-as-a-service that runs a snippet when something happens. Picking the right bucket is the first decision in any cloud build. Once you've picked, the second decision is which managed flavour — and that's where the AWS / GCP / Azure trios all start to look the same.
1 · The three buckets
- Virtual machine. You get a whole server, usually a Linux box. SSH in, install whatever, run it. Long-running, mutable, full control. Default for legacy lifts, GPU workloads, anything that needs a specific kernel module or persistent local disk.
- Container scheduler. You hand the platform an image plus a count, and it runs N copies somewhere. You stop thinking about hosts. Default for stateless web tiers, modern microservices, anything that benefits from immutable deploys and per-deploy isolation.
- Function-as-a-service. You upload code, the platform wires it to an event (HTTP, queue, schedule, file landing in storage). It runs only when triggered. Default for event-glue, sporadic workloads, anything you'd otherwise build as a tiny always-on service.
2 · The AWS canonical version
| Bucket | AWS service | What it actually is |
|---|---|---|
| VM | EC2 | A virtual machine. Family + size (e.g. m7i.large) picks CPU/RAM/network. Bring your own AMI or use one Amazon ships. |
| Container (orchestrated for you) | ECS on Fargate | Tasks run on AWS-managed infrastructure. You never see the host. Good default if you don't need Kubernetes. |
| Container (you run K8s) | EKS | Kubernetes control plane managed by AWS; you bring the worker nodes (or use Fargate-backed pods). For teams already living in K8s. |
| Container (you run K8s, harder) | EC2 + self-managed K8s | You install and operate Kubernetes yourself, e.g. with kubeadm. Almost nobody does this on purpose any more. |
| Serverless functions | Lambda | Zip or container image, triggered by API Gateway, EventBridge, SQS, S3 events, etc. Up to 15 min per invocation. |
| Long-running container, no orchestration | App Runner | Push a container, get an HTTPS endpoint. The "Heroku on AWS" play. |
| Batch jobs | AWS Batch | Submit jobs, AWS provisions EC2 (often Spot), runs to completion, tears down. Built for HPC-shaped workloads. |
At the senior level you'll be using EC2, ECS-on-Fargate, EKS, and Lambda for ~95% of decisions. The rest are real, but you reach for them less often.
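The EC2 row above says the family and size pick CPU, RAM, and network. A minimal Python sketch of how an instance type name decomposes, assuming the common `<family><generation><attributes>.<size>` pattern; the regex, `InstanceType`, and `parse_instance_type` are hypothetical helpers rather than an AWS API, and a few special types (e.g. `u-6tb1.metal`) don't fit the pattern and are rejected.

```python
import re
from dataclasses import dataclass

# Assumed pattern: letters (family), digits (generation), optional letters/hyphens
# (attributes, e.g. "i-flex"), then "." and the size. Special types don't match.
_INSTANCE_TYPE = re.compile(
    r"^(?P<family>[a-z]+)(?P<generation>\d+)(?P<attributes>[a-z-]*)\.(?P<size>[a-z0-9-]+)$"
)


@dataclass(frozen=True)
class InstanceType:
    family: str      # e.g. "m" general purpose, "c" compute, "r" memory, "g"/"p" GPU, "inf" Inferentia
    generation: int  # higher number = newer hardware generation
    attributes: str  # e.g. "i" Intel, "a" AMD, "g" Graviton, "n" extra network, "d" local NVMe
    size: str        # e.g. "large", "xlarge", "48xlarge"


def parse_instance_type(name: str) -> InstanceType:
    """Split an EC2 instance type name into its parts, or raise ValueError."""
    match = _INSTANCE_TYPE.match(name)
    if match is None:
        raise ValueError(f"unrecognised instance type: {name!r}")
    return InstanceType(
        family=match["family"],
        generation=int(match["generation"]),
        attributes=match["attributes"],
        size=match["size"],
    )


print(parse_instance_type("m7i.large"))
print(parse_instance_type("c7gn.xlarge"))
```

Treat the attribute letters as hints; AWS's own instance type documentation is the authority on what each letter means for a given family.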
3 · GCP and Azure equivalents
| Bucket | AWS | GCP | Azure |
|---|---|---|---|
| VM | EC2 | Compute Engine (GCE) | Azure Virtual Machines |
| Managed K8s | EKS | GKE | AKS |
| Container, no K8s | ECS / Fargate | Cloud Run | Container Apps |
| Serverless functions | Lambda | Cloud Run functions (formerly Cloud Functions 2nd gen; runs on Cloud Run under the hood) | Azure Functions |
| "Push a container, get a URL" | App Runner | Cloud Run | Container Apps / App Service |
| Batch | AWS Batch | Cloud Batch (Dataproc for Spark/Hadoop jobs) | Azure Batch |
4 · How to pick — a decision diagram
Five questions, four outcomes. The rule of thumb: in 2026, a Fargate / Cloud Run-shaped service is the right default for a new build unless one of the other questions answers yes.
- Event-driven and bursty? Lambda / Cloud Functions / Azure Functions. The pay-per-invocation model wins.
- Long-running web service, no K8s? ECS-on-Fargate, Cloud Run, Container Apps. The sweet spot.
- Already on Kubernetes? EKS / GKE / AKS. The managed control plane is worth the money; running your own K8s in 2026 is a hobby.
- Specific kernel, persistent local disk, GPU? EC2 / GCE / Azure VMs. The escape hatch.
- Lifting a legacy stack as-is? VMs. Refactor after, not during.
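The five questions can be written as a tiny decision function. This is a sketch only: the `Workload` flags and `pick_compute` are hypothetical names, and the order of the checks (VM escape hatches first, then Kubernetes, then event-driven, then the container default) is an assumed precedence that the list above leaves open. The "long-running web service" question needs no flag of its own because it lands on the default.

```python
from dataclasses import dataclass

# One answer per outcome, naming the AWS / GCP / Azure equivalents.
OUTCOMES = {
    "vm": "EC2 / Compute Engine / Azure Virtual Machines",
    "managed_k8s": "EKS / GKE / AKS",
    "functions": "Lambda / Cloud Functions / Azure Functions",
    "serverless_containers": "ECS-on-Fargate / Cloud Run / Container Apps",
}


@dataclass
class Workload:
    """Hypothetical yes/no answers to the decision questions."""
    event_driven_and_bursty: bool = False
    already_on_kubernetes: bool = False
    needs_kernel_local_disk_or_gpu: bool = False
    legacy_lift_as_is: bool = False


def pick_compute(w: Workload) -> str:
    """Return an OUTCOMES key; the check order is an assumed precedence."""
    # Hard constraints first: these are the VM escape hatches.
    if w.needs_kernel_local_disk_or_gpu or w.legacy_lift_as_is:
        return "vm"
    if w.already_on_kubernetes:
        return "managed_k8s"
    if w.event_driven_and_bursty:
        return "functions"
    # Long-running web services, and anything that answered no everywhere,
    # fall through to the serverless-container default.
    return "serverless_containers"


resize_glue = Workload(event_driven_and_bursty=True)
print(OUTCOMES[pick_compute(resize_glue)])
print(OUTCOMES[pick_compute(Workload())])
```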
5 · Six real workloads, six picks
| Workload | Right pick | Why |
|---|---|---|
| Public web app, ~5K QPS, single region | ECS-on-Fargate behind ALB | Stateless, auto-scaling without K8s overhead. ~$2K/month at this scale. |
| S3-event glue (resize image on upload) | Lambda | Sporadic, short (well under the 15-min cap), triggered natively by S3 event notifications. Billed per request and per GB-second, so idle time costs nothing. |
| Microservices fleet, 30+ services | EKS / GKE | K8s pays for itself past ~15 services. CRDs, namespaces, service mesh become useful. |
| Nightly batch ETL, 6 hours, 200-node burst | AWS Batch on Spot | Embarrassingly parallel, fault-tolerant per task. Spot typically runs 70–90% cheaper than on-demand. |
| GPU inference (model serving) | EC2 G/P-family (GPU) or Inf-family (Inferentia) | Needs specific accelerators that not every container platform exposes (Fargate, for one, offers no GPUs). SageMaker if you'd rather not manage the hosts. |
| Lifted legacy monolith (.NET on Windows) | EC2 Windows + RDS for SQL Server | Don't refactor while migrating. App Modernization is a separate workstream. |
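For the S3-event glue row, the Lambda side is mostly event parsing. A minimal Python handler sketch, assuming the standard S3 event notification shape (`Records[].s3.bucket.name`, `Records[].s3.object.key`); the resize step is left as a hypothetical placeholder, and the bucket and key in the usage example are made up.

```python
import urllib.parse

# In a deployed function, create SDK clients (e.g. boto3.client("s3")) here at
# module scope: that code runs once per cold start and warm invocations reuse it.
# Omitted so the sketch runs without dependencies.


def handler(event, context):
    """Lambda entry point for an S3 ObjectCreated notification."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded in S3 events (a space shows up as "+").
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # Hypothetical resize step goes here. Write the output to a different
        # bucket or prefix, or the upload re-triggers this same function.
        processed.append({"bucket": bucket, "key": key})
    return {"processed": processed}


sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-uploads"},
                "object": {"key": "photos/my+cat.jpg"},
            },
        }
    ]
}
print(handler(sample_event, None))
```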
6 · What breaks
- Cold starts on functions. First invocation after idle is slow — anywhere from 50 ms (Node, simple) to several seconds (Java with a fat init). Mitigations: provisioned concurrency, Lambda SnapStart, or just keep the function warm with a scheduled ping.
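- 15-minute Lambda cap. Some jobs look serverless-friendly until they outgrow the limit. Split them into steps with Step Functions, or move them to Fargate or a Batch job.
- Container cold starts. Cloud Run scales to zero by default, and Azure Container Apps can too; the first request after idle has to start a container. Usually fine, but a P99 killer if you serve only a trickle of traffic, so set a minimum instance count if it matters. ECS-on-Fargate services keep their desired task count running, trading this problem for paying while idle.
- EKS upgrade pain. Kubernetes ships three minor versions a year (roughly one every four months); control-plane upgrades are mostly painless, node upgrades are not. Plan for a day per cluster, twice a year.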
- VM drift. Long-lived EC2s accumulate manual changes nobody documented. The cure is immutable AMIs and replace-don't-patch — easier said than done.
- Spot interruptions. AWS Spot, GCP Spot (and Preemptible) VMs, and Azure Spot VMs can vanish on short notice: two minutes on AWS, about 30 seconds on GCP and Azure. Fine for batch; not fine for stateful workloads unless you've designed for it.
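One common way around the 15-minute cap is to have the function watch its own deadline and hand back a cursor before it runs out, so an orchestrator such as a Step Functions loop can invoke it again. A Python sketch, assuming the Python runtime's `context.get_remaining_time_in_millis()`; `process_item`, the `items`/`cursor` event fields, and the 30-second margin are hypothetical, and `FakeContext` exists only to exercise the logic locally.

```python
SAFETY_MARGIN_MS = 30_000  # assumed: stop with 30 s left for cleanup and the return


def process_item(item):
    """Hypothetical unit of work; stands in for the real per-item task."""
    return item * 2


def handler(event, context):
    """Process items from event["cursor"] onward until done or nearly out of time."""
    items = event["items"]
    cursor = event.get("cursor", 0)
    results = []
    while cursor < len(items):
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            # Not finished: return where we stopped so the caller can re-invoke.
            return {"done": False, "cursor": cursor, "results": results}
        results.append(process_item(items[cursor]))
        cursor += 1
    return {"done": True, "cursor": cursor, "results": results}


class FakeContext:
    """Local stand-in for the Lambda context: each check burns 100 s of budget."""

    def __init__(self, remaining_ms):
        self.remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self):
        current = self.remaining_ms
        self.remaining_ms -= 100_000
        return current


# Drive it the way an orchestrator loop would: re-invoke with the returned cursor.
event = {"items": [1, 2, 3, 4, 5]}
while True:
    out = handler(event, FakeContext(300_000))
    print(out)
    if out["done"]:
        break
    event = {"items": event["items"], "cursor": out["cursor"]}
```

The design choice is to return progress rather than hold it: each invocation stays well under the cap, and the orchestrator owns the loop.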
7 · Cost note
Three line items quietly run up the bill:
- Idle VMs. Anything left on overnight or over the weekend is pure waste. Auto-stop on schedule, or move to a service that scales to zero.
- Reserved capacity vs on-demand. Steady-state workloads at on-demand prices are 30–60% more expensive than they need to be. Reserved Instances, Savings Plans, GCP Committed Use, Azure Reservations — they all give roughly the same discount in exchange for a 1- or 3-year commitment.
- Spot for stateless / batch. Same hardware, ~70–90% cheaper, can vanish. Stateless web tiers and batch jobs are perfect candidates. Combine with on-demand baseline so an interruption doesn't take you down.
A reasonable mix at scale: ~60% on Savings Plans / RIs (your baseline), ~30% on Spot (for the elastic and batch portion), ~10% on-demand (to absorb surprises). Cuts the compute bill by roughly half versus all-on-demand.
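The "roughly half" figure is easy to check. A Python sketch, assuming a 40% discount for Savings Plans / RIs and 70% for Spot (illustrative figures picked from the ranges in this section, not quoted prices); `blended_cost_fraction` and `commitment_break_even` are hypothetical helpers.

```python
def blended_cost_fraction(mix, discounts):
    """Blended bill as a fraction of the same usage billed all on-demand."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix shares must sum to 1")
    return sum(share * (1.0 - discounts[tier]) for tier, share in mix.items())


def commitment_break_even(discount):
    """Utilisation above which a commitment beats paying on-demand.

    A commitment bills every hour at (1 - discount); on-demand bills only the
    hours actually used, so the two cost the same at utilisation 1 - discount.
    """
    return 1.0 - discount


MIX = {"committed": 0.60, "spot": 0.30, "on_demand": 0.10}
# Assumed discounts, for illustration only.
DISCOUNTS = {"committed": 0.40, "spot": 0.70, "on_demand": 0.0}

fraction = blended_cost_fraction(MIX, DISCOUNTS)
print(f"blended bill: {fraction:.0%} of on-demand, saving {1 - fraction:.0%}")
print(f"commitment pays off above {commitment_break_even(0.40):.0%} utilisation")
```

Under those assumptions the blend lands at about 55% of the all-on-demand bill. The break-even helper is why commitments belong on the steady baseline: at a 40% discount, committed capacity only wins if the workload would otherwise run more than 60% of the hours.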
Further reading
- AWS Well-Architected Framework, Performance Efficiency and Cost Optimization pillars. The official guidance on choosing compute, with the cost trade-offs spelled out.
- "Serverless in the Wild" (Shahrad et al., Microsoft Research / Azure, USENIX ATC 2020). Production data on how Azure Functions is actually used at scale; the cold-start mitigation work (adaptive keep-alive and pre-warming) in particular.
- Adjacent: Kubernetes Codex. The orchestration layer above container compute.
- Adjacent: Cost engineering. RIs, Savings Plans, Spot, and the FinOps practices behind them.
- Adjacent: Scale to millions on AWS. The compute choices in context — when each stage's growth makes the next compute primitive worth its complexity.