EC2, EBS & AMIs.
The oldest service AWS sells, and still the substrate underneath nearly everything else — Lambda runs on Firecracker on EC2, ECS-on-Fargate runs on tiny EC2 hosts, RDS is software on top of EC2. Worth knowing the instance families, the Nitro architecture that powers them, the EBS volume types you'll attach for storage, and the AMI machinery that lets you bake a known-good system image.
1 · What EC2 actually is
The mental model that survives every conversation about cloud compute: EC2 is a market for time-sliced virtual machines on top of AWS's own hypervisor and a custom hardware substrate called Nitro. Every modern EC2 instance is a guest VM on a physical host where almost all the "virtualisation tax" (networking, EBS I/O, security) has been moved off the host CPU and onto dedicated silicon, so the guest gets very nearly bare-metal performance. The instance type names a slice of CPU, memory, network, and storage; everything else is plumbing.
A short history clarifies why Nitro matters. EC2 launched in 2006 on Xen paravirtualisation: the hypervisor ran on the host CPU, intercepted privileged guest operations, and emulated the network and storage interfaces in software. That worked, but the hypervisor competed with the guest for CPU cycles, network traffic walked through dom0 (Xen's privileged management domain), and security patches required rebooting hosts. By 2017 AWS had been building Nitro in the background: a stripped-down KVM-based hypervisor plus custom Nitro Cards (PCIe accelerators) that handle networking, EBS storage, and a hardware root-of-trust. The hypervisor itself fits in a few megabytes and does almost nothing per request; the cards do the work. Bare-metal instance types (i3.metal, m7i.metal-24xl, and so on) ship without the hypervisor at all. The customer's OS boots straight onto the server, with the Nitro cards still handling networking and EBS.
Three architectural consequences fall out of this design. First, EBS isn't really a disk: it's an NVMe-over-fabric connection from the guest, through a Nitro card, to the EBS storage fleet in the same AZ. The guest sees /dev/nvme1n1; the actual blocks sit on a different chassis somewhere in the AZ. Second, there is no AWS-side admin path, by design. The Nitro Security Chip (documented in "The Security Design of the AWS Nitro System") attests to the firmware on every card and on the hypervisor, there's no SSH or other interactive login on Nitro hosts, and AWS engineers can't access running customer instances even if they wanted to. Third, Lambda and Fargate run on the same substrate, just with Firecracker microVMs instead of full Linux guests. The Nitro story is the whole story of AWS compute.
| EC2 is good for | EC2 is bad for |
|---|---|
| Long-running services with steady utilisation (web app, DB, batch worker) | Tiny, bursty workloads where idle time dominates (use Lambda) |
| Workloads you want to lift-and-shift from on-prem unchanged | Anything you'd rather not own an OS for (RDS / DynamoDB / managed services) |
| Specialised hardware needs — GPUs, Inferentia/Trainium, FPGA, high-frequency CPUs | Pure event-driven code under 15 minutes (Lambda is cheaper + simpler) |
| Per-tenant isolation by VM (regulated workloads, BYO-OS) | "I want a database" — let RDS / Aurora / DynamoDB own the EC2 underneath |
| Spot-friendly batch with checkpointing (training jobs, video encoding) | Stateful workloads that can't tolerate a 2-minute eviction (use on-demand) |
2 · Instance families — decoding the name
An EC2 instance type is a string like m7i.large. Read it left-to-right:
| Piece | Meaning | Examples |
|---|---|---|
| Family letter | The optimised dimension | m general · c compute · r memory · i NVMe local storage · d spinning disk · x/u extreme memory · p/g GPU · inf/trn Inferentia/Trainium · hpc HPC |
| Generation number | Hardware generation. Higher = newer = usually cheaper per unit of work | m5 → m6 → m7 |
| Processor suffix | CPU vendor / family | i Intel · a AMD · g Graviton (AWS Arm) |
| Capability modifiers | Optional extras | n high network · d NVMe instance storage · z high frequency · -flex lower-cost variant (e.g., m7i-flex) |
| Size | vCPU / RAM scale | nano → micro → ... → large → 2xlarge → ... → 32xlarge → metal |
r7gd.4xlarge = "memory-optimised, gen 7, Graviton, with NVMe local storage, 4xlarge size (16 vCPU / 128 GiB RAM)." Once you internalise the letters, the catalogue stops being intimidating.

Graviton (the g suffix) is AWS-designed Arm. AWS's own Graviton benchmarks claim roughly 20–40% better price/performance than equivalent Intel instances for most workloads, and Java, Node, Python, Go, and Rust all run natively. Native dependencies (some scientific libs, some legacy binaries) may not have Arm builds yet. Always benchmark before betting a specific workload on it, but the default move on new builds is "try Graviton first."
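A small bash sketch makes the decoding rules concrete. The function name decode_instance_type is hypothetical, and the parsing is a heuristic that only knows the letters in the table above; unusual names such as u-6tb1.metal are simply rejected.

```bash
#!/usr/bin/env bash
# Sketch: split an EC2 instance type name into the pieces from the table above.
# Heuristic only: names outside the pattern (u-6tb1.metal, mac2-m2.metal) are rejected.
decode_instance_type() {
  local t=$1
  # family letters, generation digits, suffix letters, optional "-flex", then ".size"
  local re='^([a-z]+)([0-9]+)([a-z]*)(-flex)?\.([a-z0-9-]+)$'
  if [[ ! $t =~ $re ]]; then
    echo "unrecognised instance type: $t" >&2
    return 1
  fi
  local family=${BASH_REMATCH[1]} gen=${BASH_REMATCH[2]} suffix=${BASH_REMATCH[3]}
  local flex=${BASH_REMATCH[4]} size=${BASH_REMATCH[5]}
  local cpu=unspecified mods=$suffix is_flex=no
  [[ -z $flex ]] || is_flex=yes
  # When present, the first suffix letter names the processor.
  case ${suffix:0:1} in
    i) cpu=Intel;    mods=${suffix:1} ;;
    a) cpu=AMD;      mods=${suffix:1} ;;
    g) cpu=Graviton; mods=${suffix:1} ;;
  esac
  echo "family=$family gen=$gen cpu=$cpu mods=${mods:-none} flex=$is_flex size=$size"
}
```

For instance, decode_instance_type c7gn.16xlarge prints family=c gen=7 cpu=Graviton mods=n flex=no size=16xlarge. When no processor letter is present (m5.large), the sketch reports cpu=unspecified rather than guessing the vendor.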
3 · Nitro — the practical implications
Knowing the Nitro shape (from section 1) pays off in three operational consequences:
- Enhanced networking is free. ENA uses SR-IOV, so the guest driver talks directly to the Nitro VPC card and the hypervisor stays out of the data path; EFA adds an OS-bypass path for MPI and NCCL traffic. Throughput scales with instance size (e.g., c7gn reaches 200 Gbps), and latency is lowest between instances in the same cluster placement group.
- EBS bandwidth is per-instance, not per-volume. Each instance type has a maximum EBS bandwidth (the Nitro EBS card's link), published in bits per second: an m7i.large has a baseline of about 650 Mbps (roughly 80 MB/s) with bursts to 10 Gbps, while an m7i.16xlarge sustains 20 Gbps (about 2.5 GB/s). Striping multiple gp3 volumes only helps up to that ceiling. Check aws ec2 describe-instance-types --instance-types <type> for the exact numbers.
- Nitro Enclaves are a different kind of VM. An enclave is a separate VM carved out of the parent instance's vCPUs and memory, with no networking (only a vsock channel to its parent), no persistent storage, no SSH, and no operator login. Designed for handling secrets (keys, PII, payment data), with hardware attestation that the code running inside is what you uploaded. Used in production by Coinbase, Evervault, and AWS KMS itself.
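Because the cap is per instance, the useful arithmetic for a striped set is a min(). A minimal sketch, assuming you pass the instance's published EBS bandwidth in Mbps (describe-instance-types reports it as EbsInfo.EbsOptimizedInfo.BaselineBandwidthInMbps and MaximumBandwidthInMbps) and each volume's provisioned throughput in MiB/s. The helper name is hypothetical, and it treats MiB/s and MB/s as interchangeable, which is close enough for sizing.

```bash
#!/usr/bin/env bash
# Sketch: effective_ebs_mbs INSTANCE_MBPS VOL_MBS [VOL_MBS...]
# Striped throughput is the sum of the volumes, capped by the instance's EBS link.
# Instance limits are in megabits; volume limits are in (mebi)bytes.
effective_ebs_mbs() {
  local instance_mbps=$1
  shift
  local instance_mbs=$(( instance_mbps / 8 ))   # bits -> bytes
  local sum=0 vol
  for vol in "$@"; do
    sum=$(( sum + vol ))
  done
  echo $(( sum < instance_mbs ? sum : instance_mbs ))
}
```

Three gp3 volumes provisioned at 1,000 MiB/s each behind an instance rated at 20,000 Mbps give effective_ebs_mbs 20000 1000 1000 1000 → 2500, so a fourth volume would buy nothing.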
Practical hygiene: the EC2 metadata service (http://169.254.169.254/) on Nitro supports IMDSv2 with session tokens. The v1 API was the vector behind the 2019 Capital One breach, in which an SSRF through a misconfigured WAF read the instance role's credentials from IMDSv1 with a single GET request. Always require IMDSv2: launch with --metadata-options HttpTokens=required, and fix existing instances with aws ec2 modify-instance-metadata-options --instance-id <id> --http-tokens required. Newer AMIs such as Amazon Linux 2023 are flagged to launch with IMDSv2 required, but check.
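To sweep an existing account, a sketch like the following lists instances whose metadata options still allow IMDSv1 and flips them to token-required. The function name is hypothetical; run it once per region, and check the MetadataNoToken CloudWatch metric first, since any software still making IMDSv1 calls breaks the moment tokens become mandatory.

```bash
#!/usr/bin/env bash
# Sketch: require IMDSv2 on every instance in the current region that still
# accepts IMDSv1 (metadata option http-tokens=optional).
require_imdsv2_everywhere() {
  local ids id
  ids=$(aws ec2 describe-instances \
    --filters Name=metadata-options.http-tokens,Values=optional \
    --query 'Reservations[].Instances[].InstanceId' --output text) || return
  for id in $ids; do   # text output is whitespace-separated instance IDs
    aws ec2 modify-instance-metadata-options --instance-id "$id" \
      --http-tokens required --http-endpoint enabled > /dev/null || return
    echo "IMDSv2 required on $id"
  done
}
```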
4 · Instance launch lifecycle
A successful RunInstances looks like one API call but kicks off a small sequence under the hood. Tracing it once removes a lot of mystery from "why did my instance take 90 seconds to come up".

Five steps end-to-end. (1) The control plane validates the request (IAM, quota, AMI permissions). (2) It asks the placement service for capacity: the right AZ and instance type, the placement-group rules, and a Nitro host with a free slot. (3) The Nitro host attaches the root EBS volume (as NVMe-over-fabric), attaches the ENI on the right VPC subnet, arms IMDS with the instance's IAM role credentials, and starts the guest. (4) The guest's kernel boots, cloud-init runs user-data, and the SSM agent registers with Systems Manager. (5) EC2's status checks pass and CloudWatch's StatusCheckFailed metric drops to 0. Only once the checks pass (and, in an ASG, the instance reaches InService) should a load balancer route traffic at it.

Typical wall-clock: 20–40 seconds from RunInstances to state=running; 60–120 seconds until the status checks pass, because the checks run about once a minute; longer if your user-data does meaningful work (package installs, container pulls). Pre-baked AMIs (Packer) cut this to about 60 seconds total. Auto Scaling warm pools keep pre-initialised instances (stopped, hibernated, or running) on standby so scale-out skips the slow setup work; instance refresh is a separate feature for rolling a new AMI through a group.
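To see where your own 90 seconds go, time each phase with the CLI's built-in waiters. A minimal sketch: time_phase is a hypothetical helper built on bash's SECONDS counter, and because waiters poll on a fixed interval, the measured times are coarse upper bounds rather than precise boot timings.

```bash
#!/usr/bin/env bash
# Sketch: time_phase LABEL COMMAND [ARGS...] runs COMMAND and prints the
# elapsed whole seconds; on failure it returns COMMAND's exit status instead.
time_phase() {
  local label=$1
  shift
  local start=$SECONDS
  "$@" || return
  echo "$label: $(( SECONDS - start ))s"
}

# Example usage right after run-instances (needs AWS credentials):
#   time_phase running   aws ec2 wait instance-running   --instance-ids "$INSTANCE_ID"
#   time_phase status-ok aws ec2 wait instance-status-ok --instance-ids "$INSTANCE_ID"
```

Running the two waits back to back splits the wall-clock into "scheduled and booting" and "checks passing", which is usually where most of the time goes.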
5 · EBS internals — block storage as SAN
EBS is best understood as a zonal storage-area network presented to instances as an NVMe device. The block layer lives on a fleet of storage hosts in the AZ, and the guest talks to it via the Nitro EBS card. Each I/O is tagged with a volume ID and offset; writes are replicated synchronously to enough storage nodes inside the AZ to hit the durability target before being acknowledged back to the guest. The whole arrangement gives ~1 ms median latency for warm reads and 99.8–99.9% annual durability for gp2/gp3 volumes (99.999% for io2), at the cost of "the disk is across the network", which is the second most important thing to internalise after "instance store is ephemeral."
Snapshots are incremental and S3-backed. The first snapshot of a volume copies every written block (sparse: a 1 TB volume with 50 GB of data writes 50 GB of snapshot). Subsequent snapshots copy only blocks changed since the last snapshot. Each snapshot is a manifest of block pointers into a per-volume object pool in S3, so snapshots can be deleted out of order, and EBS's garbage collector reclaims any blocks no remaining snapshot references. The user-facing consequence: snapshot-to-snapshot deltas are cheap, but the first snapshot of a large volume can take hours, and deleting a snapshot frees only the blocks that no other snapshot still references.
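The reference-counting behaviour is easier to see in a toy model. This sketch (bash 4+ for associative arrays) treats each snapshot as a list of block@version pointers and counts the distinct pointers still referenced; it illustrates the billing rule described above, not EBS's actual storage format, and the names are hypothetical.

```bash
#!/usr/bin/env bash
# Toy model of incremental snapshots (not EBS's real format). A snapshot is a
# space-separated list of "block@version" pointers; the storage you pay for is
# the number of distinct pointers still referenced by at least one snapshot.
billable_blocks() {
  local -A live=()
  local snap ptr
  for snap in "$@"; do
    for ptr in $snap; do   # unquoted on purpose: split the pointer list on spaces
      live[$ptr]=1
    done
  done
  echo "${#live[@]}"
}

snap1="b0@1 b1@1 b2@1"   # first snapshot: every written block
snap2="b0@1 b1@2 b2@1"   # b1 changed; b0 and b2 are shared with snap1
snap3="b0@1 b1@2 b2@3"   # b2 changed; b1@2 is shared with snap2
```

billable_blocks "$snap1" "$snap2" "$snap3" prints 5. Deleting snap2 first (billable_blocks "$snap1" "$snap3") still prints 5, because snap3 references b1@2; deleting snap1 as well leaves 3. Out-of-order deletion is safe, but it frees only what nothing else points to.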
Restores hydrate lazily. Creating a volume from a snapshot returns a new volume immediately, but the blocks aren't pre-fetched from S3; they're pulled on first read. A freshly restored volume therefore has surprisingly slow initial reads (the "first-touch penalty"), which can run to hundreds of milliseconds per block. Either pre-warm by reading the whole volume (sudo fio --filename=/dev/nvme1n1 --rw=read --bs=1M --iodepth=32 --ioengine=libaio --direct=1 --name=volume-initialize) or use Fast Snapshot Restore (FSR), which costs $0.75 per hour per snapshot per AZ but pre-loads the blocks so restored volumes deliver full performance immediately. FSR makes sense for DR snapshots that might need to restore quickly, not for routine backups.
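Because a pre-warm read has to touch every block, its duration has a hard floor set by throughput. A rough sketch, assuming the volume's provisioned throughput is the only limit (the instance's EBS bandwidth can be a lower cap, and first-touch latency makes real runs slower); the helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: prewarm_min_seconds SIZE_GIB THROUGHPUT_MIBS
# Lower bound on a full-volume read: size divided by throughput, rounded up.
prewarm_min_seconds() {
  local size_gib=$1 mibps=$2
  local size_mib=$(( size_gib * 1024 ))
  echo $(( (size_mib + mibps - 1) / mibps ))   # ceiling division
}
```

prewarm_min_seconds 1024 125 prints 8389: at least 2.3 hours to hydrate a 1 TiB gp3 volume at its default 125 MiB/s, which is the practical argument for FSR on large DR volumes.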
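Encryption at rest is one setting away. Enabling EBS encryption by default (a per-account, per-region switch: aws ec2 enable-ebs-encryption-by-default) makes every new volume encrypted. The Nitro card encrypts blocks with an AES-256 data key, which is itself wrapped by a KMS key (the AWS-managed aws/ebs key by default, or your own customer managed key). Because the encryption happens on the instance host's Nitro card, data is also encrypted in transit between the instance and the storage hosts. Snapshots inherit the volume's KMS key, and cross-account snapshot sharing requires a customer managed key the destination account is allowed to use (snapshots encrypted with aws/ebs can't be shared), which may mean re-encrypting first. There's no measurable performance penalty, because the encryption is offloaded to the card.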
| Operation | What actually happens | Latency / cost |
|---|---|---|
| 4 KB random read (gp3, warm) | NVMe to Nitro card → storage host → memory | ~0.5–1 ms · counts as 1 IOP |
| 4 KB random write (gp3) | Write + sync replication to N storage hosts in AZ | ~1–2 ms · counts as 1 IOP |
| 256 KB sequential read | NVMe with read-ahead, coalesced into one fetch | ~1 ms · counts as 1 IOP (up to 256 KB) |
| 1 MB write | Split into 4× 256 KB writes | ~3–4 ms · 4 IOPS |
| CreateSnapshot | Mark changed blocks, async copy to S3 | API returns in seconds; copy finishes in minutes–hours |
| CreateVolume from snapshot | Allocate, register, expose NVMe; lazy block fetch | API returns in seconds; first-touch reads slow until warmed |
| Encryption (when enabled) | AES-256 on the Nitro card; KMS key wraps the data key | No measurable perf cost |
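The IOPS column follows a simple rule for SSD volumes: an I/O of up to 256 KiB counts as one operation, and larger I/Os are split. The sketch below (hypothetical helper names) applies that rule to estimate how many IOPS a target throughput needs; it assumes random I/O, since EBS can merge small sequential I/Os into fewer operations.

```bash
#!/usr/bin/env bash
# Sketch: ebs_io_ops IO_KIB -> operations one I/O of that size counts as
# on an SSD-backed volume (gp3, io2): 256 KiB per operation, rounded up.
ebs_io_ops() {
  local io_kib=$1 max_kib=256
  echo $(( (io_kib + max_kib - 1) / max_kib ))
}

# iops_needed RATE_MIBS IO_KIB -> IOPS required to sustain RATE_MIBS MiB/s
# using random I/Os of IO_KIB KiB each.
iops_needed() {
  local rate_mibs=$1 io_kib=$2
  local ios_per_sec=$(( rate_mibs * 1024 / io_kib ))
  echo $(( ios_per_sec * $(ebs_io_ops "$io_kib") ))
}
```

iops_needed 125 1024 prints 500, but iops_needed 125 4 prints 32000: the same 125 MiB/s in 4 KiB random I/Os is well past gp3's 16,000-IOPS ceiling.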
Instance store (the d in m7gd, or the standalone i family) is local NVMe physically attached to the Nitro host: orders of magnitude faster than EBS for raw IOPS, but the data vanishes on stop, hibernate, or hardware failure. Use it for caches, scratch space, and locally rebuildable indexes. Never put state you need to survive a host failure on instance store.

6 · EBS — the storage menu
EBS volumes are network-attached block storage. The disk is not in the host; it's reached over the Nitro card's storage path to the EBS fleet in the same AZ. That has two consequences worth internalising: (a) volumes survive instance termination, except those with DeleteOnTermination set (the default for root volumes), and (b) cross-AZ attachment is impossible, because an EBS volume lives in exactly one AZ.
| Type | Use case | IOPS / throughput | Price model |
|---|---|---|---|
| gp3 | Default for most workloads: boot disks, app data | 3,000 baseline IOPS and 125 MiB/s throughput at any size; pay extra for more | Per GB · per provisioned IOPS · per MB/s |
| gp2 | Legacy default, being replaced by gp3 everywhere | IOPS scales with size (3 IOPS/GiB, minimum 100), burstable to 3,000 | Per GB (IOPS bundled) |
| io2 Block Express | Mission-critical DBs: Aurora-like workloads on EC2 | Up to 256k IOPS / 4 GB/s; 99.999% durability | Most expensive: per GB · per IOPS |
| st1 | Throughput-optimised HDD: big sequential reads (logs, MapReduce) | 500 MB/s max, low IOPS | Cheap per GB; no IOPS cost |
| sc1 | Cold HDD: rarely accessed archive | 250 MB/s max, very low IOPS | Cheapest EBS option |
| Instance store | (Not EBS) NVMe physically attached to the host on certain instance types | Highest IOPS, lowest latency | Bundled with instance; ephemeral, data lost on stop |
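The gp2 row's "scales with size" is easy to quantify. A sketch of the gp2 baseline rule (3 IOPS per GiB, floor 100, cap 16,000); the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: gp2_baseline_iops SIZE_GIB -> sustained IOPS once burst credits are gone.
# Volumes under 1,000 GiB can burst to 3,000 IOPS while credits last.
gp2_baseline_iops() {
  local size_gib=$1
  local iops=$(( size_gib * 3 ))
  (( iops < 100 )) && iops=100
  (( iops > 16000 )) && iops=16000
  echo "$iops"
}
```

A 100 GiB gp2 volume sustains gp2_baseline_iops 100 → 300 IOPS once its burst credits drain, against gp3's 3,000 baseline at any size: the arithmetic behind moving gp2 volumes to gp3.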
Default to gp3; reach for io2 only when you actually need more than 16,000 IOPS sustained, and reach for st1 only when the workload is sequential and large.

Snapshots are an incremental, block-level, S3-backed copy of a volume. The first snapshot copies every used block; subsequent snapshots copy only changed blocks. Restoring a snapshot creates a new volume that's lazily hydrated: blocks are fetched from S3 the first time they're read, so freshly restored volumes have surprisingly slow reads until they warm up. Running fio --rw=read across the whole volume forces the hydration if you need predictable post-restore performance.
7 · AMIs and the bake-vs-bootstrap question
An AMI (Amazon Machine Image) is a packaged template that EC2 launches from. It includes a root snapshot, kernel selection, and metadata about architecture and virtualisation type. The question every team has to answer is: how much of the OS + app is baked into the AMI vs configured at boot?
| Pattern | How it's done | Trade-off |
|---|---|---|
| Golden AMI | Packer build pipeline → AMI per release | Fastest startup (nothing to install at boot), no boot-time surprises. Slow to update: must rebuild for every change. |
| Base AMI + user-data | Stock Amazon Linux + cloud-init script | Flexible. Slow startup (minutes) and any boot-time failure is a runtime problem. |
| Base AMI + config mgmt | Ansible / Chef / Salt convergence after boot | Same trade-off as user-data, plus the config-mgmt overhead. |
| Container image | Push container; node just runs Docker | The current default for new workloads — fast roll-forwards, immutable artefacts. |
AMIs are regional artefacts. Copy them between regions with aws ec2 copy-image when doing multi-region deploys. AMIs are also AZ-agnostic within a region — the AZ is decided at instance launch.
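For multi-region deploys, the copy-then-wait pattern can be scripted. A sketch with a hypothetical function name: it starts every copy first (copy-image returns as soon as the copy is registered) and only then waits. It assumes the AMI can be copied as-is; for encrypted AMIs you may also need --encrypted and --kms-key-id.

```bash
#!/usr/bin/env bash
# Sketch: copy_ami_to_regions SOURCE_REGION AMI_ID NAME REGION [REGION...]
# Prints "region new-ami-id" for each copy once it is available.
copy_ami_to_regions() {
  local src_region=$1 ami=$2 name=$3
  shift 3
  local -a regions=() ids=()
  local region new_id i tries
  # Start all copies; they proceed in parallel on the AWS side.
  for region in "$@"; do
    new_id=$(aws ec2 copy-image --region "$region" \
      --source-region "$src_region" --source-image-id "$ami" \
      --name "$name" --query ImageId --output text) || return
    regions+=("$region")
    ids+=("$new_id")
  done
  # CLI waiters give up after a fixed number of polls and large copies can
  # outlast that, so retry each wait a few times before failing.
  for i in "${!regions[@]}"; do
    tries=0
    until aws ec2 wait image-available --region "${regions[i]}" --image-ids "${ids[i]}"; do
      (( ++tries < 6 )) || return 1
    done
    echo "${regions[i]} ${ids[i]}"
  done
}
```

copy_ami_to_regions us-east-1 ami-0123456789abcdef0 app-v42 eu-west-2 ap-southeast-2 prints one region/AMI pair per line as each copy becomes available.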
8 · Placement groups, Spot, dedicated hosts
- Placement groups — cluster. Instances physically near each other (same rack-ish) for low-latency networking. Used in HPC and trading.
- Placement groups — spread. Instances forced onto different hardware. Used to spread your handful of critical instances across racks for fault isolation.
- Placement groups — partition. Larger spread — multiple "partitions," each on different hardware. Cassandra, HDFS, Kafka.
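- Spot instances: spare EC2 capacity at up to a 90% discount (there's no bidding any more; you pay the current Spot price). AWS can reclaim it with 2 minutes' notice. Perfect for batch jobs, CI, fault-tolerant workers. Watch for the interruption notice (the IMDS spot/instance-action endpoint or the EventBridge warning) and start draining when it appears; also handle SIGTERM, which arrives when the OS shuts down, by finishing the current job and exiting cleanly.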
- Reserved Instances / Savings Plans. Commit 1 or 3 years to a baseline of compute, get 30–60% off. Savings Plans are the modern shape — Compute Savings Plans apply across instance family/region/OS, which is more flexible than legacy RIs.
- Dedicated hosts — pay for a whole physical host so you control tenancy. Niche: bring-your-own-licence for Oracle/Windows, or compliance regimes that forbid shared hardware.
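A worker loop can watch for the notice rather than relying on SIGTERM alone, since the OS only sends SIGTERM when the instance is actually shutting down. A minimal sketch, assuming a hypothetical job function that processes one job per call and returns non-zero when the queue is empty; it uses the standard IMDSv2 token endpoint and the spot/instance-action path, which returns 404 until an interruption is scheduled.

```bash
#!/usr/bin/env bash
# Sketch of a Spot-aware worker loop: check for an interruption notice before
# each job, and stop after the current job if SIGTERM arrives.
IMDS=http://169.254.169.254/latest
STOP=0
trap 'STOP=1' TERM   # bash runs this after the foreground job returns

spot_interruption_pending() {
  local token code
  token=$(curl -s -X PUT "$IMDS/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    -H "X-aws-ec2-metadata-token: $token" "$IMDS/meta-data/spot/instance-action")
  [ "$code" = 200 ]   # 404 means no interruption is scheduled
}

# run_worker JOB_FN: JOB_FN is a hypothetical function that processes one job.
run_worker() {
  local job_fn=$1
  while [ "$STOP" -eq 0 ]; do
    if spot_interruption_pending; then
      echo "interruption notice: draining"
      break
    fi
    "$job_fn" || break
  done
  echo "worker exited cleanly"
}
```

run_worker process_one_job (with process_one_job being your own function) checks before each job, so a job already in progress gets to finish; pushing partial state to S3 or SQS would go right after the loop.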
9 · Build it yourself — launch, attach EBS, observe metadata, terminate
- Pick a region and the latest Amazon Linux AMI.

```bash
export AWS_REGION=us-east-1
AMI_ID=$(aws ssm get-parameter \
  --name /aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-x86_64 \
  --query 'Parameter.Value' --output text)
echo "AMI: $AMI_ID"
```

- Launch an instance, IMDSv2 enforced.

```bash
# Uses the default VPC. Session Manager and the iam/security-credentials/ path
# below need an instance profile with AmazonSSMManagedInstanceCore; add
# --iam-instance-profile Name=<your-profile> if you have one.
INSTANCE_ID=$(aws ec2 run-instances \
  --image-id "$AMI_ID" \
  --instance-type t3.micro \
  --count 1 \
  --metadata-options "HttpTokens=required,HttpEndpoint=enabled" \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=lab}]' \
  --query 'Instances[0].InstanceId' --output text)
echo "Launched: $INSTANCE_ID"
```

- Wait for it to be running, get the public IP.

```bash
aws ec2 wait instance-running --instance-ids "$INSTANCE_ID"
PUBLIC_IP=$(aws ec2 describe-instances --instance-ids "$INSTANCE_ID" \
  --query 'Reservations[0].Instances[0].PublicIpAddress' --output text)
echo "IP: $PUBLIC_IP"
```

- Provision a gp3 volume in the instance's AZ and attach it.

```bash
AZ=$(aws ec2 describe-instances --instance-ids "$INSTANCE_ID" \
  --query 'Reservations[0].Instances[0].Placement.AvailabilityZone' --output text)
VOL_ID=$(aws ec2 create-volume \
  --availability-zone "$AZ" \
  --size 8 --volume-type gp3 --iops 3000 --throughput 125 \
  --query VolumeId --output text)
aws ec2 wait volume-available --volume-ids "$VOL_ID"
# Nitro guests see this as an NVMe device (e.g. /dev/nvme1n1), not /dev/sdf.
aws ec2 attach-volume --instance-id "$INSTANCE_ID" --volume-id "$VOL_ID" --device /dev/sdf
aws ec2 wait volume-in-use --volume-ids "$VOL_ID"
```

- Inspect instance metadata (IMDSv2 with token). Run this on the instance via SSM Session Manager (no SSH key needed) or SSH:

```bash
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/iam/security-credentials/
```

- Tear it all down.

```bash
aws ec2 detach-volume --volume-id "$VOL_ID"
aws ec2 wait volume-available --volume-ids "$VOL_ID"
aws ec2 delete-volume --volume-id "$VOL_ID"
aws ec2 terminate-instances --instance-ids "$INSTANCE_ID"
aws ec2 wait instance-terminated --instance-ids "$INSTANCE_ID"
```
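The attach step above passes --device /dev/sdf, but a Nitro guest sees an NVMe device whose number depends on attach order. A more stable handle is the volume ID itself, which EBS exposes as the NVMe serial number. A sketch, assuming the standard udev persistent-storage rules create the by-id link shown (check with ls /dev/disk/by-id/ on your AMI); the helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: ebs_nvme_path VOLUME_ID -> stable /dev/disk/by-id path on a Nitro guest.
# Assumes udev names the link after the NVMe model ("Amazon Elastic Block Store")
# and serial (the volume ID without its hyphen).
ebs_nvme_path() {
  local vol=$1
  if [[ ! $vol =~ ^vol-[0-9a-f]+$ ]]; then
    echo "not an EBS volume id: $vol" >&2
    return 1
  fi
  echo "/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_${vol/-/}"
}
```

On the instance, sudo mkfs -t xfs "$(ebs_nvme_path vol-0123456789abcdef0)" formats the intended disk regardless of which nvmeN name it received; lsblk -o NAME,SERIAL shows the same serial.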
Run aws ec2 describe-volumes --filters Name=status,Values=available monthly and clean up orphans.

10 · Real-world case studies
Three public stories give a sense of how EC2's economics, Spot, and Graviton actually play out at scale.
Netflix — Spot at scale via Spinnaker. Netflix has been one of the largest public consumers of EC2 Spot for over a decade. "Creating your own EC2 Spot market" describes how their deployment platform (Spinnaker) chooses across multiple Spot pools — different instance types, different AZs, different generations — using a price-and-interruption-rate heuristic, with automated remediation when a pool runs hot. The pattern that survives: don't pick a single Spot type and pray; build a "fleet" definition that lists many compatible types and lets EC2 fill from whichever pool has capacity. Modern mixed-instance Auto Scaling Groups and EC2 Fleet APIs codify this pattern; Karpenter on EKS is the Kubernetes-native reincarnation.
Lyft — mixed-instance autoscaling for cost and resilience. Lyft's engineering blog has documented their move to mixed-instance ASGs in "Optimizing our AWS bill" and related posts. The headline: by allowing each service's ASG to choose across m5, m5a (AMD), m5n, and m6i at multiple sizes, and combining that with a base layer of on-demand plus a Spot extension, Lyft cut compute spend significantly while reducing capacity-shortage incidents — because a "give me 100 vCPU of any general-purpose instance" request is much harder to starve than "give me 25 of m5.large." The same pattern shows up at Robinhood, Pinterest, and others; the AWS Well-Architected guidance now codifies it as "instance diversification."
Snap — Graviton migration in production. Snap's case study with AWS and follow-up re:Invent talk walk through migrating large portions of Snapchat's stateless services from m5/c5 to m6g/c6g Graviton instances. The published outcomes — roughly 20% lower compute cost at equal or better latency — have become the standard "should I try Graviton" data point. The actual migration work is mostly Docker base-image swaps to multi-arch images (linux/amd64 + linux/arm64 manifests), rebuilding native dependencies for Arm, and benchmarking JIT-heavy workloads (Java, V8) where the cost story sometimes inverts on a specific microbenchmark before evening out in production. Twitter (per their case study) reported similar gains; Honeycomb's writeup documents the migration mechanics in detail.
The through-line: at scale, the cost of compute is shaped by mixing instance families, generations, AZs, purchase models, and CPU architectures more than by picking the "right" single type. Default to fleet, Graviton, Spot for fault-tolerant workloads, and Savings Plans for the steady base.
11 · What breaks
- Launch or start fails with InsufficientInstanceCapacity. AWS occasionally runs out of capacity for a specific instance type in a specific AZ. Solutions: use mixed-instance ASGs, try another AZ, fall back to a similar type or generation, or, for critical workloads, buy On-Demand Capacity Reservations. Zonal Reserved Instances also reserve capacity; regional RIs and Savings Plans are billing discounts only and guarantee no capacity.
- ENI / IP limits per instance type. Each instance type has a hard cap on the number of network interfaces and on the IPv4 addresses per interface: t3.micro allows 2 ENIs with 2 IPs each; m7i.large allows 3 ENIs with 10 IPs each; m7i.24xlarge allows 15 with 50 each. By default the EKS VPC CNI gives every pod its own VPC IP from the node's ENIs, so picking t3.medium (3 ENIs × 6 IPs) for an EKS node caps you at 17 pods per node before IP exhaustion. Check aws ec2 describe-instance-types --instance-types <type> → NetworkInfo before sizing nodes.
- Instance-store data vanishes on stop. The d-family local NVMe disks are physically on the host. aws ec2 stop-instances wipes them, and the next start usually lands on a different host anyway. Hibernate also blanks them. Anything you need to survive must be on EBS, S3, or elsewhere.
- gp2 burst credits vs gp3 provisioned IOPS. gp3 doesn't use credits: IOPS are provisioned per volume (baseline 3,000, up to 16,000 at extra cost). gp2 does use a credit bucket and bursts to 3,000 IOPS for short windows on volumes under 1,000 GiB; sustained high IOPS depletes the credits and falls back to the per-GiB baseline (3 IOPS/GiB, minimum 100). The migration story: gp2 → gp3 makes "why did my IOPS suddenly drop after running for an hour" go away.
- EBS volume runs out of IOPS. Watch CloudWatch VolumeQueueLength; consistently above 1 means saturated. Raise IOPS on gp3, or move to io2 if you need 16,000+ sustained. Remember the per-instance EBS bandwidth ceiling: a single big volume can't exceed it.
- AMI region-copy delays. aws ec2 copy-image from us-east-1 to eu-west-2 can take 15–60 minutes for large AMIs (it physically copies the snapshot data across the AWS backbone). Multi-region deploys must wait for the copy to finish, or use a build pipeline that runs Packer in each region in parallel. EBS Fast Snapshot Restore doesn't apply to cross-region copies.
- Spot interruption. Your worker dies mid-job after a 2-minute warning, delivered via the IMDS endpoint http://169.254.169.254/latest/meta-data/spot/instance-action (and as an EventBridge event). Poll for the notice, handle SIGTERM, drain in-flight work, push state to S3/SQS, and run a fleet of Spot plus on-demand so the on-demand floor absorbs the loss.
- "It worked on my AMI but not in the cloud." Almost always one of (a) IAM role missing, (b) security group blocking, or (c) user-data script silently failing; check /var/log/cloud-init-output.log and aws ec2 get-console-output --instance-id $ID.
- Burst credits exhausted on t-class. t3/t4g instances burst CPU above baseline (e.g., t3.medium's baseline is 20% of each of its 2 vCPUs) by spending earned credits. In standard mode, a sustained-CPU workload throttles hard once credits run out: check CPUCreditBalance with aws cloudwatch get-metric-statistics, and at zero you're back at baseline. T3 and T4g launch in unlimited mode by default, which avoids the throttle but bills surplus credits for sustained burst. Use the m/c families for steady load.
- Graviton-native dependency missing. Native libraries (some scientific Python, certain Node modules with C bindings, old proprietary binaries) may not have Arm builds. Check uname -m in your container; if you see aarch64 and a dependency installation fails, the package needs an Arm wheel or a multi-arch image.
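The 17-pod figure for t3.medium comes from the default VPC CNI max-pods formula: each ENI keeps its primary IP for itself, and two system pods (aws-node and kube-proxy) use host networking and need no pod IP. A sketch with a hypothetical helper name; it assumes prefix delegation is off, which changes the arithmetic entirely.

```bash
#!/usr/bin/env bash
# Sketch: eks_max_pods ENIS IPS_PER_ENI -> default VPC CNI pod limit per node
# (prefix delegation disabled).
eks_max_pods() {
  local enis=$1 ips_per_eni=$2
  echo $(( enis * (ips_per_eni - 1) + 2 ))
}
```

eks_max_pods 3 6 prints 17 (t3.medium); eks_max_pods 3 10 prints 29 (m7i.large); eks_max_pods 2 2 prints 4 (t3.micro).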
12 · Further reading
- EC2 instance type catalogue. The canonical list, family by family.
- EBS volume types. The precise IOPS / throughput / durability numbers.
- Security Design of the AWS Nitro System. The whitepaper that explains what's running underneath.
- The Nitro System. AWS's overview page; links to the firmware-attestation talks.
- Netflix — Creating your own EC2 Spot market. Spot-at-scale via Spinnaker.
- Lyft — Optimizing the AWS bill. Mixed-instance ASGs in production.
- Snap on Graviton. Production migration to Arm at Snapchat scale.
- Honeycomb on Graviton2. Engineering writeup of the migration mechanics.
- SSDs & NVMe. Why EBS gp3 vs instance NVMe latency are an order of magnitude apart.
- Cloud compute (concepts). The decision tree for "VM vs container vs serverless."