Cloud Codex / 02

Storage — object, block, file.

There are three storage shapes, and almost every cloud storage service is one of them dressed up. Object storage holds blobs at planet scale. Block storage acts like a virtual hard drive attached to a VM. File storage gives you a shared POSIX filesystem that multiple machines can mount. The trick is to use object wherever you can, block where you must, and file as the escape hatch when an old app refuses to learn S3.


1 · The three shapes

  • Object. A key → blob store. Flat namespace inside a bucket. Designed for billions of objects, eleven nines of durability, parallel reads. The default for media, backups, data lakes, anything immutable.
  • Block. A virtual disk attached to one machine at a time. Looks like /dev/xvdb. Use it like a hard drive: format, mount, write. Fast random IO, but single-attach by default.
  • File. A shared filesystem. NFS or SMB protocol. Multiple machines mount the same volume; they all see the same files. Slower than block, simpler than object for legacy apps that already speak filesystem.
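
The "flat namespace" point is easiest to see in code. The sketch below is a toy in-memory model, not a real client: `list_keys` and the sample `bucket` are hypothetical names, and the prefix/delimiter behaviour is an approximation of how object-store listings present "folders" over what is really one flat set of keys.

```python
def list_keys(keys, prefix="", delimiter="/"):
    """Emulate an object-store listing over a flat collection of keys.

    Returns (objects, common_prefixes): the keys that sit directly under
    `prefix`, and the rolled-up pseudo-folders one level below it.
    """
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to and including the first delimiter is reported
            # as a "folder", even though no folder object exists anywhere.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


# Hypothetical bucket contents: just keys mapped to blobs, no directories.
bucket = {
    "logs/2026/05/app.log": b"...",
    "logs/2026/06/app.log": b"...",
    "logs/readme.txt": b"...",
    "media/cat.jpg": b"...",
}

objects, folders = list_keys(bucket, prefix="logs/")
```

Listing with `prefix="logs/"` reports one object and one pseudo-folder; nothing in the store changes when you "create a directory", because there are only keys.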

2 · The AWS canonical version

| Shape | Service | Notes |
|---|---|---|
| Object | S3 | 11 nines of durability. Strong read-after-write since 2020. Lifecycle policies move objects through Standard → IA → Glacier as they age. |
| Block | EBS | Attaches to one EC2 instance at a time (io2 Multi-Attach allows several, with caveats). Volume types: gp3 as the default, io2 for high IOPS, st1/sc1 for cheap throughput. |
| File | EFS | NFS. Multi-AZ by default. Slower than EBS, simpler when you need shared filesystem semantics. |
| File (Windows / SMB) | FSx for Windows | SMB protocol, Active Directory integration. The "I have a .NET app" path. |
| File (Lustre) | FSx for Lustre | Parallel filesystem for HPC. Insanely fast, insanely expensive. |
| Archive | S3 Glacier / Glacier Deep Archive | Cheapest per GB. Retrieval costs and latency are the tax: minutes for Glacier, hours for Deep Archive. |

Most teams live in S3 and EBS. EFS shows up when a legacy app needs shared files. Glacier shows up when the compliance lawyer asks "where are the seven-year backups?"

3 · GCP and Azure equivalents

| Shape | AWS | GCP | Azure |
|---|---|---|---|
| Object | S3 | Cloud Storage (GCS) | Blob Storage |
| Block | EBS | Persistent Disk / Hyperdisk | Managed Disks |
| File (NFS) | EFS | Filestore | Azure Files (NFS) |
| File (SMB) | FSx for Windows | Filestore (SMB in preview) | Azure Files (SMB) |
| Archive | Glacier / Deep Archive | Coldline / Archive class | Blob Cool / Archive tier |
| Parallel HPC | FSx for Lustre | Parallelstore | Azure Managed Lustre |

The portable bit is the protocol, not the service. The S3 API is the closest thing to a common dialect: GCS has a native API plus an S3-compatible endpoint, and many third-party stores and tools speak it too. Azure Blob is the exception. It has its own REST API, so reaching it from S3 code needs a gateway or a storage-abstraction library. Between S3 and GCS, most SDKs let you switch by changing an endpoint and credentials. If you're worried about lock-in at the storage layer, you can usually port a workload between clouds with config changes and a thin adapter; the data-egress bill is the painful bit, not the code.
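
A minimal sketch of the endpoint swap, limited to the two stores that expose an S3-compatible endpoint (AWS S3 and GCS). The names `ObjectStoreTarget`, `TARGETS`, and `s3_client_kwargs` are hypothetical; the region values are assumptions, and for GCS the access key and secret are assumed to be HMAC interoperability keys. The block only builds keyword arguments, so it runs without any SDK installed.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ObjectStoreTarget:
    name: str
    endpoint_url: Optional[str]  # None means "use the SDK's default AWS endpoint"
    region: str


# Assumed values for illustration; adjust regions to your own deployment.
TARGETS: Dict[str, ObjectStoreTarget] = {
    "aws": ObjectStoreTarget("aws", None, "us-east-1"),
    "gcs": ObjectStoreTarget("gcs", "https://storage.googleapis.com", "auto"),
}


def s3_client_kwargs(target_name: str, access_key: str, secret_key: str) -> dict:
    """Build keyword arguments for an S3-style client aimed at the chosen store."""
    target = TARGETS[target_name]
    kwargs = {
        "region_name": target.region,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }
    if target.endpoint_url is not None:
        kwargs["endpoint_url"] = target.endpoint_url
    return kwargs


# With boto3 installed, the swap would be a one-line config change:
#   import boto3
#   client = boto3.client("s3", **s3_client_kwargs("gcs", key, secret))
gcs_kwargs = s3_client_kwargs("gcs", "EXAMPLE_KEY", "EXAMPLE_SECRET")
aws_kwargs = s3_client_kwargs("aws", "EXAMPLE_KEY", "EXAMPLE_SECRET")
```

The application code stays the same on both sides; only the entry in `TARGETS` differs, which is the sense in which the protocol is the portable part.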

4 · The S3 lifecycle, end to end

5 · How to pick

  1. Do you need shared access across many machines? File. Otherwise skip to 2.
  2. Is it a blob (image, video, backup, parquet, log file) or random IO on structured data? Blob → Object. Random IO on a database file → Block.
  3. How often will you read it after writing? Frequently → Standard tier. Rarely → IA / Coldline / Cool. Almost never → Glacier / Archive.
  4. Do you need single-digit-millisecond latency? Block (gp3 or io2). Object is good but not that good — typical S3 GET is 30–100 ms.
  5. Will you be doing analytics over it? Object, with a partitioned key layout. S3 / GCS / Blob all integrate with Athena / BigQuery / Synapse for serverless query.
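
The checklist above can be sketched as a small decision function. This is only one reading of the steps: `pick_storage` is a hypothetical helper, and the numeric thresholds for "frequently", "rarely", and "almost never" are assumptions, since the text does not define them.

```python
from typing import Optional, Tuple


def pick_storage(
    shared: bool,
    is_blob: bool,
    needs_ms_latency: bool,
    reads_per_month: float,
) -> Tuple[str, Optional[str]]:
    """Return (shape, tier) following the checklist order.

    Assumed thresholds: at least one read a month counts as "frequently",
    at least one read a year as "rarely", anything less as "almost never".
    """
    # Step 1: shared access across many machines means file storage.
    if shared:
        return ("file", None)
    # Steps 2 and 4: random IO or single-digit-millisecond latency means block.
    if needs_ms_latency or not is_blob:
        return ("block", None)
    # Step 3: blobs go to object storage; the tier follows read frequency.
    if reads_per_month >= 1.0:
        tier = "standard"
    elif reads_per_month >= 1.0 / 12.0:
        tier = "infrequent-access"
    else:
        tier = "archive"
    return ("object", tier)


backup_choice = pick_storage(shared=False, is_blob=True,
                             needs_ms_latency=False, reads_per_month=0.01)
database_choice = pick_storage(shared=False, is_blob=False,
                               needs_ms_latency=True, reads_per_month=1000)
```

Step 5 (analytics) does not change the shape, so it is left out: it affects the key layout inside object storage, not the choice between object, block, and file.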

6 · What breaks

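  • Hot S3 prefixes. Modern S3 auto-partitions by prefix, but each prefix starts at roughly 3,500 writes and 5,500 reads per second, and scaling beyond that is gradual. Burst limits still bite if every object shares a sequential prefix (2026-05-22/...). High-entropy prefixes (UUID-first) spread the load and avoid the throttling.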
  • S3 Standard vs IA cost flip. IA is cheaper per GB but has per-GB retrieval and a 30-day minimum. If you read an IA object often, you pay more than Standard would have cost.
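  • EBS volume too small to handle the IOPS. On gp2, baseline IOPS scale with volume size (3 IOPS per GiB) and small volumes lean on burst credits, which drain under sustained load. gp3 has no burst credits: it gives a flat 3,000 IOPS baseline, and anything above that is provisioned separately. Symptom on gp2: query latency spikes after 15 minutes or so of sustained activity.
  • EFS throughput modes. Bursting mode has credit-based throughput tied to the amount of data stored. Small file systems throttle hard when busy. Move to provisioned or elastic throughput if you've seen the IO graph plateau weirdly.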
  • Glacier retrieval bill. Cheap to store, expensive to read. A surprise restore can cost more than a year of Standard storage. Always check retrieval pricing before a tier migration.
  • Cross-region egress. Reading S3 from a different region is metered and not cheap. Replicate to the consumer's region if it's hot enough.
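
The Standard vs IA cost flip is simple arithmetic. The sketch below assumes illustrative per-GB prices ($0.023 Standard storage, $0.0125 IA storage, $0.01 IA retrieval); these are assumptions, not quoted prices, and request fees and the 30-day minimum are ignored. `monthly_cost_per_gb` and `break_even_reads` are hypothetical helper names.

```python
# Assumed list prices in USD per GB; check current pricing for your region.
STANDARD_STORAGE_PER_GB = 0.023
IA_STORAGE_PER_GB = 0.0125
IA_RETRIEVAL_PER_GB = 0.01


def monthly_cost_per_gb(storage_class: str, full_reads_per_month: float) -> float:
    """Monthly cost of holding 1 GB and reading it back N times."""
    if storage_class == "standard":
        return STANDARD_STORAGE_PER_GB  # no per-GB retrieval fee
    if storage_class == "ia":
        return IA_STORAGE_PER_GB + IA_RETRIEVAL_PER_GB * full_reads_per_month
    raise ValueError(f"unknown storage class: {storage_class}")


def break_even_reads() -> float:
    """Full reads per month at which IA stops being cheaper than Standard."""
    return (STANDARD_STORAGE_PER_GB - IA_STORAGE_PER_GB) / IA_RETRIEVAL_PER_GB


flip_point = break_even_reads()
```

Under these assumed prices the flip sits at roughly one full read of the data per month: below that IA wins, above it Standard would have been cheaper.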

7 · Cost note

Three places the storage bill builds up faster than you expect:

  • Egress. Storing 100 TB in S3 costs ~$2,300/month. Egressing 100 TB out of AWS costs ~$8,000. The cloud charges very little to put data in and a surprising amount to take it out. Put CDNs in front of any externally-served bytes.
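  • Cross-AZ traffic. An EBS volume always lives in the same AZ as its instance, so the charges come from everything around it: an app server talking to a database, cache, or replica that accidentally landed in a different AZ pays per GB in each direction. Cross-AZ data is one of the top three cost surprises in big cloud bills. Tag and dashboard it early.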
  • Snapshot sprawl. Automated EBS snapshots multiply across years. Snapshot lifecycle policies are not optional past a year of operations.
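
The storage-versus-egress gap in the first bullet can be reproduced with a tiered-pricing calculation. The tier boundaries and rates below are assumptions for illustration (S3 Standard and internet egress list prices vary by region and change over time), and `tiered_cost` is a hypothetical helper.

```python
GB_PER_TB = 1024

# (tier size in GB, USD per GB). Assumed rates; the last tier is open-ended.
STORAGE_TIERS = [
    (50 * GB_PER_TB, 0.023),
    (450 * GB_PER_TB, 0.022),
    (float("inf"), 0.021),
]
EGRESS_TIERS = [
    (10 * GB_PER_TB, 0.09),
    (40 * GB_PER_TB, 0.085),
    (100 * GB_PER_TB, 0.07),
    (float("inf"), 0.05),
]


def tiered_cost(gb: float, tiers) -> float:
    """Charge each slice of usage at its own tier's rate."""
    remaining, total = gb, 0.0
    for tier_size, rate in tiers:
        chunk = min(remaining, tier_size)
        total += chunk * rate
        remaining -= chunk
        if remaining <= 0:
            break
    return total


hundred_tb = 100 * GB_PER_TB
storage_per_month = tiered_cost(hundred_tb, STORAGE_TIERS)
egress_once = tiered_cost(hundred_tb, EGRESS_TIERS)
```

With these assumed rates, a month of storage for 100 TB lands near $2,300 and a single full egress near $8,000, which matches the rough figures in the list above.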

Lifecycle policies on S3 (auto-tier old objects to IA, then Glacier) are the single biggest no-brainer cost win. Most teams set them up once, forget, and save five figures a year.
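
A minimal sketch of such a policy, written as the configuration document S3 accepts for bucket lifecycle rules. `lifecycle_rule`, the rule ID, and the 30/90-day thresholds are illustrative assumptions, and the block only builds and checks the dictionary, so nothing is sent to AWS.

```python
def lifecycle_rule(rule_id: str, prefix: str = "",
                   ia_after_days: int = 30, glacier_after_days: int = 90) -> dict:
    """Build one rule: Standard -> Standard-IA -> Glacier as objects age."""
    if ia_after_days < 30:
        # S3 does not transition objects to Standard-IA before 30 days.
        raise ValueError("ia_after_days must be at least 30")
    if glacier_after_days <= ia_after_days:
        raise ValueError("glacier_after_days must come after ia_after_days")
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},  # empty prefix applies to the whole bucket
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }


lifecycle_config = {"Rules": [lifecycle_rule("tier-old-objects")]}

# With boto3 installed and credentials configured, applying it would look like:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-example-bucket",  # hypothetical bucket name
#       LifecycleConfiguration=lifecycle_config,
#   )
```

Keeping the rule in code rather than in the console makes it reviewable, and the same function can be reused per prefix when different data ages at different rates.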

Further reading

  • AWS S3 — Best practices for performance. The prefix-sharding and request-rate sections in particular.
  • "Building and Operating a Pretty Big Storage System" (USENIX FAST '23 keynote), Andy Warfield. The story of S3 at scale, from one of the people who built it.
  • Adjacent: Object storage design. The S3-shape from the system-design angle.
  • Adjacent: CDN. The pattern that saves you from egress bills.
  • Adjacent: Cost engineering. Lifecycle policies, snapshot management, the cross-AZ trap in detail.