Aurora & RDS.

Two managed-database services with the same surface (MySQL/PostgreSQL) but radically different internals. RDS is "vanilla Postgres on EBS that AWS patches for you." Aurora is a rewrite of the storage layer that decouples compute from storage and ships only the redo log to a six-way replicated quorum store. The architecture difference is why Aurora handles failover in <30s and RDS Multi-AZ takes 60-120s.

1 · What Aurora actually is (and isn't)

Verbitski et al.'s SIGMOD 2017 paper "Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases" is the canonical reference, and the central insight is one sentence long: ship only the redo log to storage, not the data pages. Every implementation decision in Aurora falls out of that choice.

Vanilla MySQL or PostgreSQL writes everything twice. The redo log (the WAL in Postgres, the InnoDB log in MySQL) records what changed. The data pages (8 KB blocks in Postgres, 16 KB in InnoDB, containing the actual tuples) get written separately when the buffer cache flushes. Both writes go to local disk. With a block-mirrored standby, which is how classic RDS Multi-AZ works, both the log writes and the page writes also cross the wire, because replication happens at the volume level, below the engine.

Aurora replaces the local disk with the Aurora storage volume — a distributed log-structured storage service. The writer ships only the redo log; the storage nodes materialise data pages from the log records on their own time. Six copies of the log live across three AZs (two per AZ). A write acks when four of six copies have durably persisted the log record; reads check a quorum of three. The data pages never cross the network as a coherent object — they're a derived view computed locally on the storage node when needed.
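The quorum arithmetic in the paragraph above can be checked in a few lines. This is a toy model, not Aurora code: the constants come from the text (six copies, two per AZ, write quorum of four, read quorum of three), and the function names are made up for illustration.

```python
# Toy model of the 6-copy quorum described above; all names are illustrative.
COPIES_PER_AZ = 2
AZS = 3
V = COPIES_PER_AZ * AZS   # 6 copies in total
V_WRITE = 4               # copies that must persist a log record before the ack
V_READ = 3                # copies consulted by a quorum read


def quorums_are_consistent(v: int, v_write: int, v_read: int) -> bool:
    """Classic quorum rules: reads overlap writes, and writes overlap each other."""
    return v_read + v_write > v and 2 * v_write > v


def can_write(failed_copies: int) -> bool:
    """True if enough copies survive to form a write quorum."""
    return V - failed_copies >= V_WRITE


def can_read(failed_copies: int) -> bool:
    """True if enough copies survive to form a read quorum."""
    return V - failed_copies >= V_READ


if __name__ == "__main__":
    print("rules hold:", quorums_are_consistent(V, V_WRITE, V_READ))
    for label, failed in [("one AZ down", 2), ("one AZ plus one node down", 3)]:
        print(f"{label}: write={can_write(failed)} read={can_read(failed)}")
```

With one AZ gone, four copies remain, so writes continue. With an AZ plus one more node gone, three copies remain: enough to read and repair, but not enough to accept new writes until a copy is rebuilt.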

The mental model that survives every conversation: the Aurora storage volume is a giant distributed redo-log replay machine that happens to expose a Postgres or MySQL wire protocol. Readers are just additional compute instances pointing at the same volume; failover is "elect a new writer for the same volume"; cross-region replication is "ship log records to a remote volume." Every Aurora capability — fast failover, instant reader attach, branch backups, blue/green deployments, point-in-time recovery — derives from the log-only architecture.

What Aurora isn't: a new database engine. Aurora MySQL is the MySQL 8.0 SQL surface (with a few features missing); Aurora PostgreSQL is real PostgreSQL with extensions. The query planner, lock manager, isolation semantics — all unchanged. The reason Aurora can claim "drop-in compatible" is that it really is drop-in at the SQL layer.

Aurora is good for	Aurora is bad for
OLTP workloads with read-heavy traffic patterns (1 writer + many readers)	Multi-master active-active writes (Global Database has one writer)
Postgres / MySQL workloads needing sub-30s failover	Oracle, SQL Server, MariaDB workloads (use RDS instead)
Workloads that hit IOPS ceilings on RDS — Aurora's storage is throughput-tuned	Tiny single-instance dev databases (Aurora's baseline cost is higher than RDS)
Frequent schema cloning / branching for staging environments (clone is metadata-only)	Workloads requiring sub-millisecond write commit (you're still paying for 4/6 quorum across AZs)
Multi-region read-replica workloads via Aurora Global	Workloads where data residency forbids storage redundancy across AZs

2 · RDS vs Aurora — what's different

	RDS (Postgres/MySQL/MariaDB/Oracle/SQL Server)	Aurora (Postgres-compatible / MySQL-compatible)
Storage	EBS attached to the instance	Shared "Aurora storage" — 6 copies across 3 AZs, log-structured
Replication	Engine-native (logical or physical) to read replicas	Storage-layer replication; readers attach to the same storage volume
Replica lag	10s–minutes typical	Tens of milliseconds, often single-digit
Failover (Multi-AZ)	60–120s (DNS flip + recovery)	< 30s typically
Storage size	Pre-provisioned; grows only via a resize or storage autoscaling	Auto-grows to 128 TB
Backups	Snapshot-based; restore can take hours	Continuous to S3; PITR to any second in retention window
Engines supported	All five major	Postgres & MySQL only
Price	Cheaper baseline	~20% more, especially at low scale; cost-effective at scale because of decoupled compute/storage
Reach for it when	Oracle, SQL Server, MariaDB; small workloads	Postgres / MySQL at scale; need fast failover

3 · Aurora's "log is the database" architecture

Vanilla Postgres writes both the WAL (redo log) and the dirty data pages to local disk. Aurora ships only the WAL records to a distributed storage service; the data pages are reconstructed from the log on the storage nodes themselves.

Four properties fall out of the architecture:

Lower write amplification. A 16 KB page change ships ~hundreds of bytes of WAL, not the whole page. The SIGMOD 2017 paper reports Aurora completing roughly 35 times as many transactions as mirrored MySQL in the same 30-minute test, with about 7.7 times fewer I/Os per transaction.
Storage scales independently of compute. Readers and writer share one logical "Aurora volume" — adding a reader instance doesn't copy data, it just attaches. A new reader is serving traffic within ~1 minute.
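6-way replication, 4/6 write quorum, 3/6 read quorum. Two copies per AZ across 3 AZs. The cluster can lose one full AZ (two copies) and still write. It can lose one full AZ plus one additional storage node (three copies) and still read without data loss, which is enough to rebuild the missing copies and restore the write quorum.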
Crash recovery is fast. The storage layer continuously applies the log to materialise pages, so on writer crash the storage volume is already up-to-date through the last durable log record. A new writer attaches and starts serving within seconds — no log replay phase.

4 · The write path — INSERT to ack, step by step

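Walking the write path makes the quorum mechanics concrete. The writer modifies the page in its buffer cache and generates redo records; it sends those records to all six storage nodes; each node persists them and acknowledges; once four acknowledgements have arrived, the writer acks the commit to the client.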

Two architectural details matter. First, the writer's buffer cache holds modified pages but those pages are not shipped to storage — only the redo records are. The storage nodes will materialise the page from the redo log when they need to read it. Second, the commit ack happens after 4 of 6 nodes have durably persisted the redo records. The remaining 2 nodes catch up via background gossip; from the client's perspective, the write is durable as soon as quorum is reached, which is typically a couple of milliseconds.
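One consequence of the 4-of-6 rule is that commit latency is set by the fourth-fastest storage node, not the slowest. A minimal sketch of that idea, assuming per-node acknowledgement times are known; the function name and the sample latencies are hypothetical.

```python
def commit_latency_ms(ack_latencies_ms: list[float], write_quorum: int = 4) -> float:
    """Return the time at which the write_quorum-th acknowledgement arrives."""
    if len(ack_latencies_ms) < write_quorum:
        raise ValueError("not enough storage nodes to form a write quorum")
    # The commit can be acked as soon as the quorum-th fastest node has replied.
    return sorted(ack_latencies_ms)[write_quorum - 1]


if __name__ == "__main__":
    # Hypothetical ack times in milliseconds; two nodes are badly delayed.
    acks = [1.2, 1.4, 1.1, 2.0, 9.5, 40.0]
    print(commit_latency_ms(acks))  # prints 2.0
```

The two slow nodes (9.5 ms and 40 ms) do not affect the client; they receive the same records later through the background catch-up the text describes.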

Read path mechanics are inverted. The writer's buffer cache serves hot reads locally. For pages not in cache, the writer sends a page-read request to the storage layer with a target log-sequence-number (LSN); the storage node materialises the page up to that LSN and returns it. Readers do the same against their own buffer cache + the shared storage. Readers see a slightly older LSN than the writer — that's where the "tens of milliseconds of replica lag" number comes from.
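The "materialise the page up to that LSN" step can be pictured with a toy storage node that keeps only redo records and builds the page on demand. This is a sketch under heavy simplification: a page is modelled as a dictionary, a redo record as a key/value change, and every class and field name is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class RedoRecord:
    lsn: int     # log sequence number, strictly increasing
    key: str     # hypothetical slot within the page
    value: str   # new contents of that slot


@dataclass
class ToyStorageNode:
    """Stores redo records for one page and replays them when asked to read."""
    log: list[RedoRecord] = field(default_factory=list)

    def append(self, record: RedoRecord) -> None:
        if self.log and record.lsn <= self.log[-1].lsn:
            raise ValueError("LSNs must increase")
        self.log.append(record)

    def read_page(self, target_lsn: int) -> dict[str, str]:
        """Materialise the page as of target_lsn by replaying the log in order."""
        page: dict[str, str] = {}
        for record in self.log:
            if record.lsn > target_lsn:
                break
            page[record.key] = record.value
        return page


if __name__ == "__main__":
    node = ToyStorageNode()
    node.append(RedoRecord(10, "a", "1"))
    node.append(RedoRecord(20, "a", "2"))
    node.append(RedoRecord(30, "b", "x"))
    print(node.read_page(20))  # a reader at LSN 20 does not see the change at LSN 30
    print(node.read_page(30))  # the writer at LSN 30 sees everything
```

A reader that asks for LSN 20 while the writer is at LSN 30 gets a consistent but slightly older page, which is the replica lag described above.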

5 · Aurora MySQL vs RDS MySQL vs RDS Postgres — what's the same, what's different

	RDS MySQL	Aurora MySQL	RDS Postgres
SQL surface	Stock MySQL 8.0	MySQL 8.0-compatible (a few features missing, a few extras)	Stock PostgreSQL with extensions (pgvector, PostGIS, etc.)
Storage	EBS gp3 / io2	Aurora distributed storage (6×, 3 AZs)	EBS gp3 / io2
Replication	Binlog-based async (10s+ lag typical)	Storage-layer (tens of ms typical)	WAL streaming (seconds typical)
IOPS billing	Pay provisioned IOPS	Pay per I/O operation (or I/O-optimised flat rate)	Pay provisioned IOPS
Multi-AZ semantics	One sync standby (no read traffic)	Cluster spans 3 AZs by default; readers can serve reads	One sync standby (no read traffic)
Failover time	60–120s	< 30s typically	60–120s
Max storage	64 TB (gp3 or io2)	Auto-grow to 128 TB	64 TB
Read replicas	Up to 15 (async, separate storage)	Up to 15 (shared storage, fast attach)	Up to 15 (async, separate storage)
Backup model	Daily snapshot + binlog	Continuous to S3, PITR to any second	Daily snapshot + WAL
Best for	Small/medium MySQL apps, simple needs	MySQL at scale, fast failover required	Postgres apps wanting full ecosystem (extensions, FDW)

The same engine, different storage. The choice frequently comes down to: is your workload bound on IOPS / failover speed / read-fanout (pick Aurora) or are you using engine features that only RDS supports / running at a scale where Aurora's premium isn't justified (pick RDS)? Aurora's I/O billing surprises teams who don't model it — at high I/O rates, you can pay more for I/O than for compute. Aurora's I/O-Optimised mode (flat-rate pricing) is the answer for write-heavy workloads.

6 · Endpoints, readers, failover

An Aurora cluster has four kinds of endpoints. Hit the right one:

Endpoint	Routes to	Use
Cluster (writer) endpoint	Current writer; auto-updates on failover	All writes; reads that must be strongly consistent
Reader endpoint	Round-robin across reader instances	All read-only traffic (analytics, dashboards, app reads that tolerate ~10 ms lag)
Custom endpoint	A named subset of readers	"send these dashboards to `analytics-reader` group" — isolate slow queries
Instance endpoint	One specific node	Debugging, admin

On failover, the writer endpoint stays the same hostname but resolves to a new instance. The app's connection pool sees TCP RSTs; it has to reconnect. Most production setups use RDS Proxy in between to absorb the reconnect storm.
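Without a proxy, the application has to handle the reconnect itself. A minimal sketch of retry with exponential backoff, assuming the caller supplies a `connect` callable that opens a fresh connection to the writer endpoint (and therefore re-resolves its DNS name) on each call. The function and its parameters are hypothetical, and real drivers raise their own exception types rather than the built-in `ConnectionError` used here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 6,
    base_delay_s: float = 0.5,
    max_delay_s: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, doubling the wait between attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = base_delay_s
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of retries: surface the last error to the caller
            sleep(delay)
            delay = min(delay * 2, max_delay_s)
    raise AssertionError("unreachable")
```

With the defaults the waits are 0.5, 1, 2, 4 and 8 seconds, about 15 seconds in total, which is in the same range as the failover times quoted in this section. The `sleep` parameter exists only so the backoff can be tested without real waiting.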

7 · RDS Proxy — connection pooling, pinning, and why it matters

Postgres and MySQL connections are expensive to set up and bounded by the instance size. Aurora's default max_connections grows with memory: Aurora PostgreSQL uses LEAST(DBInstanceClassMemory/9531392, 5000), which works out to roughly 900 on an 8 GiB instance and about 3,600 on 32 GiB, while Aurora MySQL defaults sit in the low thousands on mid-size instances with a hard ceiling of 16,000. Lambda's "every cold start opens a connection" pattern famously saturates Aurora: 1,000 concurrent Lambdas × open-connection-per-invoke = a dead database. Microservices that don't pool connections aggressively hit the same wall at a slightly higher concurrency.

RDS Proxy is a managed connection pooler that sits between clients and the cluster. It maintains a warm pool of backend connections to the writer (and optionally readers), accepts many more client connections than the backend can support, and multiplexes — one backend connection serves multiple clients over time, picking up the next query from whichever client has one ready. Failover is invisible: clients hold their session to the Proxy, which transparently re-routes to the new writer.
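Why multiplexing works can be estimated with Little's law: the number of backend connections busy at any moment is roughly the query arrival rate times the time each query holds a connection, regardless of how many clients are attached. A back-of-envelope sketch; the function name, the headroom factor, and the workload numbers are assumptions for illustration, not anything RDS Proxy exposes.

```python
import math


def backend_connections_needed(
    queries_per_second: float,
    avg_query_seconds: float,
    headroom: float = 1.5,
) -> int:
    """Little's law estimate of busy backend connections, padded by headroom."""
    if queries_per_second < 0 or avg_query_seconds < 0 or headroom < 1:
        raise ValueError("rates must be non-negative and headroom at least 1")
    busy = queries_per_second * avg_query_seconds
    # round() guards against floating-point noise pushing ceil() up by one.
    return math.ceil(round(busy * headroom, 9))


if __name__ == "__main__":
    # Hypothetical workload: 1,000 attached clients issuing 2,000 queries/s in
    # total, each query holding a backend connection for about 5 ms.
    print(backend_connections_needed(2000, 0.005))  # prints 15
```

In this example a thousand clients fit on about fifteen backend connections. The estimate only holds while connections are actually being multiplexed; any connection that stops being shared counts as one full backend connection for as long as the client holds it.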

The critical mechanic to understand is session pinning. RDS Proxy can only multiplex when each query is fully self-contained — the client must not depend on backend session state outliving a single query. The moment the client does something that establishes session state, the Proxy pins the backend connection to that client until the session state is cleared or the client disconnects. Pinned connections behave as a 1:1 direct connection, not a pooled one, and burn pool capacity.

Multiplexes (no pinning)	Pins (must stay assigned)
Simple `SELECT` / `INSERT` / `UPDATE` / `DELETE`	Session variables (`SET`, `SET LOCAL`)
Single-statement transactions (autocommit)	Prepared statements via `PREPARE ... EXECUTE` (MySQL)
Explicit `BEGIN ... COMMIT` transactions (the backend connection is held for the length of the transaction, then returned to the pool)	Postgres extended-protocol `Parse`/`Bind`/`Execute` from many drivers
Stateless connection strings (no `SET`)	Temporary tables (`CREATE TEMPORARY TABLE`)
Read-only queries (when using Postgres reader endpoint)	Advisory locks (`pg_advisory_lock`)
	`LOCK TABLES`, role changes, user-defined variables
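A rough way to audit application SQL against the table is a pattern check over logged statements. The sketch below is a heuristic built from a subset of the right-hand column only; it is not RDS Proxy's detection logic, which is not public and differs by engine, and the pattern names and function are invented for the example.

```python
import re

# Illustrative patterns covering a subset of the pinning triggers in the table.
PINNING_PATTERNS = {
    "session variable": re.compile(r"^\s*SET\b", re.IGNORECASE),
    "prepared statement": re.compile(
        r"^\s*(PREPARE|EXECUTE|DEALLOCATE)\b", re.IGNORECASE
    ),
    "temporary table": re.compile(
        r"^\s*CREATE\s+(TEMP|TEMPORARY)\s+TABLE\b", re.IGNORECASE
    ),
    "advisory lock": re.compile(
        r"\bpg_(try_)?advisory_lock\w*\s*\(", re.IGNORECASE
    ),
    "table lock": re.compile(r"^\s*LOCK\s+TABLES\b", re.IGNORECASE),
}


def likely_pinning_reasons(sql: str) -> list[str]:
    """Return the names of the patterns that a single SQL statement matches."""
    return [
        reason
        for reason, pattern in PINNING_PATTERNS.items()
        if pattern.search(sql)
    ]


if __name__ == "__main__":
    for statement in [
        "SELECT * FROM orders WHERE id = 1",
        "SET search_path TO app",
        "SELECT pg_advisory_lock(42)",
    ]:
        print(statement, "->", likely_pinning_reasons(statement))
```

Run over a sample of statements from the application log, a check like this gives a first list of code paths to inspect. Protocol-level triggers such as driver-side prepared statements never appear as SQL text, so they need the CloudWatch metrics rather than a text scan.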

Many ORM-driven workloads pin without realising it. Hibernate on the PostgreSQL JDBC driver uses the extended query protocol by default, and server-side prepared statements are a documented pinning trigger. Session-level `SET` statements that a framework issues at connection checkout pin for the life of the session. The RDS Data API doesn't help here (it bypasses Proxy entirely). The fix is one of: configure the driver to avoid server-side prepared statements (for example pgJDBC's preferQueryMode=simple), move session setup out of per-connection `SET` statements, keep transactions and sessions short, or drop to plain statements in your ORM. AWS's pinning documentation lists every trigger, and the list has changed over time; treat it as a constraint on your application code, not a footnote.

The pinning audit. RDS Proxy exposes DatabaseConnectionsCurrentlyInTransaction and DatabaseConnectionsCurrentlySessionPinned in CloudWatch. If pinned connections are a meaningful fraction of your pool, you've turned the proxy into an expensive TCP forwarder. Lower-level alternative: PgBouncer in transaction-pooling mode has the same class of restrictions on session state, but you run it yourself, so its logs and admin console are available when you need to see exactly why.

8 · Aurora Serverless v2 and Aurora Global

Aurora Serverless v2. Capacity expressed in ACUs (Aurora Capacity Units, each ~2 GiB RAM + proportional CPU/IO). Set min and max; the service scales the instance up and down in roughly 15-second increments. Charges per ACU-second. Good for variable workloads; not great if you need predictable low-latency tail (scaling events occasionally interrupt connections).
Aurora Global. A primary cluster in one region with up to 5 read-only secondary clusters in other regions. Cross-region replication lag < 1 second typically. Promote a secondary to primary in < 1 minute for regional DR. The serious multi-region story for Postgres / MySQL workloads.
Aurora DSQL (announced in preview at re:Invent 2024, generally available since May 2025). AWS's serverless, multi-region, strongly-consistent Postgres-compatible DB. The pitch is "Spanner for AWS." Worth evaluating wherever Aurora Global's single writer is the constraint, because it changes the multi-region database conversation.
Blue/green deployments. Aurora supports a single-click "blue/green" — clones the cluster to a green copy, lets you upgrade or change parameters on green, then switches over with a single DNS flip. The pattern for low-risk major-version upgrades.
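For Serverless v2, the ACU model makes capacity and cost easy to reason about from a capacity trace. A small sketch, assuming the roughly 2 GiB per ACU stated above; the price per ACU-hour is a placeholder to be replaced with the current rate for your region, and the function names are invented for the example.

```python
GIB_PER_ACU = 2  # from the text: each ACU is roughly 2 GiB of RAM


def acu_memory_gib(acus: float) -> float:
    """Approximate memory available at a given capacity."""
    return acus * GIB_PER_ACU


def acu_cost(
    acu_samples: list[float],
    sample_seconds: int,
    price_per_acu_hour: float,
) -> float:
    """Cost of a capacity trace sampled at a fixed interval."""
    acu_seconds = sum(acu_samples) * sample_seconds
    return acu_seconds / 3600 * price_per_acu_hour


if __name__ == "__main__":
    # Hypothetical hour: 30 minutes at 2 ACUs, then 30 minutes at 8 ACUs,
    # sampled once a minute, at a placeholder price of 0.12 per ACU-hour.
    trace = [2.0] * 30 + [8.0] * 30
    print(acu_memory_gib(8))
    print(acu_cost(trace, sample_seconds=60, price_per_acu_hour=0.12))
```

The trace above averages 5 ACUs over the hour, so the bill is that of a steady 5-ACU instance even though peak memory was about 16 GiB. Raising the minimum capacity raises the floor of the trace, which is the cost of avoiding scale-up latency.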

9 · Real-world case studies

Three public stories show how Aurora actually shapes systems at production scale.

FINRA — regulatory analytics on Aurora at multi-petabyte scale. The Financial Industry Regulatory Authority ingests and analyses every U.S. equity market transaction — tens of billions of events daily. Their published case study and re:Invent sessions describe a hybrid architecture where Aurora hosts the operational and analytical metadata while the bulk event data lives in S3 / Glue. FINRA's re:Invent talks emphasised that Aurora's storage independence let them grow data volume by orders of magnitude without re-architecting; the I/O profile of regulatory queries (heavy reads against a small set of reference tables) maps cleanly onto Aurora's shared-storage / many-readers model.

Airbnb — MySQL to Aurora migration. Airbnb ran one of the largest publicly-documented vanilla-MySQL fleets for years and described their migration to Aurora MySQL in Airbnb Engineering posts and at re:Invent. The motivations were concrete: failover times under MySQL Multi-AZ were causing user-facing outages of 60–120s during planned and unplanned events; replication lag on read replicas was unpredictable under heavy write load (vacuum-equivalents in MySQL InnoDB); and they were hitting IOPS ceilings on EBS. The migration narrative — careful per-table testing, dual-writes during cutover, monitoring of pinning behaviour through RDS Proxy — became a template that many engineering teams have referenced when planning their own moves. The reported wins: sub-30s failovers, single-digit-ms replica lag, removal of an entire layer of bespoke MySQL operational tooling.

Samsung: consumer-scale customer data on Aurora. Samsung's account services have appeared in AWS re:Invent talks and customer references describing Aurora-backed user identity and entitlement systems supporting hundreds of millions of devices. The architectural pattern in the public material: many Aurora clusters sharded by user-id-range, each cluster handling a slice of the user population, with Aurora Global providing cross-region read replicas for latency-sensitive lookups during device boot. The technical interest is in the sharding management. Aurora's storage-layer replication makes read scaling within a shard easy (just add reader instances), and the writer can be scaled up by changing its instance class, but Aurora doesn't shard across writers. Samsung's approach (application-layer sharding by user-id range with a routing service) is the standard pattern for "Aurora is great, but I need horizontal write scale"; the architecture mirrors what other consumer-scale companies have built on Aurora.
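Range-based routing of the kind described in this case study is simple to sketch. The example below is a generic illustration of the pattern, not Samsung's implementation: the class names, the endpoints, and the range boundaries are all hypothetical.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    first_user_id: int     # inclusive lower bound of this shard's range
    writer_endpoint: str   # hypothetical cluster endpoint for the shard


class RangeRouter:
    """Maps a user id to the shard whose range contains it."""

    def __init__(self, shards: list[Shard]) -> None:
        if not shards:
            raise ValueError("at least one shard is required")
        self._shards = sorted(shards, key=lambda s: s.first_user_id)
        self._starts = [s.first_user_id for s in self._shards]

    def shard_for(self, user_id: int) -> Shard:
        # bisect_right finds the first range starting after user_id;
        # the shard just before it is the one that owns the id.
        index = bisect.bisect_right(self._starts, user_id) - 1
        if index < 0:
            raise KeyError(f"user_id {user_id} is below the first shard range")
        return self._shards[index]


if __name__ == "__main__":
    router = RangeRouter([
        Shard(0, "shard-0.example.internal"),
        Shard(1_000_000, "shard-1.example.internal"),
        Shard(2_000_000, "shard-2.example.internal"),
    ])
    print(router.shard_for(1_500_000).writer_endpoint)
```

Each shard is an ordinary Aurora cluster, so failover and reader scaling stay per-shard concerns. The router only decides which cluster a request goes to, and splitting a hot range means adding a boundary and moving the rows above it.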

The through-line: Aurora's value comes from decoupling storage from compute, which translates into specific operational wins (fast failover, instant readers, predictable replication lag). At very high scale, you eventually shard at the application layer — but the per-shard story remains "lean on Aurora's storage and let the SQL engine be the SQL engine."

10 · Build it yourself — Aurora cluster, failover

Create a DB subnet group (needs 2 subnets in different AZs).
aws rds create-db-subnet-group --db-subnet-group-name lab-sng \
  --subnet-ids subnet-aaa subnet-bbb \
  --db-subnet-group-description "lab"
Create the Aurora cluster (writer only).
aws rds create-db-cluster --db-cluster-identifier lab-aurora \
  --engine aurora-postgresql --engine-version 15.5 \
  --master-username postgres --master-user-password ChangeMe123 \
  --db-subnet-group-name lab-sng \
  --vpc-security-group-ids sg-xxx
aws rds create-db-instance --db-instance-identifier lab-aurora-writer \
  --db-cluster-identifier lab-aurora --engine aurora-postgresql \
  --db-instance-class db.t3.medium
aws rds wait db-instance-available --db-instance-identifier lab-aurora-writer
Add a reader replica.
aws rds create-db-instance --db-instance-identifier lab-aurora-reader-1 \
  --db-cluster-identifier lab-aurora --engine aurora-postgresql \
  --db-instance-class db.t3.medium
aws rds wait db-instance-available --db-instance-identifier lab-aurora-reader-1
Connect and write some data.
WRITER=$(aws rds describe-db-clusters --db-cluster-identifier lab-aurora \
  --query 'DBClusters[0].Endpoint' --output text)
PGPASSWORD=ChangeMe123 psql -h "$WRITER" -U postgres \
  -c "CREATE TABLE t(id int, ts timestamptz default now())"
PGPASSWORD=ChangeMe123 psql -h "$WRITER" -U postgres \
  -c "INSERT INTO t(id) SELECT generate_series(1,1000)"
Trigger failover.
aws rds failover-db-cluster --db-cluster-identifier lab-aurora

# Watch which instance currently holds the writer role (Ctrl-C to stop).
while true; do
  aws rds describe-db-clusters --db-cluster-identifier lab-aurora \
    --query 'DBClusters[0].DBClusterMembers[?IsClusterWriter==`true`].DBInstanceIdentifier' \
    --output text
  sleep 2
done
# Typically <30s end-to-end. Old writer becomes reader; old reader becomes writer.
Tear down.
aws rds delete-db-instance --db-instance-identifier lab-aurora-reader-1 --skip-final-snapshot
aws rds delete-db-instance --db-instance-identifier lab-aurora-writer --skip-final-snapshot
# The cluster cannot be deleted while it still has member instances, so wait for both.
aws rds wait db-instance-deleted --db-instance-identifier lab-aurora-reader-1
aws rds wait db-instance-deleted --db-instance-identifier lab-aurora-writer
aws rds delete-db-cluster --db-cluster-identifier lab-aurora --skip-final-snapshot
aws rds delete-db-subnet-group --db-subnet-group-name lab-sng

11 · What breaks

Failover takes 30–60s even with multi-AZ. Aurora's failover is fast by relational-DB standards, but it isn't zero. The writer endpoint's DNS record flips to the new writer (~5–15s), in-flight connections receive TCP RSTs, and the application's connection pool has to reconnect (often after a retry timeout). End-to-end user-visible failover is typically 30–60s. RDS Proxy reduces this dramatically by absorbing reconnections; without Proxy, expect 60s+ of partial unavailability.
Connection limit exhausted. Aurora's max_connections is bounded by instance memory. Lambda + no pooler = death by 1k connections. Use RDS Proxy or pgbouncer; budget connections like a real resource.
Reader endpoint round-robins blindly. The cluster-ro endpoint resolves to one of the reader instances at DNS resolution time, then your driver caches it. If one reader is saturated, the others can be idle — DNS round-robin doesn't see connection state. Use connection-pool-aware routing (PgBouncer with multiple backends, or custom HAProxy) for real load balancing.
Reader lag spikes. Long-running readers occasionally lag tens of seconds — usually under heavy write load with vacuum running, or when a reader is replaying a large transaction. Pin recent writes to the writer endpoint if your application can't tolerate stale reads.
"My major version upgrade took the database down for an hour." Use blue/green deployments — clone, upgrade green, swap. Worst case < 1 minute of downtime. In-place major version upgrades hold the cluster offline through the upgrade; never use in production.
RDS Multi-AZ ≠ Aurora Multi-AZ. RDS Multi-AZ is a synchronous standby in another AZ that becomes primary on failure (60–120s). Aurora distributes across AZs by default and has its own faster mechanism. Same name, different semantics — don't assume Aurora docs apply to RDS or vice versa.
Aurora Serverless v1 cold starts took up to a minute. v1 (deprecated; not available for new clusters) paused entirely when idle and could take 30–60s to wake up; if it was scaled down significantly, the first query after a quiet period would wait for capacity ramp. v2 fixed the pause behaviour but still has scale-up latency: adding ACUs takes ~15 seconds per increment. Latency-sensitive workloads should set a non-zero minimum capacity.
Aurora Global Database write-region failover is manual. Cross-region replication is automatic and fast (< 1s typical lag), but promoting a secondary region to primary is an operator-initiated action. There's no automatic regional failover the way there is automatic AZ failover. Plan and rehearse this — it's a runbook, not a service feature.
Aurora storage is billed by I/O for the standard tier. A surprise on write-heavy workloads: I/O charges can exceed compute charges. The Aurora I/O-Optimised configuration flat-rates I/O for a higher hourly base; model both before committing. The break-even is roughly "if I/O is >25% of bill, switch to I/O-Optimised."
Blue/green deployments don't replicate everything. Some Postgres extensions and replication slots don't carry across. Read the AWS blue/green doc list of supported configurations before committing the strategy to a production upgrade.
Backups don't include parameter groups or instance settings. A restore-from-snapshot creates an instance with default parameter group. If you've tuned shared_buffers, work_mem, or extension settings, recreate them explicitly — the restored instance won't have them.
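The I/O billing item in the list above comes with a rule of thumb (consider I/O-Optimised when I/O exceeds roughly 25% of the bill) that is easy to apply to a monthly cost breakdown. A minimal sketch of that check; the function names and the dollar figures are hypothetical, and the threshold is the heuristic from the text rather than a pricing formula.

```python
def io_share(compute_usd: float, storage_usd: float, io_usd: float) -> float:
    """Fraction of the Aurora bill that is I/O charges."""
    total = compute_usd + storage_usd + io_usd
    if total <= 0:
        raise ValueError("the bill must be positive")
    return io_usd / total


def should_consider_io_optimised(
    compute_usd: float,
    storage_usd: float,
    io_usd: float,
    threshold: float = 0.25,
) -> bool:
    """Apply the 'I/O above a quarter of the bill' rule of thumb."""
    return io_share(compute_usd, storage_usd, io_usd) > threshold


if __name__ == "__main__":
    # Hypothetical monthly figures in USD: compute, storage, I/O.
    print(should_consider_io_optimised(1000, 200, 600))  # I/O is a third of the bill
    print(should_consider_io_optimised(1000, 200, 100))  # I/O is under a tenth
```

A positive result is a prompt to price both configurations properly, since I/O-Optimised raises the compute and storage rates in exchange for removing the per-I/O charge.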

12 · Further reading

Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases (SIGMOD 2017). The architecture paper. Worth one read.
Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes (SIGMOD 2018). Follow-up paper detailing how Aurora avoids distributed consensus protocols on the I/O, commit, and membership-change paths.
Aurora user guide. Especially the storage and high-availability chapters.
RDS Proxy pinning reference. The list of statements that pin connections — read once, treat as application constraint.
FINRA on Aurora. The regulatory-analytics case study.
Airbnb Engineering. Search for "Aurora" or "MySQL migration" — multiple posts on the cutover.
Write-ahead logging. Why Aurora can be log-only and still safe.
Cloud databases (concepts). When Aurora is the right shape vs DynamoDB / others.
