DynamoDB.
A fully-managed key-value and document store with single-digit-millisecond reads at any scale you can throw at it — provided you've designed the partition key correctly. Get the key wrong and you'll hit a hot-partition wall. Get it right and the same table serves a thousand TPS and a million TPS without any architectural change.
1 · What DynamoDB actually is (and isn't)
DynamoDB descends from the original Dynamo paper — "Dynamo: Amazon's Highly Available Key-Value Store," published at SOSP 2007 by DeCandia et al. The paper's contribution was showing that an internet-scale store could trade strict consistency for availability and partition tolerance using consistent hashing, sloppy quorum, vector clocks for conflict resolution, and gossip-based membership. Cassandra's lineage runs through the same paper; the names that recur in distributed-systems discussions — sloppy quorum, hinted handoff, anti-entropy — were Dynamo's vocabulary first.
What AWS shipped as DynamoDB in 2012 shares the genetic material but rebuilt the surface entirely. Out went vector clocks and tunable consistency — too hard for application developers to reason about. In came a managed service with strict primary keys, automatic partitioning, eventually-consistent reads as the cheap default with optional strong consistency, secondary indexes, point-in-time recovery, streams. The 2022 USENIX ATC paper "Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service" by Elhemali et al. is now the canonical architectural reference — read it once if you're going to operate DynamoDB at scale.
The mental model that survives every conversation: DynamoDB is a distributed hash table where (table, partition_key) → set of items, ordered by sort_key, replicated across three AZs. Every operation maps cleanly onto that primitive. GetItem is a hash lookup. Query is a hash lookup followed by a range scan within one partition. Scan is a parallel sequential scan across every partition. PutItem writes to a quorum of replicas. If your access pattern doesn't map onto these primitives, it's likely the wrong tool.
What DynamoDB isn't: a SQL database. No joins (you denormalise instead). No arbitrary ad-hoc queries (you design tables around your queries, not the other way around). No transactions across partition keys without explicit, doubly-priced transaction APIs. No retry-safe counters for free: UpdateItem can increment a number atomically, but the increment is not idempotent, so a retried request may be applied twice unless you guard it with a conditional write. The DynamoDB mindset is hostile to relational habits; teams that bring SQL instincts often end up with a DynamoDB-shaped relational schema and DynamoDB-shaped bills with relational-shaped performance.
| DynamoDB is good for | DynamoDB is bad for |
|---|---|
| Predictable key-based access patterns at any scale | Ad-hoc analytical queries (use Athena, Redshift, or replicate to a warehouse) |
| Single-digit-ms reads and writes under any load | Workloads requiring SQL joins across tables |
| Spiky traffic — on-demand mode absorbs surges with no capacity planning | Wide-row analytics (BigQuery / ClickHouse are the right shape) |
| Multi-region active-active (Global Tables) | Items larger than 400 KB (move blobs to S3, store pointers) |
| Session stores, shopping carts, user profiles, leaderboards, IoT timeseries with bounded keys | Anything that needs full-text search (use OpenSearch) |
2 · How DynamoDB is built — partitions, replicas, the request router
The 2022 USENIX paper finally gave the public a clean picture of the architecture. Requests flow through a stateless request router, which authenticates the caller, looks up the table's metadata, hashes the partition key to a token, and finds the responsible storage node group. Each partition is replicated three ways — one leader, two followers — across three AZs.
A DynamoDB table stores items. Each item is a set of attributes. Every item has a primary key, which is either a single partition key (PK) or a composite partition key + sort key (PK + SK). The PK determines which physical partition the item lives on; the SK orders items within a partition.
The partition key is hashed to a 128-bit token; that token falls into a range owned by one storage group. Each group handles up to 3,000 RCU and 1,000 WCU at the published limits (with adaptive capacity bursting higher). A read costs 1 RCU per 4 KB for strongly-consistent, 0.5 RCU eventually-consistent. A write costs 1 WCU per 1 KB. Larger items cost proportionally more, rounded up.
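The capacity arithmetic above can be sketched in a few lines of shell. This is a rough estimator, not AWS tooling: the function names `rcu_for_read` and `wcu_for_write` are made up for illustration, and the sketch assumes 1 KB = 1,024 bytes and that you already know the item size in bytes.

```shell
#!/usr/bin/env bash
# Rough capacity-unit estimator based on the rules stated above.
# rcu_for_read and wcu_for_write are illustrative names, not AWS tooling.

# Read cost: 1 RCU per 4 KB (rounded up) when strongly consistent,
# half of that when eventually consistent.
rcu_for_read() {
  # $1 = item size in bytes, $2 = "strong" or "eventual"
  awk -v s="$1" -v mode="$2" 'BEGIN {
    units = int((s + 4095) / 4096)      # round up to whole 4 KB chunks
    if (units < 1) units = 1            # smallest charge is one chunk
    if (mode == "eventual") units = units / 2
    printf "%g\n", units
  }'
}

# Write cost: 1 WCU per 1 KB, rounded up.
wcu_for_write() {
  # $1 = item size in bytes
  echo $(( ($1 + 1023) / 1024 ))
}

rcu_for_read 9000 strong     # three 4 KB chunks
rcu_for_read 9000 eventual   # half the strongly consistent cost
wcu_for_write 3500           # four 1 KB chunks
```

Running the same functions over your real item-size distribution is a quick way to see how much of a table's budget goes to rounding.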
You choose on-demand (pay per request, ~5× more expensive per request than provisioned) or provisioned (commit to a baseline, auto-scale around it). On-demand is right when traffic is unpredictable, spiky, or you're prototyping. Provisioned is right when traffic is steady and you've measured. The 2022 paper documents that on-demand is implemented as provisioned with continuous re-provisioning under the hood — you pay the premium for the autoscaler's responsiveness.
3 · Hot partitions and partition-key design
The single most important sentence about DynamoDB at scale: the partition key must be high-cardinality and uniformly accessed. If your access pattern concentrates on a few hot keys (a celebrity's user ID, today's date), one shard saturates while the rest sit idle. You'll see throttling that no amount of total provisioned capacity can fix.
- Bad: `pk = date_string` (all today's writes go to one shard).
- Bad: `pk = status` (three values, three hot shards).
- Better: `pk = entity_id` (USER#123, ORDER#456) with high cardinality.
- Suffix sharding for forced uniformity: `pk = date_string#0..N` spreads a hot day across N synthetic shards. The application has to query all N when reading.
- Adaptive capacity automatically isolates and re-shards hot partitions over minutes, so modern DynamoDB handles short-term spikes much better than 2017-vintage DynamoDB. Hot keys still hurt; you still want a good distribution.
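
Suffix sharding can be sketched as two small helpers: one that picks a synthetic key for a write, one that lists every synthetic key a reader has to query. The names `SHARDS`, `shard_key`, and `all_shard_keys` are hypothetical, and hashing a stable item id with `cksum` is just one reasonable way to choose the suffix (a random suffix also works if you never need to find a specific item again).

```shell
#!/usr/bin/env bash
# Write-sharding sketch: spread one hot logical key over N synthetic keys.
# SHARDS, shard_key, and all_shard_keys are illustrative names.

SHARDS=8

# Pick the shard for a write by hashing a stable item id with cksum (a CRC),
# so a retry of the same item lands on the same synthetic key.
shard_key() {
  # $1 = logical key (for example a date string), $2 = stable item id
  crc=$(printf '%s' "$2" | cksum | cut -d' ' -f1)
  echo "${1}#$(( crc % SHARDS ))"
}

# Readers must fan out: list every synthetic key for the logical key.
all_shard_keys() {
  # $1 = logical key
  i=0
  while [ "$i" -lt "$SHARDS" ]; do
    echo "${1}#${i}"
    i=$(( i + 1 ))
  done
}

shard_key "2026-05-12" "order-abc"   # one of 2026-05-12#0 .. 2026-05-12#7
all_shard_keys "2026-05-12"          # the 8 keys a reader has to Query
```

The read side pays for the write side's uniformity: each logical read becomes `SHARDS` Query calls whose results the application merges.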
Single-table design pushes this further: prefix keys by entity type (USER#123, ORDER#456, USER#123#ORDER#456), then use GSI overloading to support multiple access patterns. Section 5 deepens this. Hard to set up, beautiful at scale.
4 · Adaptive Capacity: how hot keys get rescued
Before 2018, hitting a hot partition meant your application throttled until you re-provisioned the table or — in the worst case — paused traffic and rebuilt the schema. The 2018-and-later Adaptive Capacity system, formalised and improved in the 2022 ATC paper, changes the failure mode. Hot partitions get split mid-flight; traffic gets re-routed; throttling alleviates within minutes rather than hours.
Two distinct mechanisms work together. First, capacity sharing — when a partition is hot but neighbours are cold, the control plane loans WCU/RCU budget from the cool partitions to the hot one. This is "free" capacity, available in seconds, and absorbs short bursts entirely. Second, physical splitting — when a partition has been hot for longer than the burst window, the control plane schedules a split that copies half the key range to a new storage group. The split takes minutes; traffic rebalances smoothly as routing tables update.
Adaptive Capacity doesn't make poor key design free. A partition key with a permanently skewed distribution (10% of writes always hit one key) will keep getting split until the key itself is a single partition — and at that point you're stuck at one partition's throughput ceiling. The "celebrity follower" problem in social apps is the canonical example: even infinite splits can't help one row that's hotter than 1,000 WCU/s.
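The ceiling turns into simple arithmetic when the hot key is a logical key that can be split into synthetic keys (a single hot item cannot be). A minimal sketch, assuming the 1,000 WCU per-partition figure quoted in section 2 and perfectly even spreading; `min_shards` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Per-partition write ceiling quoted earlier in this article.
PARTITION_WCU_LIMIT=1000

# min_shards is an illustrative helper: the smallest number of synthetic keys
# that keeps an evenly spread write rate at or under the ceiling on each key.
min_shards() {
  # $1 = sustained WCU per second aimed at one logical key
  echo $(( ($1 + PARTITION_WCU_LIMIT - 1) / PARTITION_WCU_LIMIT ))
}

min_shards 800     # fits within one partition's ceiling
min_shards 4500    # needs five synthetic keys
```

Real traffic is never perfectly even, so treat the result as a floor and leave headroom above it.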
5 · Single-table design — the Houlihan pattern
Rick Houlihan, who led DynamoDB's customer engineering for years, made the case at re:Invent 2017–2019 that the right DynamoDB schema for nearly any OLTP workload is one table, many entity types, overloaded keys. The argument is mechanical: every join in a relational schema becomes a separate round-trip in DynamoDB; every round-trip is at least one ms. A relational schema with five joins is 5+ ms; a single-table design that pre-joins the result via the partition key is 1 ms.
The trick is using the partition key and sort key as composable strings rather than meaningful values. A single table might hold:
| pk | sk | What it represents |
|---|---|---|
| USER#123 | PROFILE | User 123's profile record |
| USER#123 | ORDER#2026-05-01#abc | Order abc placed by user 123 on that date |
| USER#123 | ORDER#2026-05-12#xyz | Another order; query pk=USER#123, sk begins_with ORDER# returns all |
| ORDER#abc | METADATA | Order detail by order id |
| ORDER#abc | ITEM#sku-1 | Line items in the order |
One Query call (pk=USER#123, sk begins_with ORDER#) returns every order for the user without a join. GSI overloading takes this further: a single GSI with key (gsi1pk, gsi1sk) can index many entity types. Set gsi1pk = ORDER#abc on order items and you've built a reverse lookup; set gsi1pk = STATUS#shipped on orders to query by status. One physical index, many logical access patterns.
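The composable-string idea is easier to keep consistent when keys are built by small helpers rather than by hand. The sketch below uses hypothetical function names, and borrows the `lab-orders` table name from the lab in section 9; the `aws` call is wrapped in a function that is defined but not run here, since it needs credentials and an existing table.

```shell
#!/usr/bin/env bash
# Key builders for the single-table layout above. Names are illustrative.
user_pk()  { echo "USER#$1"; }
order_pk() { echo "ORDER#$1"; }
order_sk() { echo "ORDER#$1#$2"; }   # $1 = ISO date, $2 = order id

# Expression values for "all orders of one user".
orders_query_values() {
  # $1 = user id
  printf '{":pk":{"S":"%s"},":prefix":{"S":"ORDER#"}}\n' "$(user_pk "$1")"
}

# Not invoked here: needs AWS credentials and a table named lab-orders.
query_user_orders() {
  aws dynamodb query --table-name lab-orders \
    --key-condition-expression "pk = :pk AND begins_with(sk, :prefix)" \
    --expression-attribute-values "$(orders_query_values "$1")"
}

user_pk 123
order_sk 2026-05-01 abc
orders_query_values 123
```

Putting the ISO date ahead of the order id in the sort key means orders come back in date order, because the strings sort lexicographically.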
Single-table design is famously difficult to retrofit. The keys have to be designed around your access patterns up front; adding a new pattern later may require backfilling a GSI. Alex DeBrie's DynamoDB Book and Houlihan's re:Invent talks are the practical references. Most teams settle for a hybrid — single-table for the high-traffic core domain, separate tables for ancillary data — which is a pragmatic compromise.
6 · Indexes — GSI vs LSI
| | LSI (Local Secondary Index) | GSI (Global Secondary Index) |
|---|---|---|
| Partition key | Same as the base table | Different — your choice |
| Created | Only at table creation | Any time |
| Max per table | 5 | 20 (soft limit, raisable) |
| Consistency | Strongly consistent reads available | Eventually consistent only |
| Capacity | Shares with table | Separate provisioned capacity |
| Use | Different sort orders for same PK | Different access patterns entirely |
GSIs are almost always the right answer. They cost extra (each GSI item is essentially a duplicate write), but the flexibility of "I can add an access pattern later" beats LSI's "must decide at table creation" by a wide margin. A common pattern: one base table with PK=entity_id, then a GSI with PK=tenant_id to find all items per tenant.
7 · Streams, transactions, TTL
- DynamoDB Streams — change data capture. Every item-level change emits a record (new image / old image / both). Streams retain 24 hours; subscribers (Lambda is the usual one) get triggered with batches. The pattern for "when a user is created, send a welcome email" — Lambda subscribes to stream, filters for INSERT events on USER items.
- Transactions — DynamoDB supports ACID transactions across up to 100 items / 4 MB. Costs 2× the WCU/RCU of equivalent non-transactional operations. Use for "transfer balance from A to B" style operations; over-using transactions doubles the bill.
- TTL — set a Unix-epoch attribute on items; DynamoDB deletes them within ~48 hours of expiry. The latency is loose — don't rely on it for "deleted within 60 seconds" use cases. Free, doesn't consume WCU.
- DAX — DynamoDB Accelerator, an in-memory cache that fronts DynamoDB. Sub-millisecond cached reads. Useful for read-heavy workloads with hot keys; mostly obsolete now that adaptive capacity is better and ElastiCache exists.
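
For TTL, the only moving part on the application side is writing a Unix-epoch number into the chosen attribute. A minimal sketch, assuming an attribute named `expires_at` and a `SESSION#` key shape (both invented for this example); `ttl_epoch`, `session_item`, and `enable_ttl` are hypothetical helper names, and `enable_ttl` is defined but not run because it needs AWS credentials.

```shell
#!/usr/bin/env bash
# ttl_epoch is an illustrative helper: expiry timestamp N days after a given
# Unix time in seconds, which is the format the TTL attribute must hold.
ttl_epoch() {
  # $1 = start time in epoch seconds, $2 = days to live
  echo $(( $1 + $2 * 86400 ))
}

# Build a session item carrying its own expiry. The attribute name
# expires_at and the SESSION# key shape are assumptions for this sketch.
session_item() {
  # $1 = session id, $2 = expiry in epoch seconds
  printf '{"pk":{"S":"SESSION#%s"},"sk":{"S":"META"},"expires_at":{"N":"%s"}}\n' "$1" "$2"
}

# An item that expires 7 days from now.
session_item abc "$(ttl_epoch "$(date +%s)" 7)"

# Not invoked here (needs AWS credentials): turn TTL on for a table.
enable_ttl() {
  # $1 = table name
  aws dynamodb update-time-to-live --table-name "$1" \
    --time-to-live-specification "Enabled=true,AttributeName=expires_at"
}
```

Because deletion lags expiry, readers that care about freshness should still compare `expires_at` against the current time themselves.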
8 · Real-world case studies
Three public stories show how DynamoDB shapes architecture at the high end of scale.
Pokémon GO — surviving the 2016 launch. Pokémon GO launched in July 2016 and within a week became one of the highest-traffic mobile games ever, eventually serving over 100 billion DynamoDB requests during peak launch periods. Niantic and AWS gave a joint re:Invent 2017 session — "How Pokémon Catches 'Em All with AWS" — and a follow-up case study describing how DynamoDB anchored the player-state backend. The numbers from the talk: traffic exceeded the team's most pessimistic launch forecast by ~50×; DynamoDB scaled by repeatedly increasing provisioned capacity through auto-scaling and capacity sharing with no architectural change. The lesson teams take from this is "DynamoDB doesn't have a scaling ceiling that matters" — true at this scale because Niantic's key design (player ID as the partition key) was uniformly hot, never concentrated.
Disney+: the November 2019 launch on DynamoDB. Beyond CloudFront, Disney's launch architecture used DynamoDB extensively for user profile state, watch history, and entitlements. The re:Invent 2020 talk "How Disney+ uses AWS for streaming" describes a multi-table model with hot-shard avoidance via composite keys and aggressive use of DynamoDB Streams to fan changes into downstream caches. They mentioned single-digit-ms p99 reads sustained through the 10-million-subscriber launch surge; the latency advertised by the marketing turned out to be the latency they got.
Lyft — DynamoDB at scale for marketplace state. Lyft's engineering blog has documented their use of DynamoDB for various marketplace primitives. "Keeping Pace with DynamoDB" describes the operational discipline they built: a wrapper SDK that enforces a per-table latency budget, automatic alerts on consumed-capacity deviations, and a chaos testing regime that injects throttle errors during normal traffic to validate retry logic. The architectural detail worth borrowing: they treat DynamoDB as an explicit capacity-budget abstraction rather than an "infinitely scalable" black box — every team owns its own table, its own keys, its own throttle handling.
The through-line: DynamoDB scales when the partition key is uniformly hot, and breaks otherwise. Successful adopters share an obsession with key-design review and a habit of treating capacity as a real, observable resource rather than an autoscaling promise.
9 · Build it yourself — table, GSI, throttle observation
- Create a table with PK + SK.

      aws dynamodb create-table --table-name lab-orders \
        --attribute-definitions AttributeName=pk,AttributeType=S AttributeName=sk,AttributeType=S \
        --key-schema AttributeName=pk,KeyType=HASH AttributeName=sk,KeyType=RANGE \
        --billing-mode PROVISIONED \
        --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5
      aws dynamodb wait table-exists --table-name lab-orders

- Write 100 items spread across users.

      for i in $(seq 1 100); do
        uid=$((RANDOM % 20))   # not USER, which would clobber the shell's own variable
        aws dynamodb put-item --table-name lab-orders \
          --item "{\"pk\":{\"S\":\"USER#$uid\"},\"sk\":{\"S\":\"ORDER#$i\"},\"total\":{\"N\":\"$((RANDOM % 1000))\"}}"
      done

- Query all orders for one user (single-partition query, fast).

      aws dynamodb query --table-name lab-orders \
        --key-condition-expression "pk = :u" \
        --expression-attribute-values '{":u":{"S":"USER#5"}}'

- Cause throttling on purpose: saturate writes to one key.

      # Provisioned 5 WCU; this loops hard and will start to throttle once
      # burst capacity is used up (raise the count if no errors show).
      # AWS_MAX_ATTEMPTS=1 stops the CLI from quietly retrying throttled calls.
      export AWS_MAX_ATTEMPTS=1
      for i in $(seq 1 200); do
        aws dynamodb put-item --table-name lab-orders \
          --item "{\"pk\":{\"S\":\"USER#hot\"},\"sk\":{\"S\":\"ORDER#$i\"}}" \
          --return-consumed-capacity TOTAL 2>&1 \
          | grep -iE 'throttl|ProvisionedThroughputExceeded' &
      done
      wait
      unset AWS_MAX_ATTEMPTS

- Add a GSI by tenant.

      aws dynamodb update-table --table-name lab-orders \
        --attribute-definitions AttributeName=pk,AttributeType=S AttributeName=sk,AttributeType=S AttributeName=tenant,AttributeType=S \
        --global-secondary-index-updates '[{ "Create": {
          "IndexName": "by-tenant",
          "KeySchema": [{"AttributeName":"tenant","KeyType":"HASH"},{"AttributeName":"sk","KeyType":"RANGE"}],
          "Projection": {"ProjectionType":"ALL"},
          "ProvisionedThroughput": {"ReadCapacityUnits":5,"WriteCapacityUnits":5}
        }}]'
      # Index creation is asynchronous; let the table settle before the next update-table.
      aws dynamodb wait table-exists --table-name lab-orders

- Enable streams to see CDC.

      aws dynamodb update-table --table-name lab-orders \
        --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES
      STREAM_ARN=$(aws dynamodb describe-table --table-name lab-orders \
        --query 'Table.LatestStreamArn' --output text)
      # This is a shard id, not yet an iterator.
      SHARD_ID=$(aws dynamodbstreams describe-stream --stream-arn "$STREAM_ARN" \
        --query 'StreamDescription.Shards[0].ShardId' --output text)
      # In production a Lambda function subscribes to the stream; here we'd use GetShardIterator + GetRecords manually.

- Tear down.

      aws dynamodb delete-table --table-name lab-orders
10 · What breaks
- Hot partition throttling despite Adaptive Capacity. The single most common production issue. Adaptive Capacity smooths transient spikes but cannot save you from a permanently skewed access pattern: if 30% of writes target one key, you'll throttle no matter how many splits occur. Diagnose with the per-partition CloudWatch metrics (need `ContributorInsights`) or the newer `HotPartitionSummary`. Fix is partition-key redesign; adding capacity doesn't help.
- The 400 KB item-size hard limit. Hit it with documents containing big arrays or embedded blobs. The official cap is 400 KB; the practical recommended max is closer to 100 KB because larger items burn WCU disproportionately (every write costs ceil(size/1KB) WCU). Move blobs to S3, store the S3 key in DynamoDB. Some teams compress JSON with zstd to stay under the limit at the cost of CPU on every read.
- LSI's 10 GB partition-size limit (can't grow). Local Secondary Indexes constrain the entire partition to 10 GB total across base table + all LSIs. Once you hit it, no further writes succeed for that partition, and you can't drop the LSI because they're declared at table creation. The fix is "stop using LSIs and migrate to GSIs," which usually means rebuilding the table. Treat LSIs as a permanent decision; almost always reach for GSI instead.
- GSI eventual consistency. GSIs are only eventually consistent; you cannot do `--consistent-read` against one. The replication lag is typically tens of milliseconds but can stretch to seconds under load. Don't read your own write through a GSI immediately and expect to see it; this is a common gotcha for teams expecting a SQL-shaped index.
- Scan instead of Query. `Scan` reads the whole table: fine for occasional admin work, ruinous for production. Always Query (by PK, with SK conditions) or use a GSI.
- Eventually-consistent reads when you needed strong. Default read is eventually consistent (0.5 RCU/4KB). For "did my write land?" reads, pass `--consistent-read` and pay 2× the RCU. Tables don't fail safely here: an eventually-consistent read after a write succeeds with stale data, no error.
- BatchWriteItem partial failures. Up to 25 items per batch; some can succeed, some can throttle. The response returns `UnprocessedItems`; retry them with exponential backoff. Forgetting to retry partial failures silently loses writes.
- Switching from on-demand to provisioned has a 24-hour cooldown. Once you switch a table's billing mode, you cannot switch again for 24 hours. Plan migrations carefully; don't toggle modes during an incident.
- DynamoDB Streams 24-hour retention. Stream records are kept for exactly 24 hours. If your consumer falls behind that long, the missed records are gone forever; there's no replay. For longer retention, mirror Stream into Kinesis Data Streams (24-hour to 365-day retention) using the built-in integration.
- Transactions cost double. `TransactGetItems` costs 2× RCU; `TransactWriteItems` costs 2× WCU. Wrapping every write in a transaction "for safety" doubles your bill for the same throughput. Use transactions only for multi-item changes that must be atomic (balance transfers, idempotent claim checks).
- Reserved-words minefield. The list of DynamoDB reserved words is long and includes innocent-looking names like `name`, `status`, `data`. Attribute expressions need placeholders for these. The errors are runtime, not parse-time, so bad code looks fine until traffic hits it.
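
The BatchWriteItem failure mode is the one most worth encoding once and reusing. Below is a sketch under stated assumptions: `backoff_delay` and `batch_write_all` are hypothetical helpers, the five-round limit and 30-second cap are arbitrary, and the loop assumes the CLI prints an empty JSON map for `UnprocessedItems` when every write landed. `batch_write_all` is defined but not run here because it needs AWS credentials.

```shell
#!/usr/bin/env bash
# backoff_delay is an illustrative helper: exponential backoff with a cap.
backoff_delay() {
  # $1 = attempt number starting at 0, $2 = base seconds, $3 = cap seconds
  delay=$2
  n=0
  while [ "$n" -lt "$1" ]; do
    delay=$(( delay * 2 ))
    n=$(( n + 1 ))
  done
  if [ "$delay" -gt "$3" ]; then delay=$3; fi
  echo "$delay"
}

# Not invoked here: needs AWS credentials. $1 = request-items JSON in the
# shape BatchWriteItem expects. Each round resends only what came back in
# UnprocessedItems, and the function gives up after 5 rounds.
batch_write_all() {
  pending=$1
  attempt=0
  while [ "$attempt" -lt 5 ]; do
    pending=$(aws dynamodb batch-write-item --request-items "$pending" \
      --query 'UnprocessedItems' --output json) || return 1
    # An empty map means nothing is left to retry.
    if [ "$(printf '%s' "$pending" | tr -d '[:space:]')" = "{}" ]; then
      return 0
    fi
    sleep "$(backoff_delay "$attempt" 1 30)"
    attempt=$(( attempt + 1 ))
  done
  echo "still unprocessed after retries: $pending" >&2
  return 1
}

backoff_delay 0 1 30   # first retry waits the base delay
backoff_delay 3 1 30   # doubles on each attempt
backoff_delay 10 1 30  # held at the cap
```

A production version would usually add random jitter to each delay so that many clients throttled at the same moment do not retry in lockstep.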
11 · Further reading
- Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service (USENIX ATC 2022). The canonical architecture paper. Read once.
- Dynamo: Amazon's Highly Available Key-Value Store (SOSP 2007). The original Dynamo paper. Vocabulary and ideas that DynamoDB inherited, even though the surface differs.
- The DynamoDB Book by Alex DeBrie. The canonical text for single-table design and access-pattern modelling.
- Rick Houlihan's re:Invent talks (2017–2019). The Houlihan canon on advanced design patterns. Worth watching at 1.5× speed.
- Niantic / Pokémon GO case study. The launch-scaling story.
- Lyft — Keeping Pace with DynamoDB. The operational-discipline post.
- Sharding. Why range/hash/consistent-hash sharding choices matter — DynamoDB picks consistent hashing for you.
- Cloud databases (concepts). When DynamoDB-shape fits the workload.