How Redis persists data: RDB, AOF, and the BGSAVE fork trick

Redis lives in RAM. Persistence is the answer to "what happens on crash or restart?" The choices — and the surprising amount of cleverness in each — are the whole story.

Redis is in-memory. Persistence is a backup story.

Reads and writes happen in RAM. Disk only matters on restart.

The Redis hot path never touches disk. Every GET walks an in-memory dict; every SET mutates an in-memory value and returns. That's the whole performance story — that's why a single Redis process handles hundreds of thousands of ops per second on a modest box. Persistence is what you do on the side so that a power cut doesn't erase the dataset.

The four choices, all explicit. Nothing: the cache case, when the dataset is regenerable from Postgres or DynamoDB and "lose all keys" is the same as "cold cache, refills on miss." Plenty of production Redis deployments persist nothing. RDB only: periodic binary snapshots; lose up to the snapshot interval on crash. This is what a stock redis.conf gives you, since appendonly defaults to no. AOF only: append every write to a command log; lose up to one fsync window. Both: RDB for fast restart, AOF for crash durability, the combination the Redis documentation recommends when you actually care about the data.
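As a rough sketch, the four choices map onto a handful of redis.conf directives. The mode labels ("none", "rdb", "aof", "both") and the helper name are invented for this example; the directives themselves (save, appendonly, appendfsync) are the real ones.

```python
# Sketch: the four persistence modes expressed as redis.conf directives.
# The mode labels and render_conf() are illustrative names, not Redis terms.
MODES = {
    "none": {"save": '""', "appendonly": "no"},
    "rdb": {"save": "3600 1 300 100 60 10000", "appendonly": "no"},
    "aof": {"save": '""', "appendonly": "yes", "appendfsync": "everysec"},
    "both": {
        "save": "3600 1 300 100 60 10000",
        "appendonly": "yes",
        "appendfsync": "everysec",
    },
}

# What a kernel panic costs you under each mode, in the article's terms.
WORST_CASE_LOSS = {
    "none": "the whole dataset",
    "rdb": "every write since the last snapshot",
    "aof": "about one fsync window",
    "both": "about one fsync window",
}


def render_conf(mode):
    """Return the redis.conf lines for one of the four modes."""
    return "\n".join(f"{key} {value}" for key, value in MODES[mode].items())


for mode in MODES:
    print(f"# {mode}: lose {WORST_CASE_LOSS[mode]}")
    print(render_conf(mode))
    print()
```

The empty-string form save "" is how redis.conf switches automatic snapshots off entirely.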

Each has a specific failure profile, and that profile is the whole conversation. "How much data are you willing to lose if the kernel panics right now?" is a question with four different answers, one per configuration. The rest of this page is about the mechanics that make each answer hold up.


RDB snapshots and the BGSAVE fork trick.

SAVE blocks the server. Nobody uses SAVE.

The naive way to snapshot the keyspace is SAVE: the main thread walks the dict, serialises every key, writes the RDB file. While it does this, the server is unresponsive — single-threaded, remember. On a 10 GB instance that's tens of seconds. Nobody uses SAVE in production.

BGSAVE forks. The child process inherits the parent's address space and writes the snapshot from the inherited memory. The parent keeps serving commands. The trick that makes this cheap is copy-on-write: after fork, both processes share the same physical pages, marked read-only by the kernel. The parent only pays for a page when it writes to it — at which point the kernel duplicates that single 4 KB page and gives the parent a private copy. If the parent isn't writing much during the snapshot window, the COW cost is essentially zero.
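The fork behaviour is easy to see in miniature. The sketch below is not how Redis writes an RDB file; it is a toy, assuming a Unix host where os.fork is available, showing that a forked child keeps seeing the keyspace as it was at fork time even while the parent keeps mutating it. The function name and the pipe handshake are inventions for the example.

```python
import json
import os


def snapshot_via_fork(keyspace, mutate):
    """Fork, let the parent mutate its copy, return what the child saw."""
    result_r, result_w = os.pipe()  # child -> parent: serialised snapshot
    go_r, go_w = os.pipe()          # parent -> child: "I have mutated, write now"
    pid = os.fork()
    if pid == 0:
        # Child: its view of `keyspace` is frozen at the moment of fork.
        try:
            os.close(result_r)
            os.close(go_w)
            os.read(go_r, 1)  # wait until the parent has changed its copy
            payload = json.dumps(keyspace, sort_keys=True).encode()
            with os.fdopen(result_w, "wb") as out:
                out.write(payload)
        finally:
            os._exit(0)  # never fall back into the parent's code path
    # Parent: keep "serving writes" while the child holds the snapshot.
    os.close(result_w)
    os.close(go_r)
    mutate(keyspace)
    os.write(go_w, b"1")
    os.close(go_w)
    with os.fdopen(result_r, "rb") as src:
        payload = src.read()
    os.waitpid(pid, 0)
    return json.loads(payload)


keyspace = {"user:1": "Ada", "counter": 1}
snapshot = snapshot_via_fork(keyspace, lambda ks: ks.update(counter=2, extra="x"))
print("child saw:  ", snapshot)
print("parent has: ", keyspace)
```

The kernel does the page sharing and copying; nothing in the code asks for it. That is the whole appeal of the trick.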

BGSAVE has two costs, and they are easy to conflate. The fork itself copies the parent's page tables on the main thread: a 30 GB instance can stall for 100 ms to 1 s, which is the famous Redis fork latency spike. Then, for the duration of the snapshot, every page the parent writes gets duplicated, so on a write-heavy workload the COW cost approaches "duplicate the entire keyspace" and the instance bleeds memory. Antirez wrote about this for years. The mitigations are unsexy and effective: schedule BGSAVE during low-write windows; disable Transparent Huge Pages on the host (2 MB pages make each COW copy 512× larger than a 4 KB one); run a dedicated replica that does the persistence so the master never forks.
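The numbers behind those two costs are simple arithmetic. This back-of-the-envelope sketch assumes 8-byte page-table entries (the x86-64 size) and ignores the upper levels of the page-table tree, so treat the output as an order-of-magnitude estimate rather than a measurement.

```python
PAGE_4K = 4 * 1024
PAGE_2M = 2 * 1024 * 1024
PTE_BYTES = 8  # assumption: x86-64 page-table entry size
GIB = 1024 ** 3
MIB = 1024 ** 2


def page_table_bytes(resident_bytes, page_size=PAGE_4K):
    """Approximate bytes of leaf page-table entries fork() has to copy."""
    pages = -(-resident_bytes // page_size)  # ceiling division
    return pages * PTE_BYTES


def cow_bytes(pages_touched, page_size=PAGE_4K):
    """Bytes the kernel copies when the parent dirties this many pages."""
    return pages_touched * page_size


# Fork cost: page tables for a 30 GiB resident set with 4 KiB pages.
print(page_table_bytes(30 * GIB) // MIB, "MiB of page tables copied at fork")

# COW cost: one dirtied page, regular pages versus Transparent Huge Pages.
print(cow_bytes(1, PAGE_2M) // cow_bytes(1, PAGE_4K), "x more bytes copied per page with THP")
```

Huge pages shrink the page tables but make every single copy-on-write fault much bigger, which is why the usual advice for Redis hosts is to turn THP off.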

The save 3600 1 300 100 60 10000 default triggers BGSAVE automatically — every hour if at least one key changed, every five minutes if 100 changed, every minute if 10,000 changed. Tune to your write rate and your acceptable loss window.
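To make the save points concrete, here is a minimal sketch of the trigger rule: any pair of (seconds, changes) that is satisfied fires a BGSAVE. The function names are illustrative, and the exact boundary comparison inside Redis may differ from the >= used here.

```python
def parse_save_points(spec):
    """Turn '3600 1 300 100 60 10000' into [(3600, 1), (300, 100), (60, 10000)]."""
    numbers = [int(token) for token in spec.split()]
    if len(numbers) % 2:
        raise ValueError("save points come in <seconds> <changes> pairs")
    return list(zip(numbers[0::2], numbers[1::2]))


def should_bgsave(points, seconds_since_save, dirty_keys):
    """True if any save point has both its time and change thresholds met."""
    return any(
        seconds_since_save >= seconds and dirty_keys >= changes
        for seconds, changes in points
    )


points = parse_save_points("3600 1 300 100 60 10000")
print(should_bgsave(points, seconds_since_save=61, dirty_keys=10_000))
print(should_bgsave(points, seconds_since_save=61, dirty_keys=9_999))
```

Read the pairs as a ladder: the busier the instance, the shorter the interval that applies, so the worst-case loss window shrinks as the write rate grows.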


AOF: every write as a line of text.

An append-only file in the Redis serialisation protocol.

The AOF (append-only file) records every write command in the same RESP wire format the clients sent. SET user:1 Ada goes into the keyspace and into the AOF. On restart, Redis opens the AOF and replays it from the beginning, reconstructing the in-memory state by re-running each command. No magic, no recovery algorithm — just deterministic replay of the write stream.
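A small sketch makes the "just replay it" claim tangible. It encodes commands as RESP arrays of bulk strings, concatenates them the way an AOF would, and rebuilds a keyspace by parsing and re-running them. Only SET, INCR and DEL are handled, and the dict stands in for the real keyspace; both are simplifications for the example.

```python
def encode_command(args):
    """Encode one command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode() if isinstance(arg, str) else arg
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def parse_commands(buf):
    """Yield each command in the buffer as a list of byte strings."""
    pos = 0
    while pos < len(buf):
        if buf[pos:pos + 1] != b"*":
            raise ValueError("expected array header")
        end = buf.index(b"\r\n", pos)
        nargs = int(buf[pos + 1:end])
        pos = end + 2
        args = []
        for _ in range(nargs):
            if buf[pos:pos + 1] != b"$":
                raise ValueError("expected bulk string header")
            end = buf.index(b"\r\n", pos)
            length = int(buf[pos + 1:end])
            start = end + 2
            args.append(buf[start:start + length])
            pos = start + length + 2  # skip the payload and its trailing CRLF
        yield args


def replay(buf):
    """Rebuild a toy keyspace by re-running every logged command in order."""
    keyspace = {}
    for args in parse_commands(buf):
        name = args[0].upper()
        if name == b"SET":
            keyspace[args[1]] = args[2]
        elif name == b"INCR":
            keyspace[args[1]] = b"%d" % (int(keyspace.get(args[1], b"0")) + 1)
        elif name == b"DEL":
            for key in args[1:]:
                keyspace.pop(key, None)
        else:
            raise ValueError("command not handled in this sketch")
    return keyspace


aof = (
    encode_command(["SET", "user:1", "Ada"])
    + encode_command(["INCR", "counter"])
    + encode_command(["INCR", "counter"])
)
print(aof)
print(replay(aof))
```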

The interesting knob is appendfsync, with three settings whose worst-case loss windows span roughly four orders of magnitude, from under a millisecond to tens of seconds:

always: fsync after every command. Sub-millisecond loss window. Throughput drops to roughly 30-50% of everysec, because you're paying the per-write fsync cost (typically 0.5-2 ms on enterprise SSD with a supercap, much worse on consumer drives that fake fsync). Used when the data is irrecoverable and one write loss is a bug report.
everysec: the default. A background thread fsyncs the buffered writes once per second. Worst-case loss: about one second of writes on crash. Throughput is close to no-AOF, because the fsync isn't on the hot path. This is the right answer for almost every production deployment.
no: never call fsync. Let the OS flush when it feels like it, usually every 30 s on Linux. Worst-case loss is whatever was in the page cache, which can be tens of seconds of writes. Fastest, useful when AOF exists for replication semantics but you don't actually care about durability.
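The three settings can be compared with a toy loss model. It assumes fsyncs land exactly on whole-second boundaries for everysec and that the OS flushes every 30 s for no; real timing is messier, so the point is the relative size of the loss windows, not the exact counts.

```python
def durable_cutoff_ms(policy, crash_ms, os_flush_ms=30_000):
    """Writes with a timestamp strictly below the cutoff are assumed on disk."""
    if policy == "always":
        return crash_ms + 1  # every acknowledged write was fsynced
    if policy == "everysec":
        return (crash_ms // 1000) * 1000  # last whole-second fsync
    if policy == "no":
        return (crash_ms // os_flush_ms) * os_flush_ms  # last OS flush (assumed)
    raise ValueError(f"unknown appendfsync policy: {policy}")


def lost_writes(policy, write_times_ms, crash_ms):
    """Timestamps of writes accepted before the crash but never made durable."""
    cutoff = durable_cutoff_ms(policy, crash_ms)
    return [t for t in write_times_ms if cutoff <= t <= crash_ms]


writes = list(range(0, 10_000, 100))  # one write every 100 ms
for policy in ("always", "everysec", "no"):
    print(policy, len(lost_writes(policy, writes, crash_ms=2_500)), "writes lost")
```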

The everysec sweet spot is real. The background fsync thread is one of the few cases in Redis where the single-threaded discipline bends: the main thread writes to the file; a background thread fsyncs it. If that fsync is stuck because the disk is slow, the main thread holds back its next write for up to two seconds and then writes anyway, blocking until the disk catches up. That shows up as p99 latency spikes that look like nothing else, and it stretches the worst-case loss window to about two seconds. Redis counts each of these delayed writes in aof_delayed_fsync, so watching INFO persistence for that counter is one of those rare-but-critical Redis SRE muscles.
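Watching that counter is mostly a parsing exercise. The sketch below reads the key:value text that INFO persistence returns and computes how much aof_delayed_fsync grew between two samples. The sample payload is made up for illustration; the field names in it are real INFO fields, and in practice the text would come from your Redis client.

```python
# Illustrative payload; real INFO output contains many more fields.
SAMPLE_INFO = (
    "# Persistence\r\n"
    "rdb_changes_since_last_save:42\r\n"
    "rdb_last_bgsave_status:ok\r\n"
    "aof_enabled:1\r\n"
    "aof_last_write_status:ok\r\n"
    "aof_delayed_fsync:3\r\n"
)


def parse_info(text):
    """Parse INFO-style 'key:value' lines, skipping blanks and '# Section' headers."""
    fields = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def delayed_fsync_delta(previous, current):
    """How many delayed fsyncs happened between two parsed INFO samples."""
    return int(current["aof_delayed_fsync"]) - int(previous["aof_delayed_fsync"])


earlier = {"aof_delayed_fsync": "1"}
now = parse_info(SAMPLE_INFO)
print("delayed fsyncs since last sample:", delayed_fsync_delta(earlier, now))
```

The counter is cumulative, so alerting on the delta between samples is more useful than alerting on the absolute value.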


AOF rewrite: compacting the log while the server keeps serving

A million SETs on the same key is a million lines in the AOF.

The AOF grows without bound by default. A counter that increments once per request produces one AOF entry per request — terabytes per day at scale, even though the in-memory footprint is a single integer. The AOF would dominate the disk if no one ever compacted it.

BGREWRITEAOF is the cleanup. Redis forks a child (same fork trick, same COW behaviour, same caveats), and the child walks the current keyspace, writing a fresh AOF that reproduces the same final state with the minimum number of commands. A million increments collapse to one SET counter 1000000. The new file is typically a small fraction of the old one.

While the child is writing, the parent is still taking writes. In versions before 7.0, those go into both the live AOF buffer and a separate AOF rewrite buffer. When the child finishes its fresh AOF, the parent appends the rewrite-buffer contents, atomically renames the new file over the old one, and deletes the old AOF. The handoff is invisible to clients. The default trigger is auto-aof-rewrite-percentage 100, which fires when the AOF has doubled in size compared to its size after the last rewrite, provided it is also larger than auto-aof-rewrite-min-size (64 MB by default).
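The rewrite and its handoff can be modelled in a few lines. In this sketch commands are plain tuples rather than RESP, the "child" is just a function call, and the trigger check is a simplified reading of the percentage and minimum-size rules; all the function names are invented for the example.

```python
def apply(keyspace, cmd):
    """Apply one toy command (SET or INCR) to the keyspace dict."""
    op, key, *rest = cmd
    if op == "SET":
        keyspace[key] = rest[0]
    elif op == "INCR":
        keyspace[key] = str(int(keyspace.get(key, "0")) + 1)
    else:
        raise ValueError(f"command not handled in this sketch: {op}")


def replay(log):
    keyspace = {}
    for cmd in log:
        apply(keyspace, cmd)
    return keyspace


def rewrite(old_log, arrived_during_rewrite):
    """Compact the log: one SET per key, then the writes that arrived meanwhile."""
    snapshot = replay(old_log)  # the state the forked child sees
    new_log = [("SET", key, value) for key, value in sorted(snapshot.items())]
    new_log.extend(arrived_during_rewrite)  # parent appends its rewrite buffer
    return new_log


def should_rewrite(current_size, base_size, percentage=100, min_size=64 * 1024 * 1024):
    """Simplified trigger: grown by `percentage` since last rewrite and big enough."""
    if base_size == 0 or current_size < min_size:
        return False
    growth = (current_size - base_size) * 100 // base_size
    return growth >= percentage


old = [("INCR", "counter")] * 1000
during = [("INCR", "counter"), ("SET", "user:1", "Ada")]
new = rewrite(old, during)
print(len(old) + len(during), "commands before,", len(new), "after")
print(replay(new) == replay(old + during))
```

The invariant that matters is the last line: replaying the compacted log must produce exactly the state that replaying the full history would.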

Redis 7.0 changed the on-disk shape: multi-part AOF. Instead of one growing file, the AOF is now a directory holding a base file (an RDB-format snapshot by default), a sequence of incremental AOF files for commands after the base, and a manifest that tracks them. The aof-use-rdb-preamble toggle from 4.0 survives, and now decides whether the base file is written in RDB or AOF format. The new layout does away with the in-memory rewrite buffer and makes the "RDB plus AOF tail" recovery path explicit on disk rather than implicit in one file header. The Redis 7.0 release notes have the migration details.
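The recovery order for a multi-part AOF is "base first, then incrementals in sequence." The sketch below derives that order from a manifest. The sample manifest follows the line format used by Redis 7 as I understand it (file, seq, type, with b for base and i for incremental); treat both the sample and the parser as an assumption to check against your own appendonly directory.

```python
# Illustrative manifest; file names and sequence numbers are made up.
SAMPLE_MANIFEST = """\
file appendonly.aof.2.base.rdb seq 2 type b
file appendonly.aof.4.incr.aof seq 4 type i
file appendonly.aof.5.incr.aof seq 5 type i
"""


def load_order(manifest_text):
    """Return file names in the order a restart would load them."""
    base = None
    incrementals = []
    for line in manifest_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        # Lines are alternating key/value tokens: file <name> seq <n> type <t>.
        entry = dict(zip(tokens[0::2], tokens[1::2]))
        if entry["type"] == "b":
            base = entry["file"]
        elif entry["type"] == "i":
            incrementals.append((int(entry["seq"]), entry["file"]))
        # Any other type is ignored in this sketch.
    ordered = [name for _, name in sorted(incrementals)]
    return ([base] if base else []) + ordered


for name in load_order(SAMPLE_MANIFEST):
    print(name)
```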


Replication adds another layer, and another way to lose data

The replica receives the command stream and applies it in order.

A Redis replica receives the master's write stream and applies it in order. If the master crashes, you fail over to a replica. This is durability against machine failure, not just process failure — and it's the layer that turns Redis into something you can run a business on.

Two persistence-relevant gotchas. The first is from the Jepsen analyses (Aphyr, 2013 and 2014): Sentinel-managed failover during a network partition can elect a replica that's behind on commands, and the writes the old master had accepted but not yet replicated are silently lost on failover. Redis added min-replicas-to-write as a mitigation (refuse writes unless at least N replicas have acknowledged within min-replicas-max-lag seconds), but it's an availability/durability trade, not a fix. Redis is not a strong-consistency database. If you need it to be, you need RedisRaft or you need a different system.
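The min-replicas-to-write rule is a simple admission check, sketched below. The function is a model of the decision, not Redis code; the lag values stand for seconds since each replica's last acknowledgement, and 10 is the default for the companion min-replicas-max-lag directive.

```python
def accepts_writes(replica_lags_seconds, min_replicas_to_write, min_replicas_max_lag=10):
    """Model of the master's check: enough replicas with recent acknowledgements?"""
    if min_replicas_to_write == 0:
        return True  # feature disabled: always accept writes
    healthy = sum(1 for lag in replica_lags_seconds if lag <= min_replicas_max_lag)
    return healthy >= min_replicas_to_write


# Two replicas, both fresh: writes are accepted.
print(accepts_writes([1, 2], min_replicas_to_write=1))

# Partitioned master: replicas have not acknowledged for a while, writes are refused.
print(accepts_writes([15, 30], min_replicas_to_write=1))
```

Note what the check does not do: it bounds how long a partitioned master keeps accepting doomed writes, but writes accepted inside the lag window can still be lost on failover.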

The second is the operational pattern that falls out of the persistence picture. Replicas can be configured with separate persistence settings: master with AOF off and even RDB disabled, replicas with AOF=everysec and periodic BGSAVE. The master never forks, never fsyncs, never pays the latency. The durability lives on the replicas, which are doing the disk work the master doesn't want to. Replication into the replicas' RAM is asynchronous but fast (typically sub-millisecond on a 10 Gb network), and the replicas' writes to disk are slower, durable, and off the master's critical path. One caveat the Redis documentation spells out: a master with persistence off must not be auto-restarted after a crash, because it comes back empty and the replicas then resync to that empty dataset. At Twitter-scale Redis deployments this was the standard shape for years.
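Written down as configuration, the split looks roughly like this. The directives are standard redis.conf ones; the helper name, the host address and the single hourly save point on the replica are placeholders chosen for the example.

```python
def role_conf(role, master_host="10.0.0.1", master_port=6379):
    """Render a minimal redis.conf fragment for one role in the split setup."""
    if role == "master":
        # No snapshots, no AOF: the master never forks and never fsyncs.
        lines = ['save ""', "appendonly no"]
    elif role == "replica":
        # The replica follows the master and does all the disk work.
        lines = [
            f"replicaof {master_host} {master_port}",
            "save 3600 1",
            "appendonly yes",
            "appendfsync everysec",
        ]
    else:
        raise ValueError(f"unknown role: {role}")
    return "\n".join(lines)


print("# master")
print(role_conf("master"))
print()
print("# replica")
print(role_conf("replica"))
```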


Redis 7+, Redis on Flash, and the persistence story at scale.

At scale, persistence is about how it composes with sharding and tiering.

Redis 7 brought the multi-part AOF and a steady stream of smaller persistence improvements — better fsync error reporting, smarter rewrite triggers, less COW pressure from data-structure choices. RedisRaft, in development for years at Redis Labs, would bring Raft-based strong consistency to a Redis cluster and replace the current best-effort failover with the same kind of guarantees etcd or Consul give. It's not generally available as of this writing; the design is on GitHub and worth reading regardless.

Redis Enterprise's Redis on Flash is a different model entirely. Hot keys stay in RAM; cold keys (and the values of less-active keys) live on SSD with only metadata in RAM. Picture a 10 TB dataset on a host with 64 GB of RAM, served at 100-500 µs for the hot path and single-digit-millisecond latency for the cold path. The persistence story changes: the SSD layer is the durable layer, persistence is implicit in the storage hierarchy, and the RAM tier is essentially a cache. RDB and AOF still exist for the metadata, but the keyspace itself is on persistent storage by construction.

The pattern at serious scale isn't "tune RDB and AOF." It's: pick a per-shard persistence mode that matches the per-shard write rate, isolate the persistence work on replicas so the master never forks, pre-shard before you hit fork latency that hurts, run multi-AZ replicas with semi-sync writes for the durability your application actually needs. RDB and AOF are still the primitives, but the conversation has moved to how those primitives compose with everything around them.


A closing note

Persistence in Redis is a sequence of small, specific tricks — fork, COW, append, fsync-in-a-thread, multi-part AOF, replica-side durability — each in response to a real production problem that Antirez or someone close to him hit. None of it is magic. All of it is worth understanding before you reach for the defaults and assume they fit you.
