Redis Durability: What You Actually Lose on Crash
January 12, 2026
Redis lives in memory. That is the source of its speed and the source of every durability surprise. The disk and the replicas are bolted on, and each option trades performance for safety. "Persistent" is not a yes-or-no answer. It is a configuration with a loss window measured in writes.
No persistence is a valid mode. Treat the instance as a cache, accept that a crash erases everything, and rebuild from the source of truth on restart. Fastest, simplest, and right when the data is derived.
RDB snapshots are point-in-time dumps. Redis forks, walks the keyspace, and writes a binary dump.rdb to disk on a schedule. Between snapshots, every write lives only in memory. A crash loses everything since the last successful snapshot. Snapshot every five minutes and you can lose five minutes of writes. RDB is cheap at runtime and restores quickly. It is the wrong shape for user state.
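To put a number on that window, here is a minimal sketch that reads the snapshot fields from INFO persistence. It assumes the caller has already fetched that section as a dict (for example via a client library); the function name rdb_exposure and the now parameter are illustrative, not part of Redis.

```python
import time


def rdb_exposure(persistence_info, now=None):
    """Summarize what a crash right now would cost under RDB-only persistence.

    persistence_info: dict holding fields from the INFO persistence section.
    now: optional Unix timestamp (hypothetical parameter, handy for testing).
    """
    now = time.time() if now is None else now
    last_save = int(persistence_info["rdb_last_save_time"])
    return {
        # Age of the last successful dump: writes newer than this are memory-only.
        "seconds_since_last_save": max(0, int(now - last_save)),
        # Number of changes made since that dump.
        "unsaved_changes": int(persistence_info["rdb_changes_since_last_save"]),
        # A failed background save quietly widens the window.
        "last_save_ok": persistence_info.get("rdb_last_bgsave_status") == "ok",
    }


if __name__ == "__main__":
    sample = {
        "rdb_last_save_time": 1_700_000_000,
        "rdb_changes_since_last_save": 4200,
        "rdb_last_bgsave_status": "ok",
    }
    print(rdb_exposure(sample, now=1_700_000_300))
```

If last_save_ok is false, the schedule is no longer the real loss window. The age of the last good dump is.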
AOF, the append-only file, logs every write command. On restart, Redis replays the log. Durability depends on the fsync policy.
always: fsync after every write. Minimal loss. Throughput tanks because every command waits on a disk sync.
everysec: fsync once per second. The default. Loss window is roughly one second of writes on crash.
no: let the kernel decide. Fastest, but loss can be tens of seconds if dirty pages sit in the page cache.
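The three policies can be turned into a quick audit. The sketch below assumes you have the appendonly and appendfsync settings in a dict (the shape CONFIG GET results usually take in client libraries); aof_loss_window is a hypothetical helper, and the wording of each window simply restates the list above.

```python
def aof_loss_window(config):
    """Map appendonly / appendfsync settings to a rough loss window."""
    if config.get("appendonly", "no") != "yes":
        return "AOF off: loss is bounded by RDB snapshots, or total if those are off too"

    windows = {
        "always": "minimal: every write is synced before the reply",
        "everysec": "roughly one second of writes",
        "no": "whatever the kernel has not flushed, possibly tens of seconds",
    }
    # everysec is used here when the setting is absent from the dict.
    policy = config.get("appendfsync", "everysec")
    if policy not in windows:
        raise ValueError(f"unknown appendfsync policy: {policy!r}")
    return windows[policy]


if __name__ == "__main__":
    for cfg in (
        {"appendonly": "no"},
        {"appendonly": "yes", "appendfsync": "always"},
        {"appendonly": "yes", "appendfsync": "everysec"},
        {"appendonly": "yes", "appendfsync": "no"},
    ):
        print(cfg, "->", aof_loss_window(cfg))
```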
An AOF grows without bound until it is rewritten. Redis rewrites it in the background to a compact form, then swaps the new file in. The rewrite is its own hazard. It forks the process, and the child shares memory with the parent copy-on-write, so on a large dataset under heavy writes memory use can briefly approach double and trigger an OOM kill if you sized the host tight. Watch used_memory_rss during rewrites the first time you turn AOF on.
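A back-of-the-envelope check for that hazard, as a sketch. It assumes a dict with the used_memory_rss and total_system_memory fields from INFO memory; the 2.0 factor is just the "double" worst case from the paragraph above, not a measured value, and fork_headroom is a hypothetical name.

```python
def fork_headroom(memory_info, worst_case_factor=2.0):
    """Estimate whether a fork-based rewrite could exhaust host memory.

    worst_case_factor is an assumption: 2.0 models every page being
    copied while the child process is alive.
    """
    rss = int(memory_info["used_memory_rss"])
    total = int(memory_info["total_system_memory"])
    worst_case = int(rss * worst_case_factor)
    return {
        "worst_case_bytes": worst_case,
        "fits": worst_case <= total,
        # Negative means the worst case does not fit on this host.
        "spare_bytes": total - worst_case,
    }


if __name__ == "__main__":
    gib = 1024 ** 3
    print(fork_headroom({"used_memory_rss": 6 * gib, "total_system_memory": 8 * gib}))
```

In a container, the memory limit that matters may be lower than the host total, so it is worth substituting the cgroup limit for total_system_memory there.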
Replication is not durability. A leader streams writes to one or more replicas asynchronously. Replicas lag, sometimes by milliseconds, sometimes by seconds under load. If the leader crashes and a replica is promoted, every write that had not yet shipped is lost. The system stayed up; the data is gone. WAIT N T blocks until a write reaches N replicas or T milliseconds pass, which tightens the window at the cost of latency. It is not consensus: an acknowledged write can still be lost if a replica that never received it gets promoted, for example during a network partition.
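The WAIT pattern looks roughly like this. The sketch assumes a redis-py style client object exposing set(key, value) and wait(num_replicas, timeout_ms); set_with_ack is a hypothetical helper, and it only reports whether enough replicas acknowledged in time.

```python
def set_with_ack(client, key, value, replicas=1, timeout_ms=100):
    """Write a key, then block until `replicas` acknowledge or the timeout hits.

    Returns True when enough replicas acknowledged in time. A False result
    means the write landed on the leader but is still exposed to failover.
    """
    client.set(key, value)
    # WAIT returns how many replicas acknowledged the preceding writes.
    acked = client.wait(replicas, timeout_ms)
    return acked >= replicas
```

WAIT applies to writes sent on the same connection, so with a pooled client it is worth confirming that both commands share one. A False return is a signal to retry, alert, or fall back, not proof the write was lost.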
The failure mode that catches teams: assuming a sentinel or cluster failover preserves recent writes. It does not. Under sustained replication lag, a failover loses exactly as much as the replica was behind, silently.
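How far behind is each replica right now? A rough sketch, assuming the INFO replication section from the leader has been parsed into a dict where each slaveN entry is itself a dict with ip, port, and offset keys (the shape redis-py is assumed to produce here). replica_backlog is a hypothetical name, and the result is in bytes of replication stream, not a count of writes.

```python
def replica_backlog(replication_info):
    """Return bytes of replication stream each replica has not yet acknowledged."""
    leader_offset = int(replication_info["master_repl_offset"])
    backlog = {}
    for name, fields in replication_info.items():
        # Only the per-replica entries (slave0, slave1, ...) are dicts.
        if name.startswith("slave") and isinstance(fields, dict):
            address = f'{fields["ip"]}:{fields["port"]}'
            backlog[address] = max(0, leader_offset - int(fields["offset"]))
    return backlog


if __name__ == "__main__":
    sample = {
        "role": "master",
        "connected_slaves": 1,
        "master_repl_offset": 1000,
        "slave0": {"ip": "10.0.0.2", "port": 6379, "state": "online",
                   "offset": 900, "lag": 1},
    }
    print(replica_backlog(sample))
```

Graphing this number over time shows what a failover at any given moment would have cost.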
A useful mental model. RDB answers "how much can I lose on crash." AOF answers "how fresh must data be after restart." Replication answers "can I stay online during failure." Three different questions. The right setup usually combines AOF for durability with replication for availability, and treats Redis as the wrong tool when you need both zero loss and zero downtime.
Redis durability is a knob, not a guarantee. RDB snapshots lose everything since the last dump. AOF with everysec fsync loses about a second. Replication protects availability, not data, because lag means recent writes can vanish on failover. Pick the mode that matches what you can afford to lose and stop assuming persistent means lossless.
Originally posted on LinkedIn.