Kafka Log Compaction vs Time Retention: Two Storage Models in One Broker

December 28, 2025


Kafka ships with two completely different storage models in the same broker, and most outages on compacted topics come from teams who reached for the wrong one.

Time and size retention is the default. The broker rolls segment files at segment.bytes or segment.ms, and once a segment is older than retention.ms or pushes the partition past retention.bytes, the whole segment gets deleted. Records expire in time order regardless of what they say. This is the model you want for event streams: clickstreams, application logs, metrics, audit feeds. The producer keeps appending, old data falls off the tail, and consumers replay a recent window.
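To make the segment-level behavior concrete, here is a toy model of retention in Python. It is a sketch, not broker code: the `Segment` class and `apply_retention` function are hypothetical names, and I am assuming the last segment in the list is the active one, which is never deleted. The point is that deletion works on whole segments from the oldest end, never on individual records.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    base_offset: int
    last_timestamp_ms: int  # timestamp of the newest record in the segment
    size_bytes: int


def apply_retention(segments, now_ms, retention_ms=None, retention_bytes=None):
    """Return the segments that survive one retention pass (toy model)."""
    if not segments:
        return []
    # Assumption: the final segment is the active one and is never deleted.
    closed, active = list(segments[:-1]), segments[-1]

    # Time rule: drop the oldest closed segments whose newest record has expired.
    if retention_ms is not None:
        while closed and now_ms - closed[0].last_timestamp_ms > retention_ms:
            closed.pop(0)

    # Size rule: drop the oldest closed segment only while the partition
    # would still hold at least retention_bytes afterwards.
    if retention_bytes is not None:
        total = sum(s.size_bytes for s in closed) + active.size_bytes
        while closed and total - closed[0].size_bytes >= retention_bytes:
            total -= closed.pop(0).size_bytes

    return closed + [active]
```

Note that the record contents never enter the decision. Only segment age and segment size do.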

Log compaction is a different beast. Set cleanup.policy=compact and the broker periodically scans the partition with a background log cleaner. For each key, it keeps the most recent value and removes older versions. The active segment never gets touched, so producers never block. A tombstone is a record with a null value: the cleaner treats it as "delete this key," removes the older values for that key, and later drops the tombstone itself once delete.retention.ms has passed. This is the model behind Kafka Streams state stores, Kafka Connect configs, the internal __consumer_offsets topic, and any place you want a topic to behave like a changelog.
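A minimal sketch of what one cleaner pass keeps, assuming records are simple `(offset, key, value)` tuples and `None` stands in for a null value. The `compact` function is a hypothetical name for illustration, and this ignores the active segment and the cleaner's scheduling entirely.

```python
def compact(records):
    """Keep only the latest record per key, preserving offset order.

    records: list of (offset, key, value) tuples; value None is a tombstone.
    Tombstones are kept by this pass; dropping them later is a separate,
    time-based step governed by delete.retention.ms.
    """
    latest_offset = {}
    for offset, key, _value in records:
        latest_offset[key] = offset  # later records overwrite earlier ones
    return [r for r in records if latest_offset[r[1]] == r[0]]


log = [
    (0, "user-1", "v1"),
    (1, "user-2", "v1"),
    (2, "user-1", "v2"),
    (3, "user-2", None),  # tombstone for user-2
]
compacted = compact(log)
```

Here `compacted` holds offsets 2 and 3: the latest value for user-1 and the tombstone for user-2. Offsets are preserved, so the compacted log has gaps.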

The two models answer different questions. Retention asks "what happened in the last N hours?" Compaction asks "what is the current value for this key?" If your consumer needs current state, replaying a compacted topic from offset zero rebuilds the whole keyspace. Replaying a retention topic from offset zero gives you whatever has not expired yet.
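Rebuilding state from a compacted topic is just a fold over the records in offset order. The sketch below assumes records arrive as `(key, value)` pairs with `None` as the tombstone value. `rebuild_state` is a hypothetical helper, not a client API.

```python
def rebuild_state(records):
    """Fold a changelog into a key -> latest value map.

    records: iterable of (key, value) pairs in offset order; None is a tombstone.
    """
    state = {}
    for key, value in records:
        if value is None:
            state.pop(key, None)  # tombstone: forget the key if we have it
        else:
            state[key] = value
    return state


changelog = [("user-1", "v1"), ("user-2", "v1"), ("user-1", "v2"), ("user-2", None)]
current = rebuild_state(changelog)
```

The result contains only user-1 with value v2. Run the same fold over a retention topic and you get the state implied by the surviving window, which is not the same thing.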

A production failure I have cleaned up: a team stored user-profile snapshots on a compacted topic and shipped deletes as tombstones. They never touched delete.retention.ms, which defaults to 86400000 (24 hours). This setting controls how long the cleaner keeps tombstones visible after compaction so that lagging consumers can still observe them. Their downstream consumer crashed on a Friday night, the on-call did not get paged until Monday morning, and by the time it came back up the tombstones had been removed by the cleaner. The consumer saw the pre-tombstone profile records on its replay, treated those users as active, and re-emitted them into the downstream identity service. Roughly 4,000 users who had been deleted by the privacy team came back to life.
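The arithmetic is worth spelling out. The sketch below compares a consumer's downtime against delete.retention.ms. The exact clock times are my assumption for illustration (the incident only pins down "Friday night" and "Monday morning"), and `tombstones_at_risk` is a hypothetical helper. It is deliberately coarse: the broker's tombstone clock starts when the cleaner processes the segment, not when the consumer goes down.

```python
from datetime import datetime, timedelta

DELETE_RETENTION_MS = 86_400_000  # default: 24 hours


def tombstones_at_risk(down_at, up_at, delete_retention_ms=DELETE_RETENTION_MS):
    """True if the consumer was down longer than tombstones are guaranteed to stay."""
    downtime_ms = (up_at - down_at) / timedelta(milliseconds=1)
    return downtime_ms > delete_retention_ms


# Assumed times: Friday 23:00 to Monday 09:00 is 58 hours of downtime.
friday_night = datetime(2025, 1, 3, 23, 0)
monday_morning = datetime(2025, 1, 6, 9, 0)
at_risk = tombstones_at_risk(friday_night, monday_morning)
```

With the default, `at_risk` is true: 58 hours of downtime against a 24-hour guarantee.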

The fix has two parts. First, raise delete.retention.ms to comfortably exceed your worst-case consumer downtime; a week is reasonable. Second, for records that must be deleted, do not rely on tombstone propagation alone: emit a separate "hard delete" event on a retention topic that the downstream service treats as authoritative.
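Both parts can be sketched in a few lines. The config keys are real topic settings, but everything else here is hypothetical: the `ProfileStore` class, its method names, and the idea of holding hard deletes in an in-memory set. A real service would persist that set, since events on the retention topic eventually expire too.

```python
# Part 1: topic config with a one-week tombstone window.
WEEK_MS = 7 * 24 * 60 * 60 * 1000  # 604800000
topic_config = {
    "cleanup.policy": "compact",
    "delete.retention.ms": str(WEEK_MS),
}


# Part 2: a downstream store that treats hard-delete events as authoritative.
class ProfileStore:
    def __init__(self):
        self.profiles = {}
        self.hard_deleted = set()  # assumption: persisted durably in a real service

    def on_hard_delete(self, user_id):
        """Handle an event from the (hypothetical) hard-delete retention topic."""
        self.hard_deleted.add(user_id)
        self.profiles.pop(user_id, None)

    def on_profile_record(self, user_id, profile):
        """Handle a record from the compacted profile topic; None is a tombstone."""
        if user_id in self.hard_deleted:
            return  # never resurrect a hard-deleted user, even on replay
        if profile is None:
            self.profiles.pop(user_id, None)
        else:
            self.profiles[user_id] = profile
```

With this shape, a replay that misses the tombstone still cannot bring the user back, because the hard-delete record wins over anything arriving on the compacted topic.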

Mental model: events use retention, state uses compaction, and tombstones have their own clock.

Key takeaway

Retention expires events by time or size. Compaction expires older versions of the same key while keeping the latest. Tombstones plus `delete.retention.ms` decide how long deletes survive, and the default of 24 hours is shorter than a consumer outage that runs over a weekend.

Originally posted on LinkedIn.

