Delete Messages from a Topic in Apache Kafka
Apache Kafka is a distributed event streaming platform capable of handling trillions of events a day. It is designed to allow applications to publish and subscribe to streams of records in a fault-tolerant and durable manner. In many use cases, Kafka is employed as a kind of write-ahead log where data is durably recorded, allowing numerous consumers to read from it without impacting each other's read progress.
Understanding How Kafka Manages Data
Kafka stores streams of records in categories called topics. Data within topics is organized into partitions, where each partition is an ordered, immutable sequence of records. Records in a partition are assigned a sequential ID number called the offset which uniquely identifies each record within the partition.
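The partition and offset model can be sketched with a toy in-memory class. This is only an illustration of the idea, not how Kafka stores data; the `Partition` class, the `sensor-1` key, and the consumer names are hypothetical.

```python
class Partition:
    """Toy in-memory partition: an append-only list whose index is the offset."""

    def __init__(self):
        self._records = []

    def append(self, key, value):
        offset = len(self._records)      # next sequential offset
        self._records.append((key, value))
        return offset

    def read_from(self, offset):
        """Return (offset, key, value) tuples starting at the given offset."""
        return [(i, k, v) for i, (k, v) in enumerate(self._records) if i >= offset]


partition = Partition()
offsets = [partition.append("sensor-1", reading) for reading in (20.5, 21.0, 21.4)]

# Each consumer tracks its own position, so one reader never moves another.
positions = {"billing": 0, "analytics": 2}
billing_view = partition.read_from(positions["billing"])
analytics_view = partition.read_from(positions["analytics"])

print(offsets)                                  # [0, 1, 2]
print(len(billing_view), len(analytics_view))   # 3 1
```

Because records are only ever appended, an offset keeps pointing at the same record for as long as that record exists.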
Kafka does not allow for the direct deletion of a specific message once it is written to a topic. Instead, data is deleted using two primary mechanisms:
- Retention Policies: Records in a topic can be purged based on age or size limits set on the topic.
- Log Compaction: Kafka retains at least the most recent value for each key in the log and discards older updates to that key.
Retention Policies
Retention policies in Kafka are configured at the topic level through two main settings:
- retention.ms: This dictates the maximum time period for which Kafka will retain records before they are eligible for deletion.
- retention.bytes: This limits the total size of logs that can be stored within each partition. When the size limit is reached, older records are deleted to make room for newer ones.
By default, these limits are taken from broker-level settings (such as log.retention.ms and log.retention.bytes), but they can be overridden per topic.
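A per-topic override could look like the sketch below. It only builds the argument list for the `kafka-configs.sh` tool that ships with Kafka and does not run it. The topic name `orders`, the broker address `localhost:9092`, the chosen limits, and the helper name `retention_override_command` are assumptions made for illustration.

```python
def retention_override_command(topic, retention_ms, retention_bytes,
                               bootstrap_server="localhost:9092"):
    """Build a kafka-configs.sh invocation that overrides retention for one topic."""
    overrides = f"retention.ms={retention_ms},retention.bytes={retention_bytes}"
    return [
        "kafka-configs.sh",
        "--bootstrap-server", bootstrap_server,
        "--alter",
        "--entity-type", "topics",
        "--entity-name", topic,
        "--add-config", overrides,
    ]


SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000   # retention.ms is expressed in milliseconds
ONE_GIB = 1024 ** 3                       # retention.bytes applies per partition

command = retention_override_command("orders", SEVEN_DAYS_MS, ONE_GIB)
print(" ".join(command))
```

The printed command can be pasted into a shell on a machine with the Kafka tools installed, or the list can be handed to `subprocess.run`.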
Log Compaction
Log compaction is a feature targeted at scenarios where the same key may be updated multiple times. Instead of retaining all records for a key, Kafka will compact the log to ensure that it only retains the latest update for each key. This is particularly useful for restoring state in systems like databases or cache layers.
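The effect of compaction can be illustrated with a small, simplified model. This is a sketch of the outcome only, not of Kafka's log cleaner, which works on log segments in the background and leaves the most recent part of the log untouched; the `compact` function and the sample records are hypothetical.

```python
def compact(records):
    """Keep only the newest record per key, preserving offsets and order."""
    latest_offset = {}
    for offset, key, _value in records:
        latest_offset[key] = offset      # later offsets overwrite earlier ones
    return [r for r in records if latest_offset[r[1]] == r[0]]


# (offset, key, value) tuples, oldest first
log = [
    (0, "user-1", "alice@old.example"),
    (1, "user-2", "bob@example.com"),
    (2, "user-1", "alice@new.example"),
    (3, "user-3", "carol@example.com"),
    (4, "user-2", "bob@work.example"),
]

compacted = compact(log)
print([offset for offset, _key, _value in compacted])   # [2, 3, 4]
```

Surviving records keep their original offsets, so the compacted log has gaps where superseded records used to be.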
Configuring log compaction involves setting the following properties on a topic:
- cleanup.policy: Set this to compact to enable log compaction.
- min.cleanable.dirty.ratio: The minimum fraction of the log that must be dirty (written since the last compaction) before the log cleaner considers the log eligible for compaction.
- delete.retention.ms: How long delete markers for removed keys are retained after compaction before they are themselves removed.
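As a minimal sketch, the three properties can be collected into a set of topic overrides. The values shown match Kafka's documented defaults for the dirty ratio (0.5) and delete retention (one day). The helper names `to_add_config` and `is_cleanable` are hypothetical, and `is_cleanable` is a simplified reading of how the dirty ratio threshold is applied.

```python
compaction_config = {
    "cleanup.policy": "compact",
    "min.cleanable.dirty.ratio": "0.5",
    "delete.retention.ms": str(24 * 60 * 60 * 1000),   # one day in milliseconds
}


def to_add_config(config):
    """Render overrides in the key=value,key=value form used by --add-config."""
    return ",".join(f"{key}={value}" for key, value in config.items())


def is_cleanable(clean_bytes, dirty_bytes, min_dirty_ratio):
    """Simplified check: is the dirty share of the log above the threshold?"""
    total = clean_bytes + dirty_bytes
    return total > 0 and dirty_bytes / total > min_dirty_ratio


print(to_add_config(compaction_config))
print(is_cleanable(clean_bytes=600, dirty_bytes=400, min_dirty_ratio=0.5))   # False
print(is_cleanable(clean_bytes=300, dirty_bytes=700, min_dirty_ratio=0.5))   # True
```

The rendered string can be passed to `kafka-configs.sh --alter --entity-type topics --entity-name <topic> --add-config`. A lower dirty ratio makes compaction run more often at the cost of more cleaner I/O.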
Deleting Records Directly: Tombstone Messages
On a compacted topic, a specific key can be deleted by producing a tombstone: a record with that key and a null value. When the log cleaner compacts the log, the tombstone supersedes all earlier records with the same key, so those records are removed. The tombstone itself is kept for delete.retention.ms so that consumers have a chance to observe the deletion, and is then removed as well. Tombstones have this effect only on topics whose cleanup.policy includes compact.
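Producing a tombstone might look like the following sketch. It assumes a producer object with a `send(topic, key=..., value=...)` method, which is the shape of kafka-python's `KafkaProducer.send`. The `RecordingProducer` class is a hypothetical stand-in so the example runs without a broker, and the topic and key names are made up.

```python
class RecordingProducer:
    """Hypothetical stand-in for a real producer so the sketch runs offline."""

    def __init__(self):
        self.sent = []

    def send(self, topic, key=None, value=None):
        self.sent.append((topic, key, value))


def send_tombstone(producer, topic, key):
    """Publish a tombstone: the key to delete, paired with a null value."""
    if key is None:
        raise ValueError("a tombstone needs a key")
    producer.send(topic, key=key, value=None)


producer = RecordingProducer()
producer.send("user-profiles", key=b"user-42", value=b'{"name": "Ada"}')
send_tombstone(producer, "user-profiles", b"user-42")

print(producer.sent[-1])   # ('user-profiles', b'user-42', None)
```

With a real client, the same `send_tombstone` call would be made against a producer connected to the cluster, and the earlier value for `user-42` would disappear once the log cleaner next compacts that part of the log.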
Summary Table
| Term | Description |
| --- | --- |
| retention.ms | Maximum age of a record before it becomes eligible for deletion. |
| retention.bytes | Maximum size of each partition's log before older records are deleted. |
| cleanup.policy | Policy for discarding old data: delete, compact, or both. |
| min.cleanable.dirty.ratio | Minimum ratio of dirty (not yet compacted) log to total log before a compacted log is eligible for cleaning. |
| delete.retention.ms | Time a tombstone is retained in a compacted log before it is itself removed. |
Conclusion
Kafka does not support the per-record delete operation found in databases and some other messaging systems. Instead, retention policies, log compaction, and tombstone messages together control the lifecycle of data in a topic. Understanding and configuring these features properly lets you integrate Kafka into a data management architecture while balancing performance, storage, and consistency according to the needs of the application.

