How to force log compaction of a Kafka topic?
Apache Kafka is a distributed event streaming platform capable of handling trillions of events a day. One of Kafka's key features is log compaction, which ensures that a topic retains at least the latest value for each key within the partition log. This is particularly useful for stateful services that need to maintain an up-to-date state with minimal storage overhead. Here, we will explore how to force log compaction in Kafka and under what circumstances this may be necessary or beneficial.
Understanding Kafka Log Compaction
Log compaction keeps at least the latest update for each key in a Kafka topic. It is particularly useful for topics that act as the changelog of a database or as the state store for a stream processing application. Without compaction, a topic keeps every record for a key, including superseded values, until time- or size-based retention deletes it, which wastes storage and lengthens the time consumers need to rebuild state.
When log compaction is enabled, a background cleaner thread rewrites closed log segments so that only the most recent value for each key remains, without changing record order or offsets. The active segment is never compacted, so recent duplicates of a key can still appear at the head of the log. A deleted key is represented by a "tombstone" record (a record with a null value); compaction removes the older values for that key, and the tombstone itself is removed once `delete.retention.ms` has passed.
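To make the retention rule concrete, here is a minimal Python sketch that models compaction over an in-memory list of records. It is an illustration only, not Kafka's cleaner: the `Record` type, the `compact` function, and the sample keys are hypothetical names chosen for this example, and the `drop_tombstones` flag stands in for a later cleaner pass after `delete.retention.ms` has elapsed.

```python
from typing import Dict, List, NamedTuple, Optional


class Record(NamedTuple):
    offset: int
    key: str
    value: Optional[str]  # None models a tombstone (delete marker)


def compact(log: List[Record], drop_tombstones: bool = False) -> List[Record]:
    """Keep only the highest-offset record per key, preserving offset order.

    Assumes `log` is already sorted by offset, as a partition log is.
    """
    latest_offset: Dict[str, int] = {}
    for rec in log:
        latest_offset[rec.key] = rec.offset  # later records overwrite earlier ones

    survivors = [rec for rec in log if latest_offset[rec.key] == rec.offset]
    if drop_tombstones:
        # Models a later pass, once the tombstone retention window has expired.
        survivors = [rec for rec in survivors if rec.value is not None]
    return survivors


log = [
    Record(0, "user-1", "online"),
    Record(1, "user-2", "online"),
    Record(2, "user-1", "away"),
    Record(3, "user-2", None),  # tombstone for user-2
    Record(4, "user-3", "online"),
]

compacted = compact(log)
print([(r.offset, r.key, r.value) for r in compacted])

fully_cleaned = compact(log, drop_tombstones=True)
print([(r.offset, r.key, r.value) for r in fully_cleaned])
```

Surviving records keep their original offsets, which is why consumers of a compacted topic can see gaps in the offset sequence.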
Prerequisites for Log Compaction
Before forcing log compaction, ensure that:
- The topic has `cleanup.policy` set to `compact`.
- You have defined `min.compaction.lag.ms` and `delete.retention.ms` to control the minimum time a message remains uncompacted and how long delete markers (tombstones) are retained.
- Segment roll settings (such as `segment.ms` and `segment.bytes`) are configured as needed, because only closed segments are eligible for compaction.
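The checklist above can be expressed as a small validation helper. This is a sketch under stated assumptions: the topic configuration is passed in as a plain dictionary (how you fetch it from the cluster is out of scope here), and the function name `compaction_prerequisite_issues` is hypothetical.

```python
from typing import Dict, List


def compaction_prerequisite_issues(topic_config: Dict[str, str]) -> List[str]:
    """Return human-readable notes about missing compaction prerequisites."""
    issues: List[str] = []

    # cleanup.policy may be "delete", "compact", or both ("compact,delete").
    # "delete" is the default when the topic does not override it.
    policies = {p.strip() for p in topic_config.get("cleanup.policy", "delete").split(",")}
    if "compact" not in policies:
        issues.append("cleanup.policy does not include 'compact'")

    # Settings the checklist asks you to review explicitly.
    for name in ("min.compaction.lag.ms", "delete.retention.ms", "segment.ms", "segment.bytes"):
        if name not in topic_config:
            issues.append(f"{name} is not set on the topic; the broker default applies")

    return issues


example_config = {"cleanup.policy": "compact", "delete.retention.ms": "86400000"}
for issue in compaction_prerequisite_issues(example_config):
    print(issue)
```

An unset property is not an error in Kafka, since the broker default applies; the helper only flags it so the value gets a deliberate review.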
Steps to Force Log Compaction
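- Modify Topic Configuration: To initiate log compaction, start by setting or updating the topic's `cleanup.policy` to `compact`.
- Adjust Compaction Settings: Kafka does not provide a command that runs the cleaner on demand, so "forcing" compaction in practice means making the log eligible for cleaning sooner. Configure how aggressive the process should be with settings like `min.cleanable.dirty.ratio`, `segment.ms`, and `segment.bytes`. Lowering the dirty ratio makes the cleaner pick up the log earlier, and smaller or shorter-lived segments roll sooner, which matters because the active segment is never compacted.
- Monitor Compaction: You can monitor the compaction process through Kafka's JMX metrics under `kafka.log:type=LogCleaner`.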
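To illustrate why lowering `min.cleanable.dirty.ratio` brings compaction forward, the sketch below models the eligibility check as the share of the log that has not yet been cleaned compared against the threshold. It is a simplified model, not the broker's code: the function names and byte counts are hypothetical, and the real cleaner also takes settings such as `min.compaction.lag.ms` into account.

```python
def dirty_ratio(clean_bytes: int, dirty_bytes: int) -> float:
    """Fraction of the log (by size) that has not been compacted yet."""
    total = clean_bytes + dirty_bytes
    return dirty_bytes / total if total else 0.0


def is_cleanable(clean_bytes: int, dirty_bytes: int,
                 min_cleanable_dirty_ratio: float = 0.5) -> bool:
    """True when the dirty share exceeds the configured threshold.

    0.5 is Kafka's default for min.cleanable.dirty.ratio.
    """
    return dirty_ratio(clean_bytes, dirty_bytes) > min_cleanable_dirty_ratio


# Hypothetical log: 700 bytes already compacted, 300 bytes written since.
print(dirty_ratio(700, 300))
print(is_cleanable(700, 300))                                  # default threshold
print(is_cleanable(700, 300, min_cleanable_dirty_ratio=0.2))   # lowered threshold
```

With the default threshold this log waits for more writes before it is cleaned, while the lowered threshold makes it eligible immediately, at the cost of more frequent cleaner I/O.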
Example Scenario: Forcing Log Compaction on a Topic
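Here's a practical example of forcing log compaction for a topic named `app-status`:
- Step 1: Set the topic's cleanup policy to `compact`.
- Step 2: Configure additional settings for aggressive compaction, such as a lower `min.cleanable.dirty.ratio` and a shorter `segment.ms`, so that segments roll and become eligible for cleaning sooner.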
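One way to apply the two steps is with the `kafka-configs.sh` tool that ships with Kafka. The sketch below only builds and prints the command lines so they can be reviewed before running. The bootstrap address `localhost:9092`, the helper name `alter_topic_config_command`, and the values `0.01` and `60000` are assumptions for illustration, not recommendations; older Kafka releases may require `--zookeeper` instead of `--bootstrap-server` for topic configs.

```python
import shlex
from typing import Dict, List


def alter_topic_config_command(topic: str, configs: Dict[str, str],
                               bootstrap_server: str = "localhost:9092") -> List[str]:
    """Build a kafka-configs.sh invocation that adds topic-level overrides."""
    # Multiple overrides are passed as one comma-separated key=value list.
    add_config = ",".join(f"{key}={value}" for key, value in configs.items())
    return [
        "kafka-configs.sh",
        "--bootstrap-server", bootstrap_server,
        "--entity-type", "topics",
        "--entity-name", topic,
        "--alter",
        "--add-config", add_config,
    ]


# Step 1: enable compaction on the topic.
step1 = alter_topic_config_command("app-status", {"cleanup.policy": "compact"})

# Step 2: illustrative aggressive settings (assumed values, tune for your workload).
step2 = alter_topic_config_command(
    "app-status",
    {"min.cleanable.dirty.ratio": "0.01", "segment.ms": "60000"},
)

print(shlex.join(step1))
print(shlex.join(step2))
```

Very aggressive values increase cleaner I/O and the number of segment files, so it is common to restore more moderate settings once the log has been compacted.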
Table: Key Configurations for Log Compaction
| Configuration Key | Description | Typical Value |
|---|---|---|
| cleanup.policy | Cleanup policy of the log (delete, compact, or both) | compact |
| min.cleanable.dirty.ratio | Minimum ratio of uncompacted (dirty) log to total log before the cleaner considers the log for cleaning | 0.3 (default 0.5) |
| segment.bytes | Segment file size at which Kafka rolls a new segment; only closed segments are compacted | 1073741824 (1 GiB) |
| min.compaction.lag.ms | Minimum time a message will remain uncompacted | Workload-dependent (default 0) |
When to Force Log Compaction?
Forcing log compaction might be necessary when:
- You change the key structure of messages and need to reclaim space efficiently.
- You're preparing a topic for a change in system architecture that requires compacted logs.
Conclusion
Log compaction is a vital feature in Apache Kafka for maintaining high-performance topics while ensuring that storage usage remains efficient over time. By understanding and utilizing the settings available, you can ensure your Kafka deployment remains efficient and tailored to your data retention needs.
For best results, regularly monitor the impact of these settings on your system’s performance and adjust as necessary to suit your specific requirements.

