Re-consuming Messages from a Kafka Log-Compacted Topic

Apache Kafka is a distributed event streaming platform capable of handling trillions of events a day. One of its key features is log compaction, which guarantees that Kafka retains at least the last known value for each key within a topic partition, no matter how old that record is. This is particularly useful when only the most recent state matters, such as in systems that track current status or configuration.

Understanding Log Compaction

Log compaction keeps the size of a topic down by removing obsolete records. It relies on record keys, so every record written to a compacted topic must have a key. For each key, the most recent record is always retained. Older records with the same key are discarded when the compaction process reaches the part of the log that holds them.
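To make the "latest record per key" rule concrete, here is a minimal sketch in plain Java. It is a toy model, not Kafka's implementation: `KeyedRecord` and `CompactionSketch` are hypothetical names introduced for illustration, and the input list is assumed to be in ascending offset order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a keyed log record (hypothetical type, not part of the Kafka client). */
final class KeyedRecord {
    final long offset;
    final String key;
    final String value;

    KeyedRecord(long offset, String key, String value) {
        this.offset = offset;
        this.key = key;
        this.value = value;
    }
}

final class CompactionSketch {
    /** Keeps only the highest-offset record for each key; surviving records keep their offsets. */
    static List<KeyedRecord> compact(List<KeyedRecord> log) {
        // Pass 1: remember the offset of the last record seen for each key.
        Map<String, Long> lastOffsetByKey = new HashMap<>();
        for (KeyedRecord r : log) {
            lastOffsetByKey.put(r.key, r.offset);
        }
        // Pass 2: copy a record only if it is the last one written for its key.
        List<KeyedRecord> compacted = new ArrayList<>();
        for (KeyedRecord r : log) {
            if (lastOffsetByKey.get(r.key) == r.offset) {
                compacted.add(r);
            }
        }
        return compacted;
    }
}
```

Given records at offsets 0 to 3 where the key "svc-a" is written at offsets 0, 2, and 3 and "svc-b" at offset 1, the sketch keeps only the records at offsets 1 and 3. The survivors are not renumbered, which is why a compacted log has gaps in its offsets.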

How Log Compaction Works

Kafka's log cleaner periodically scans the older, closed segments of a partition and rewrites them so that only the last record for each key remains. Here's an outline of how the process works:

  1. Background Compaction: Cleaner threads run in the background and pick up a partition once enough of its log is uncompacted (controlled by min.cleanable.dirty.ratio). The active segment, the one currently being written to, is never compacted.
  2. Deferred Removal: A superseded record is not removed at the moment a newer value is written. It stays in the log, and remains readable, until the cleaner rewrites the segment that contains it.
  3. Handling Deletes: A record with a null value (a tombstone) signals a delete for its key. The cleaner drops earlier records for that key, and the tombstone itself is removed once delete.retention.ms has elapsed.
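The behavior outlined above is governed by topic-level settings. The sketch below collects the relevant ones in a plain Java map. The configuration keys are real Kafka topic settings, but the values are illustrative (the `segment.ms` value in particular is arbitrary), and `CompactedTopicConfigSketch` is a hypothetical helper name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CompactedTopicConfigSketch {
    /** Topic-level settings relevant to compaction; values are examples, not recommendations. */
    static Map<String, String> compactedTopicConfig() {
        Map<String, String> config = new LinkedHashMap<>();
        // Use compaction instead of time- or size-based deletion.
        config.put("cleanup.policy", "compact");
        // The log becomes eligible for cleaning once this fraction of it is uncompacted.
        config.put("min.cleanable.dirty.ratio", "0.5");
        // How long tombstones (null-value records) are kept before they are removed.
        config.put("delete.retention.ms", "86400000"); // 24 hours
        // The active segment is never compacted; rolling it sooner exposes records to the cleaner.
        config.put("segment.ms", "3600000"); // 1 hour, an arbitrary example value
        return config;
    }
}
```

A map like this can be passed to `NewTopic.configs(...)` when creating the topic through the Kafka admin client.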

The Role of Compaction in Consumer Re-reading

Consumers in Kafka can start reading from any offset. On a compacted topic, a consumer can re-read at any time, but it sees the log as it currently stands: surviving records keep their original offsets, and records that were compacted away are simply skipped. Here's how you'd typically re-consume messages from a compacted topic:

  • Set the Consumer to Earliest: Set auto.offset.reset to earliest so that the consumer starts reading from the beginning of the compacted log. This setting only takes effect when the consumer group has no committed offset for a partition, so to re-read with an existing group, either use a new group.id or call seekToBeginning() once partitions have been assigned.
```java
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
// Use a group.id with no committed offsets; otherwise auto.offset.reset is ignored.
props.put("group.id", "test");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
// Start from the earliest available offset when the group has no committed position.
props.put("auto.offset.reset", "earliest");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("compacted_topic"));
```
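  • Handling Offsets and Compaction: If a consumer starts from an old offset on a compacted topic, earlier values for some keys will no longer be present, because superseded records have been removed. A fetch for an offset that no longer exists returns the next record with a higher offset, so the offsets a consumer sees may contain gaps.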
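The effect of offset gaps on a fetch can be modeled in a few lines. This is a simplified sketch, not broker code: `CompactedFetchSketch` is a hypothetical name, the map stands in for the records that survived compaction, and returning -1 when nothing is found is a convention of this sketch only.

```java
import java.util.Map;
import java.util.NavigableMap;

final class CompactedFetchSketch {
    /**
     * Returns the offset of the first surviving record at or after the requested offset,
     * or -1 if no such record exists in this toy log.
     */
    static long firstAvailableOffset(NavigableMap<Long, String> survivingRecords, long requested) {
        // ceilingEntry finds the smallest key greater than or equal to the requested offset.
        Map.Entry<Long, String> entry = survivingRecords.ceilingEntry(requested);
        return entry == null ? -1L : entry.getKey();
    }
}
```

With surviving records at offsets 1, 3, and 7, a request for offset 2 resolves to offset 3, and a request for offset 0 resolves to offset 1. Consumer code should therefore not assume that consecutive records have consecutive offsets.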

Implications of Log Compaction on Consumer Behavior

When designing systems around Kafka's compacted topics, it’s crucial to understand how log compaction affects consumer behavior:

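  • Latest State Assurance: A consumer that reads from the beginning of the log to the end sees at least the latest value for every key. It may also see older values for keys whose records have not been compacted yet, so apply records in offset order and let later values win.
  • Potential Loss of History: Compaction discards superseded records by design. If your application logic requires a full history of all changes, log compaction alone won’t suffice, and the history must be kept elsewhere, for example in a separate topic that is not compacted.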

Example Scenario: Configuration Service

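Consider a configuration service where each microservice's configuration is stored in Kafka as key-value pairs, with service identifiers as keys. Because the topic is compacted, a service that reads it from the beginning always ends up with the most recent configuration for each key, and the topic does not grow without bound as configurations change.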
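A consumer in this scenario typically rebuilds an in-memory view by replaying the topic. The sketch below shows that replay logic in plain Java, leaving out the Kafka client so that it stays self-contained. `ConfigViewSketch` is a hypothetical name, and each map entry stands in for one consumed record (key: service identifier, value: configuration, or null for a tombstone).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ConfigViewSketch {
    /** Replays (serviceId, config) records in log order and returns the current view. */
    static Map<String, String> replay(List<Map.Entry<String, String>> records) {
        Map<String, String> current = new HashMap<>();
        for (Map.Entry<String, String> record : records) {
            if (record.getValue() == null) {
                // Tombstone: the configuration for this service was deleted.
                current.remove(record.getKey());
            } else {
                // Later records overwrite earlier ones, so the last value wins.
                current.put(record.getKey(), record.getValue());
            }
        }
        return current;
    }
}
```

Because later values overwrite earlier ones, the replay produces the same final view whether or not the cleaner has already removed the older records. That is what makes reading a compacted topic from the beginning safe for this use case.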

Summary of Key Points

| Feature | Description | Importance in Compacted Topics |
|---|---|---|
| Latest Value for Each Key | Only the latest value for a key is retained. | Ensures up-to-date data. |
| Efficient Storage Utilization | Removes redundant data entries. | Reduces storage overhead. |
| Consumer Re-reads | Consumers can safely re-read from the earliest offset. | Facilitates durable state reads. |

Conclusion

Kafka’s log compaction suits applications that need a consistent, up-to-date state for each key without the operational overhead of maintaining the full data history. Because the latest value for every key is always retained, a compacted topic can serve as a durable, replayable source of current state, which makes Kafka a useful building block in distributed data architectures. When used alongside approaches such as CRDTs or event sourcing, a compacted topic can hold the current state, while any full history those approaches depend on has to be retained separately.

