Kafka consumer manual offset commit

Apache Kafka is a distributed event streaming platform capable of handling trillions of events a day. One of its fundamental components is the Kafka consumer, which reads records from topics in a Kafka cluster. A central concern for consumers is how they manage offsets. An offset is the position of a record within a topic partition; the committed offset of a consumer group marks where the group resumes reading that partition after a restart.

Manual offset commit gives developers precise control over when a record is considered consumed, that is, when the committed offset advances. This matters when processing can fail and the application needs to reprocess the record.

Understanding Offsets and Consumer Groups

For each consumer group, Kafka stores the offset the group has reached in every partition it reads (in the internal __consumer_offsets topic). An offset commit updates that stored position. Committing an offset tells Kafka that you have successfully processed all records before that offset and do not need to process them again; by convention, the committed value is the offset of the next record to read. By default, offsets are committed automatically at a fixed interval, but you can configure your consumer to commit offsets manually.
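
To make the resume point concrete, here is a minimal sketch in plain Java with no Kafka dependency. The class `OffsetSemanticsDemo`, its `consume` method, and the in-memory list standing in for a partition are hypothetical names introduced only for illustration. The sketch assumes the usual convention that the committed offset is the offset of the next record to read, so after processing the record at offset n the consumer commits n + 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of one partition and one consumer; this is not Kafka client code.
final class OffsetSemanticsDemo {

    /**
     * Reads records from `log` starting at `committed`, "processes" each one by
     * appending it to `processed`, and stops after `maxRecords` records to
     * simulate a crash or shutdown. A commit happens after every `commitEvery`
     * processed records. Returns the committed offset at the time of the stop.
     */
    static long consume(List<String> log, long committed, int maxRecords,
                        int commitEvery, List<String> processed) {
        long position = committed;      // next offset to read
        int sinceLastCommit = 0;
        for (int i = 0; i < maxRecords && position < log.size(); i++) {
            processed.add(log.get((int) position));
            position++;                 // now points at the next unread record
            sinceLastCommit++;
            if (sinceLastCommit == commitEvery) {
                committed = position;   // commit "last processed offset + 1"
                sinceLastCommit = 0;
            }
        }
        return committed;               // progress made after the last commit is lost
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("r0", "r1", "r2", "r3", "r4");
        List<String> processed = new ArrayList<>();

        // First run: process 4 records, committing every 3, then "crash".
        long committed = consume(log, 0, 4, 3, processed);
        // Restart: resume from the committed offset and commit after every record.
        committed = consume(log, committed, 10, 1, processed);

        System.out.println("committed = " + committed);  // committed = 5
        System.out.println("processed = " + processed);  // processed = [r0, r1, r2, r3, r3, r4]
    }
}
```

The record r3 was processed before the simulated crash but after the last commit, so it is processed again on restart. This is the at-least-once behavior that manual commits let you tune.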

Manual Offset Control

You can control when and how offsets are committed in several ways with the Kafka consumer API. The first step for manual control is to disable automatic commits by setting enable.auto.commit to false:

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test");
// Turn off periodic auto-commit so the application decides when to commit.
props.put("enable.auto.commit", "false");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
```

Manual Offset Commit Strategies

There are generally two strategies to manually commit offsets: synchronous and asynchronous.

1. Synchronous Commit (commitSync)

Called without arguments, commitSync() commits the offsets returned by the last poll() for all assigned partitions, so call it only after you have finished processing those records. The call blocks until the broker confirms the commit or an unrecoverable error occurs; retriable errors are retried automatically.

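```java
import org.apache.kafka.clients.consumer.CommitFailedException;

try {
  // Blocks until the broker acknowledges the commit.
  consumer.commitSync();
} catch (CommitFailedException e) {
  logger.error("Commit failed", e);
}
```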

2. Asynchronous Commit (commitAsync)

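commitAsync() is the non-blocking alternative. It sends the commit request and returns immediately, so the consumer can keep working while the commit is in flight. Unlike commitSync(), it does not retry a failed commit; the optional callback reports the outcome.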

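```java
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.consumer.OffsetCommitCallback;
import org.apache.kafka.common.TopicPartition;

// Returns immediately; the callback is invoked later with the result.
consumer.commitAsync(new OffsetCommitCallback() {
    @Override
    public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception exception) {
        if (exception != null) {
            logger.error("Commit failed for offsets {}", offsets, exception);
        }
    }
});
```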

Choosing Between Synchronous and Asynchronous

The choice between synchronous and asynchronous offset commits depends largely on your processing requirements and the behavior of your consumer under failure conditions:

  • Synchronous commits are simpler and guarantee that the offset is committed (or an exception is thrown) before the consumer proceeds, but every commit adds the latency of a broker round trip.
  • Asynchronous commits improve throughput but complicate recovery: if a commit fails or the consumer crashes before the commit completes, records that were already processed remain uncommitted and are delivered again after a restart.
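
A common way to combine the two is to commit asynchronously after each batch and make one final synchronous commit when the consumer stops, so the last position is committed with retries. The sketch below shows only that control flow in plain Java. `CommitLoopSketch`, its nested `OffsetCommitter` interface, and the `run` method are hypothetical names; the interface is an assumed stand-in for the `KafkaConsumer` commit overloads that accept explicit offsets, simplified here to a single `long` for one partition.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java sketch of "async in the loop, sync on the way out"; not Kafka client code.
final class CommitLoopSketch {

    /** Hypothetical stand-in for committing an explicit offset for one partition. */
    interface OffsetCommitter {
        void commitAsync(long nextOffset);
        void commitSync(long nextOffset);
    }

    /**
     * Processes `records` in batches of `batchSize`. After each batch the
     * position is committed asynchronously. However the loop ends (normally
     * or with an exception from `handler`), the position of the last record
     * that was fully processed is committed synchronously in `finally`.
     */
    static void run(List<String> records, int batchSize,
                    Consumer<String> handler, OffsetCommitter committer) {
        long nextOffset = 0;                      // offset of the next record to process
        try {
            while (nextOffset < records.size()) {
                long end = Math.min(nextOffset + batchSize, records.size());
                while (nextOffset < end) {
                    handler.accept(records.get((int) nextOffset));
                    nextOffset++;                 // advance only after processing succeeds
                }
                committer.commitAsync(nextOffset); // fast path: do not wait for the broker
            }
        } finally {
            committer.commitSync(nextOffset);      // final commit: wait for confirmation
        }
    }

    public static void main(String[] args) {
        OffsetCommitter printing = new OffsetCommitter() {
            @Override public void commitAsync(long nextOffset) {
                System.out.println("commitAsync(" + nextOffset + ")");
            }
            @Override public void commitSync(long nextOffset) {
                System.out.println("commitSync(" + nextOffset + ")");
            }
        };
        run(Arrays.asList("r0", "r1", "r2", "r3", "r4"), 2, r -> { }, printing);
    }
}
```

Because the sketch tracks the processed position itself, the final commit never moves past a record whose handler failed. In a real consumer the same idea would mean passing explicit offsets to the commit call rather than relying on the position of the last poll().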

Best Practices for Manual Offset Commit

When committing offsets manually, it is crucial to consider the entire message processing workflow. Here are a few guidelines:

  • Commit Frequency: Commit too frequently, and you may harm performance; commit too infrequently, and you may reprocess many messages after a failure. Find a balance based on your application's tolerance for duplicate messages.
  • Error Handling: Always handle exceptions from commitSync() and check the exception argument in the commitAsync() callback. A commit that fails unnoticed leaves the committed offset behind the records you have already processed, which causes unexpected duplicates after a restart.
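
One simple way to balance commit frequency, sketched below in plain Java, is to commit after every N processed records and once more on shutdown. `CommitBatcher`, its methods, and the batch size of 3 are hypothetical choices made for this illustration; in a real consumer the returned offset would be handed to a commit call for the record's partition.

```java
import java.util.OptionalLong;

// Tracks progress for ONE partition and decides when a commit is due.
final class CommitBatcher {
    private final int commitEvery;   // records to process between commits
    private long nextOffset = -1;    // last processed offset + 1 (-1 until a record is seen)
    private int pending = 0;         // records processed since the last commit

    CommitBatcher(int commitEvery) {
        if (commitEvery < 1) {
            throw new IllegalArgumentException("commitEvery must be >= 1");
        }
        this.commitEvery = commitEvery;
    }

    /** Call after a record is fully processed. Returns the offset to commit if one is due. */
    OptionalLong recordProcessed(long offset) {
        nextOffset = offset + 1;
        pending++;
        return pending >= commitEvery ? drain() : OptionalLong.empty();
    }

    /** Call on shutdown to commit whatever is still pending. */
    OptionalLong flush() {
        return pending > 0 ? drain() : OptionalLong.empty();
    }

    private OptionalLong drain() {
        pending = 0;
        return OptionalLong.of(nextOffset);
    }

    public static void main(String[] args) {
        CommitBatcher batcher = new CommitBatcher(3);
        for (long offset = 0; offset < 7; offset++) {
            batcher.recordProcessed(offset)
                   .ifPresent(o -> System.out.println("commit " + o));
        }
        batcher.flush().ifPresent(o -> System.out.println("final commit " + o));
        // prints: commit 3, commit 6, final commit 7 (one per line)
    }
}
```

With a batch size of 3, at most two processed records are uncommitted at any moment, so that is the most a crash can force the consumer to reprocess, assuming every commit succeeds. A larger batch size lowers commit overhead at the cost of more potential duplicates.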

Summary Table

| Commit Type | Pros | Cons | Use Case |
| --- | --- | --- | --- |
| commitSync | Reliable, easy to manage | Higher latency, blocking call | Low-throughput, high-reliability scenarios |
| commitAsync | Non-blocking, higher throughput | Complex error handling, risk of duplicates | High-throughput scenarios |

By understanding and utilizing Kafka's manual offset commit capabilities, developers can significantly enhance the reliability and efficiency of their consumer applications.

