How does Kafka consumer auto commit work?

Apache Kafka is a distributed event streaming platform capable of handling trillions of events a day. One of its core components is the Kafka consumer, which reads records from a Kafka cluster. Efficient message consumption and offset management are crucial for correct message processing, which is where Kafka's offset commit mechanism, particularly the auto-commit feature, plays a vital role.

Understanding Offsets and Consumers

In Kafka, every message in a partition has a sequential ID number called an offset. The Kafka consumer tracks its position in each partition and may commit that position back to Kafka. By convention, the committed offset is the offset of the next message to read, that is, the last processed offset plus one. Committing it signals that all messages before that offset have been handled.
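As a minimal sketch of this bookkeeping, the helper below computes the offset to commit and the number of records left to read after a restart. The class and method names are illustrative assumptions, not part of the Kafka API.

```java
// Illustrative only: OffsetBookkeeping is a hypothetical helper, not a Kafka class.
final class OffsetBookkeeping {

    // By Kafka convention, the committed offset is the offset of the NEXT
    // record to read, i.e. the last processed offset plus one.
    static long offsetToCommit(long lastProcessedOffset) {
        return lastProcessedOffset + 1;
    }

    // Number of records a restarted consumer still has to read, given the
    // committed offset and the partition's end offset (exclusive).
    static long remainingRecords(long committedOffset, long endOffset) {
        return Math.max(0, endOffset - committedOffset);
    }

    public static void main(String[] args) {
        long committed = offsetToCommit(41);
        System.out.println("commit " + committed
                + ", remaining " + remainingRecords(committed, 50));
    }
}
```

Committing the last processed offset itself, rather than that offset plus one, would cause the final record to be read again on every restart.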

Auto Commit Explained

The auto-commit feature simplifies offset management by committing offsets automatically. If enabled, the consumer periodically commits the offsets of the messages that poll() has returned to the application, at the interval specified by the consumer configuration property auto.commit.interval.ms.

Configuring Auto Commit

To utilize the auto-commit feature, you must enable it in your consumer configuration:

```properties
enable.auto.commit=true
# Auto commit interval in milliseconds
auto.commit.interval.ms=5000
```

Here, enable.auto.commit is set to true to activate auto commit, and auto.commit.interval.ms sets the interval at which commits are sent to Kafka. By default, it's every 5 seconds.
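If these settings live in a .properties file that is loaded with java.util.Properties, a comment must sit on its own line. The sketch below (the class name is hypothetical) shows that a trailing `# ...` on the same line becomes part of the value, which would then no longer parse as an integer.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Illustrative only: shows how java.util.Properties treats '#' comments.
final class PropertiesCommentDemo {

    // Parses .properties-formatted text from memory.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected for an in-memory reader.
            throw new IllegalStateException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Comment on its own line: the value is clean.
        Properties ok = parse(
                "# Auto commit interval in milliseconds\n"
                + "auto.commit.interval.ms=5000\n");
        // Comment on the same line: it is kept as part of the value.
        Properties bad = parse("auto.commit.interval.ms=5000  # interval in ms\n");

        System.out.println("[" + ok.getProperty("auto.commit.interval.ms") + "]");
        System.out.println("[" + bad.getProperty("auto.commit.interval.ms") + "]");
    }
}
```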

How Auto Commit Works Internally

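When auto commit is enabled, the Java consumer commits offsets as part of its normal poll() cycle rather than per message. On each call to poll(), it checks whether auto.commit.interval.ms has elapsed since the last commit and, if so, commits the positions reached by earlier polls. It also commits when the consumer is closed. The committed offsets are stored in an internal Kafka topic named __consumer_offsets. Upon failure or restart, the consumer resumes reading from the last committed offset. Because commits lag behind processing, this does not rule out reprocessing: messages handled after the last commit are delivered again.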
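The timing can be illustrated with a simplified, single-partition model. This is a sketch with an injected clock, not the Kafka client's implementation. AutoCommitModel and its methods are hypothetical names, and it assumes the first commit becomes due one interval after the consumer starts.

```java
// A simplified, single-partition model of poll-driven auto commit.
// This is NOT the Kafka client; names and structure are illustrative.
final class AutoCommitModel {
    private final long intervalMs;
    private long nextCommitAtMs;
    private long position = 0;   // offset of the next record to hand out
    private long committed = -1; // -1 means nothing has been committed yet

    AutoCommitModel(long intervalMs, long startTimeMs) {
        this.intervalMs = intervalMs;
        this.nextCommitAtMs = startTimeMs + intervalMs;
    }

    // Mimics poll(): first commit the current position if the interval has
    // elapsed, then hand out the next batch. Returns the inclusive offset range.
    long[] poll(long nowMs, int batchSize) {
        if (nowMs >= nextCommitAtMs) {
            committed = position; // covers only records returned by EARLIER polls
            nextCommitAtMs = nowMs + intervalMs;
        }
        long first = position;
        position += batchSize;
        return new long[] {first, position - 1};
    }

    long committed() { return committed; }

    long position() { return position; }

    public static void main(String[] args) {
        AutoCommitModel consumer = new AutoCommitModel(5000, 0);
        consumer.poll(0, 10);    // hands out offsets 0..9, nothing committed
        consumer.poll(3000, 10); // hands out 10..19, interval not yet elapsed
        consumer.poll(5000, 10); // commits offset 20, then hands out 20..29
        System.out.println("committed=" + consumer.committed()
                + " position=" + consumer.position());
    }
}
```

In this model the commit always trails the records handed out by the current poll, which is why an application that finishes each batch before polling again sees duplicates rather than gaps after a crash.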

Considerations and Caveats of Auto Commit

While convenient, auto-committing has certain shortcomings:

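  • Potential Duplicates on Failure: If a consumer processes messages but fails before the next auto commit, those messages are delivered again after recovery, leading to duplicate processing. Shortening auto.commit.interval.ms narrows this window but does not eliminate it.
  • Commits Are Decoupled from Processing: Auto commit tracks what poll() has returned, not what the application has finished processing. If records are handed off to another thread, or are otherwise still in flight when a later poll() triggers a commit, a failure at that point can cause them to be skipped.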

Additionally, the auto-commit mechanism might not be suitable for scenarios needing very precise control over when messages are considered "consumed."
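To make the duplicate window concrete, the sketch below lists the offsets that would be delivered twice after a crash and gives a rough size estimate for a steady consumer. ReplayWindow and its methods are hypothetical. The estimate assumes a constant processing rate and ignores poll timing, so it is an approximation rather than a bound.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: models which offsets are processed twice after a crash.
final class ReplayWindow {

    // committedOffset: next offset to read according to the last commit.
    // lastProcessedOffset: last offset the application finished before crashing.
    // Returns the offsets that are delivered again after a restart.
    static List<Long> redelivered(long committedOffset, long lastProcessedOffset) {
        List<Long> result = new ArrayList<>();
        for (long offset = committedOffset; offset <= lastProcessedOffset; offset++) {
            result.add(offset);
        }
        return result;
    }

    // Rough estimate of how many records can pile up between two auto commits,
    // assuming a constant processing rate (an assumption of this sketch).
    static long estimatedReplay(long recordsPerSecond, long autoCommitIntervalMs) {
        return recordsPerSecond * autoCommitIntervalMs / 1000;
    }

    public static void main(String[] args) {
        System.out.println("redelivered: " + redelivered(20, 24));
        System.out.println("estimate at 2000 rec/s, 5000 ms: "
                + estimatedReplay(2000, 5000));
    }
}
```

Consumers that rely on auto commit are usually paired with idempotent processing so that such replays are harmless.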

Manual Offset Control

For greater control, consumer applications can set enable.auto.commit to false and commit offsets manually. commitSync blocks until the commit succeeds or fails with an unrecoverable error, while commitAsync returns immediately and reports the outcome through an optional callback. Either way, the application decides when to commit, so it can ensure that offsets are committed only after the corresponding messages have been fully processed and any output operations have completed.
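The ordering that matters here is to process first and commit afterwards. The sketch below uses plain Java collections as stand-ins for the client types so that it runs without a broker; Rec and ManualCommitSketch are hypothetical names. With the real client, the resulting map would correspond to the Map<TopicPartition, OffsetAndMetadata> argument of commitSync.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Illustrative only: Rec and ManualCommitSketch are hypothetical stand-ins,
// not Kafka classes. Partitions are identified by a plain int.
final class ManualCommitSketch {

    static final class Rec {
        final int partition;
        final long offset;
        final String value;

        Rec(int partition, long offset, String value) {
            this.partition = partition;
            this.offset = offset;
            this.value = value;
        }
    }

    // Processes records in order and returns, per partition, the offset that
    // is safe to commit (last successfully processed offset + 1).
    // Processing stops at the first record the handler rejects.
    static Map<Integer, Long> processAndCollect(List<Rec> batch, Predicate<Rec> handler) {
        Map<Integer, Long> toCommit = new LinkedHashMap<>();
        for (Rec rec : batch) {
            if (!handler.test(rec)) {
                break; // leave this record and everything after it uncommitted
            }
            toCommit.put(rec.partition, rec.offset + 1);
        }
        return toCommit;
    }

    public static void main(String[] args) {
        List<Rec> batch = Arrays.asList(
                new Rec(0, 5, "a"), new Rec(0, 6, "b"), new Rec(1, 9, "c"));
        // The handler returns true when a record was processed successfully.
        Map<Integer, Long> offsets = processAndCollect(batch, rec -> !rec.value.isEmpty());
        System.out.println("offsets to commit: " + offsets);
    }
}
```

Because a record's offset enters the map only after the handler succeeds, a failure never causes an unprocessed record to be marked as consumed.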

Example of Auto Commit

The following Java example demonstrates setting up a Kafka consumer with auto-commit enabled:

```java
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "test-group");
        // Commit offsets automatically, at most once per second.
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "1000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("test-topic"));
        try {
            while (true) {
                // poll(Duration) replaces the deprecated poll(long).
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                // Finish the whole batch before the next poll(), which may commit its offsets.
                records.forEach(record -> {
                    System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
                });
            }
        } finally {
            consumer.close();
        }
    }
}
```

Summary Table

| Configuration Property | Description | Default Value |
|---|---|---|
| enable.auto.commit | Whether auto committing is enabled. | true |
| auto.commit.interval.ms | Interval (in ms) between auto commits. | 5000 |

Conclusion

Kafka's auto-commit feature offers a simple way to manage offset commits and is useful in scenarios that can tolerate occasional duplicates after failures or restarts. For applications that need tighter control over when a message counts as processed, manual offset control should be considered.

