Apache Kafka - Autocommit = false and no commit

Apache Kafka, a distributed event streaming platform, supports high-throughput messaging with durability and fault tolerance. Central to Kafka's consumer functionality is offset management, which determines how reliably the messages retained in Kafka topics are processed. A key decision in offset management is whether offsets are committed automatically or manually.

Understanding Offsets in Kafka

In Kafka, an offset is the sequential position of a record within a partition of a topic. A consumer group tracks its progress by committing, for each partition, the offset of the next record it expects to read; Kafka stores these committed offsets in the internal __consumer_offsets topic. Offsets are committed either automatically by the consumer client or explicitly by the developer. The parameter that controls this is enable.auto.commit: setting it to true (the default) makes the consumer commit offsets from within its poll() calls whenever auto.commit.interval.ms (5000 ms by default) has elapsed. Conversely, setting it to false requires the developer to commit offsets explicitly.
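As a sketch of how the two modes differ in configuration, the helper below builds consumer properties for each. Only the property keys (enable.auto.commit, auto.commit.interval.ms and the standard connection settings) are real Kafka settings; the class and method names and the localhost broker address are hypothetical. It uses only java.util.Properties, so it runs without a broker; the resulting object is what you would pass to new KafkaConsumer<>(props).

```java
import java.util.Properties;

// Sketch only: class/method names and the broker address are hypothetical;
// the property keys are real Kafka consumer settings.
class CommitModeConfig {
    static Properties commitModeProps(boolean autoCommit) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("group.id", "test-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", String.valueOf(autoCommit));
        if (autoCommit) {
            // Committed from within poll() once this interval has elapsed (5000 ms is the default).
            props.put("auto.commit.interval.ms", "5000");
        }
        // With autoCommit == false, the application must call commitSync()/commitAsync() itself.
        return props;
    }

    public static void main(String[] args) {
        System.out.println("auto:   " + commitModeProps(true));
        System.out.println("manual: " + commitModeProps(false));
    }
}
```

The interval is only set in the automatic mode because it has no effect when enable.auto.commit is false.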

Auto Commit Configurations: enable.auto.commit = false

When enable.auto.commit is set to false, the developer is responsible for committing offsets. This gives precise control over when a message counts as 'consumed'. Committing only after a record has been fully processed yields at-least-once processing: a crash before the commit causes the record to be redelivered rather than lost. Manual commits alone do not provide exactly-once processing; that additionally requires idempotent side effects, storing offsets atomically together with the results, or Kafka transactions.

Example Scenario

Consider a Kafka consumer application whose messages trigger database updates. If the database update and the offset commit are not atomic, and the consumer fails right after updating the database but before committing the offset, the message will be read and processed again after the consumer recovers. Unless the update is idempotent, this duplicate processing can leave the database inconsistent.

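```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test-group");
props.put("enable.auto.commit", "false"); // the application commits offsets itself
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList("topic"));
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        for (ConsumerRecord<String, String> record : records) {
            processRecord(record); // application logic, e.g. the database update
            // Commit only this record: the committed offset is the NEXT offset to read, hence + 1.
            // A bare commitSync() here would commit every record returned by the last poll().
            consumer.commitSync(Collections.singletonMap(
                    new TopicPartition(record.topic(), record.partition()),
                    new OffsetAndMetadata(record.offset() + 1)));
        }
    }
}
```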
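The failure window described in the scenario above can be illustrated with a toy, in-memory model (no Kafka client involved; all class and method names are hypothetical). Each record adds its value to a balance, an operation that is deliberately not idempotent. In the non-atomic version the offset is committed in a separate step after the database update; in the atomic version the next offset is stored in the same database transaction as the balance, and a restart resumes from that stored offset. With a real consumer, that resume step would typically be a consumer.seek(partition, offset) call made from a ConsumerRebalanceListener's onPartitionsAssigned callback.

```java
import java.util.List;

// Toy in-memory model, NOT the Kafka API. Each record adds its value to a balance,
// which is deliberately non-idempotent so that any redelivery shows up in the total.
class CommitAtomicityDemo {
    static final List<Integer> PARTITION = List.of(10, 20, 30); // record value at each offset

    // DB update and offset commit are separate steps. The consumer crashes once,
    // after updating the DB for offset crashAt but before committing it.
    static int nonAtomicTotal(int crashAt) {
        int balance = 0;          // state in the database
        int committed = 0;        // offset committed to Kafka (next offset to read)
        boolean crashed = false;
        int pos = committed;      // consumer position
        while (pos < PARTITION.size()) {
            balance += PARTITION.get(pos);        // 1. update the database
            if (pos == crashAt && !crashed) {     // crash before step 2
                crashed = true;
                pos = committed;                  // restart resumes at the committed offset
                continue;
            }
            committed = pos + 1;                  // 2. commit the offset
            pos++;
        }
        return balance;
    }

    // Balance and next offset are written in one DB transaction, so a crash
    // applies both or neither; the restart resumes from the stored offset.
    static int atomicTotal(int crashAt) {
        int balance = 0;
        int storedOffset = 0;     // kept in the database, next to the balance
        boolean crashed = false;
        int pos = storedOffset;
        while (pos < PARTITION.size()) {
            if (pos == crashAt && !crashed) {     // crash mid-transaction: rolled back
                crashed = true;
                pos = storedOffset;               // restart: seek to the stored offset
                continue;
            }
            balance += PARTITION.get(pos);        // one transaction updates ...
            storedOffset = pos + 1;               // ... the balance and the offset
            pos++;
        }
        return balance;
    }

    public static void main(String[] args) {
        System.out.println("non-atomic: " + nonAtomicTotal(1)); // offset 1 is applied twice
        System.out.println("atomic:     " + atomicTotal(1));
    }
}
```

With a crash at offset 1, nonAtomicTotal(1) returns 80 because the value 20 is applied twice, while atomicTotal(1) returns the correct 60.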

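Note that committing offsets manually, with commitSync() or commitAsync(), lets the application decide exactly which records count as processed: if processing fails before the commit, the uncommitted records are delivered again and can be retried. commitSync() blocks until the commit succeeds or fails with an unrecoverable error, whereas commitAsync() returns immediately and can report the result to a callback.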
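The retry benefit depends on where the commit sits relative to processing. The toy model below (plain Java, no Kafka client; names are hypothetical) commits once per record, injects a single crash between the two steps, and then restarts from the committed offset. Committing first gives at-most-once behavior, and processing first gives at-least-once behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, NOT the Kafka API: the order of "process" and "commit" decides what a crash costs.
class CommitOrderDemo {
    // Processes offsets 0..n-1 with one commit per record. Crashes once at record crashAt,
    // between the first and second step, then restarts from the committed offset.
    static List<Integer> run(int n, int crashAt, boolean commitFirst) {
        List<Integer> processed = new ArrayList<>();
        int committed = 0;        // next offset to read after a restart
        boolean crashed = false;
        int pos = 0;
        while (pos < n) {
            boolean crashNow = pos == crashAt && !crashed;
            if (commitFirst) {
                committed = pos + 1;              // 1. commit
                if (crashNow) { crashed = true; pos = committed; continue; }
                processed.add(pos);               // 2. process
            } else {
                processed.add(pos);               // 1. process
                if (crashNow) { crashed = true; pos = committed; continue; }
                committed = pos + 1;              // 2. commit
            }
            pos++;
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println("commit first (at-most-once):   " + run(4, 2, true));
        System.out.println("process first (at-least-once): " + run(4, 2, false));
    }
}
```

Here run(4, 2, true) returns [0, 1, 3], so offset 2 is lost, while run(4, 2, false) returns [0, 1, 2, 2, 3], so offset 2 is processed twice but never lost.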

No Commit Strategy

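A more daring approach is to never commit offsets at all. The consumer still tracks its position in memory while it runs, but nothing is stored in Kafka, so after a restart, or when one of its partitions is reassigned to another consumer in the group, the starting position is chosen by auto.offset.reset: "latest" (the default) skips any records produced while the consumer was down, and "earliest" reprocesses everything still retained in the topic. The applications of this approach are niche, but it can be useful in stateless applications where every message triggers an idempotent operation, or where losing track of progress after a restart causes no issues because the data is processed in a purely streaming or transient manner.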

Here's a simple consumer without committing offsets:

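```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test-group");
props.put("enable.auto.commit", "false"); // autocommit is disabled
// No offsets are ever committed, so every restart starts from this policy:
// "latest" (the default) skips existing records, "earliest" replays the retained log.
props.put("auto.offset.reset", "latest");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList("topic"));
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        for (ConsumerRecord<String, String> record : records) {
            processRecord(record); // processing without committing the offset
        }
    }
}
```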
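Because a group that never commits has no stored position, where it resumes after a restart is decided by the auto.offset.reset consumer setting. The toy model below (plain Java, no Kafka client; names are hypothetical) assumes the consumer processed offsets 0 to 4, crashed, and that offsets 5 to 7 were produced while it was down. It also assumes retention has not deleted anything, so the log still starts at offset 0. The two branches mirror the "earliest" and "latest" values; the real setting also accepts "none", which makes the consumer throw an exception when no committed offset exists, and that case is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, NOT the Kafka API: where a group with no committed offset starts after a restart.
class NoCommitRestartDemo {
    // Offsets already in the log at restart time that the restarted consumer will read.
    // Assumption: nothing was deleted by retention, so the log starts at offset 0.
    static List<Long> readOnRestart(long logEndOffset, String autoOffsetReset) {
        long start;
        if (autoOffsetReset.equals("earliest")) {
            start = 0;                  // replay everything still retained
        } else if (autoOffsetReset.equals("latest")) {
            start = logEndOffset;       // only records produced after the restart are read
        } else {
            throw new IllegalArgumentException("not modeled in this sketch: " + autoOffsetReset);
        }
        List<Long> offsets = new ArrayList<>();
        for (long off = start; off < logEndOffset; off++) {
            offsets.add(off);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Offsets 0-4 were processed before the crash; 5-7 arrived while the consumer was down.
        long logEnd = 8;
        System.out.println("latest:   " + readOnRestart(logEnd, "latest"));
        System.out.println("earliest: " + readOnRestart(logEnd, "earliest"));
    }
}
```

With "latest" the restarted consumer reads none of the existing offsets, so 5, 6 and 7 are never processed; with "earliest" it reads all eight, reprocessing 0 to 4.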

Summary Table

| Strategy | enable.auto.commit | Commit method | Use case | Pros | Cons |
|---|---|---|---|---|---|
| Automatic offset commit | true | Timer-based, from within poll() | Basic use cases where occasional duplicates are acceptable | Simple configuration, less code overhead | Less control, potential duplicates (or loss if records are processed asynchronously) |
| Manual offset handling | false | Explicit commitSync()/commitAsync() calls | Scenarios requiring precise control over message acknowledgment | More control, precise message handling | More complex setup, more code to manage |
| No offset commit | false | None | Stateless operations where duplicates or gaps do not matter | Simplest setup, no offset-management overhead | Position lost after a restart, potential data loss or reprocessing |

Setting enable.auto.commit = false and never committing offsets manually lets a consumer handle each message purely in arrival order, with no offset state to persist. However, because the starting position after a restart is then decided by auto.offset.reset rather than by stored progress, this requires careful application design so that skipped or replayed messages do not cause critical information to be lost.

