Kafka - Producer Acknowledgement

Apache Kafka, a distributed streaming platform, enables systems and applications to publish and subscribe to streams of records, store those streams, and process them as they occur. Kafka producers are client processes that publish (write) data to Kafka topics. Because the reliability and durability of that data matter so much, Kafka lets each producer choose how strongly a write must be confirmed through a setting known as Producer Acknowledgements (acks).

Understanding Producer Acknowledgements

Producer acknowledgements are the mechanism through which a Kafka broker confirms to the producer that a message has been successfully written to the Kafka log. When a producer sends a message, it can specify the level of acknowledgment it requires from the Kafka brokers before it treats the send as complete.

Configuration of Producer Acknowledgements

Kafka allows the configuration of producer acknowledgments using the acks setting, which can take one of the following values:

  • 0: The producer does not wait for any acknowledgment from the broker. The message is considered sent as soon as it leaves the producer. This setting provides the lowest latency but the weakest durability guarantee: the producer cannot tell whether the broker received the message, so lost messages go undetected and are not retried.
  • 1: The producer gets an acknowledgment once the leader replica has written the data to its local log, without waiting for followers. This setting offers a balance of latency and durability, and it was the default before Kafka 3.0.
  • all or -1: The producer waits until all in-sync replicas have acknowledged the message. This setting provides the highest level of durability but also increases latency. It has been the default since Kafka 3.0.
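The accepted values can be checked before a producer is ever created. The following is a minimal sketch, assuming a hypothetical helper class `AcksConfig` (not part of the Kafka client library) that uses only `java.util.Properties`; the broker address and serializer class names are the same illustrative ones used elsewhere in this article.

```java
import java.util.Properties;
import java.util.Set;

/** Hypothetical helper (not part of Kafka) that builds producer settings for a chosen acks level. */
class AcksConfig {
    // The values the acks setting accepts; "-1" is an alias for "all".
    private static final Set<String> VALID_ACKS = Set.of("0", "1", "all", "-1");

    static Properties producerProps(String bootstrapServers, String acks) {
        if (!VALID_ACKS.contains(acks)) {
            throw new IllegalArgumentException(
                "acks must be one of 0, 1, all, -1 but was: " + acks);
        }
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("acks", acks);
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        // Build settings for the strongest durability level.
        Properties props = producerProps("localhost:9092", "all");
        System.out.println("acks=" + props.getProperty("acks"));
    }
}
```

The resulting `Properties` object can be passed to a `KafkaProducer` constructor. Rejecting unknown values up front is a design choice of this sketch; it surfaces a typo as a clear error in application code.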

How Producer Acknowledgements Work

When a message is sent by a producer, the Kafka broker that acts as the leader for the partition will receive the record. The acknowledgment behavior then depends on the acks configuration:

  1. acks=0:
    • The broker doesn’t send any acknowledgment to the producer. The producer assumes success once data is sent.
  2. acks=1:
    • The leader writes the record to its local log and responds to the producer without waiting for followers to replicate it. If the leader crashes before any follower has copied the record, the message may be lost even though the producer received an acknowledgment.
  3. acks=all:
    • The leader waits until all in-sync replicas have written the data to their log. This ensures that the data is not lost as long as at least one in-sync replica remains alive.
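The three behaviors above can be expressed as a small decision rule. This is a toy model only, assuming a hypothetical class `AckModel` with made-up parameters; it does not call Kafka and it ignores timeouts, retries, and changes to the in-sync replica set.

```java
/** Toy model of the acknowledgment rules described above; this is not Kafka code. */
class AckModel {
    /**
     * Returns true if the producer would treat the send as successful.
     * leaderWrote: the leader appended the record to its local log.
     * followersWrote: number of in-sync followers that have written the record.
     * followersInSync: number of followers currently in sync with the leader.
     */
    static boolean producerSeesSuccess(String acks, boolean leaderWrote,
                                       int followersWrote, int followersInSync) {
        switch (acks) {
            case "0":
                return true; // no broker response is awaited
            case "1":
                return leaderWrote; // the leader's local write is enough
            case "all":
            case "-1":
                // the leader and every in-sync follower must have the record
                return leaderWrote && followersWrote >= followersInSync;
            default:
                throw new IllegalArgumentException("unknown acks value: " + acks);
        }
    }

    /** True if a record the producer believes was delivered is gone once the leader crashes. */
    static boolean lostIfLeaderCrashes(String acks, boolean leaderWrote,
                                       int followersWrote, int followersInSync) {
        return producerSeesSuccess(acks, leaderWrote, followersWrote, followersInSync)
            && followersWrote == 0;
    }

    public static void main(String[] args) {
        // Leader has the record, neither of two in-sync followers has copied it yet.
        System.out.println("acks=1 success:   " + producerSeesSuccess("1", true, 0, 2));
        System.out.println("acks=all success: " + producerSeesSuccess("all", true, 0, 2));
        System.out.println("acks=1 lost on leader crash: " + lostIfLeaderCrashes("1", true, 0, 2));
    }
}
```

The second method captures the risk described for acks=1: the producer has already been told the write succeeded, yet no follower holds a copy when the leader fails.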

Technical Considerations and Trade-offs

Using different acks settings involves trade-offs between performance and data durability:

  • Setting acks=0 can lead to data loss but provides the lowest latency and highest throughput.
  • With acks=1, there is a moderate chance of data loss if the leader fails before data is replicated, but it provides lower latency compared to acks=all.
  • acks=all gives the highest data safety but at a cost of higher latency and lower throughput.

Table: Summary of Acknowledgment Settings

Ack Mode | Latency | Throughput | Data Loss Risk | Suitable Use Cases
0        | Lowest  | Highest    | High           | Logging less critical data
1        | Low     | High       | Moderate       | Balances safety and performance (default before Kafka 3.0)
all      | Highest | Lower      | Low            | Critical data where loss is unacceptable (default since Kafka 3.0)
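The use cases in the table can be turned into a single place where an application decides its acks level. The sketch below assumes a hypothetical `Criticality` enum and `AcksChooser` class invented for illustration; the mapping simply mirrors the table rows and is not an official recommendation.

```java
/** Hypothetical mapping from data criticality to an acks value, mirroring the table above. */
class AcksChooser {
    // Illustrative categories; an application would define its own.
    enum Criticality { BEST_EFFORT, STANDARD, CRITICAL }

    static String acksFor(Criticality criticality) {
        switch (criticality) {
            case BEST_EFFORT:
                return "0";   // e.g. logging less critical data
            case STANDARD:
                return "1";   // balance of safety and performance
            case CRITICAL:
                return "all"; // loss is unacceptable
            default:
                throw new IllegalArgumentException("unknown criticality: " + criticality);
        }
    }

    public static void main(String[] args) {
        for (Criticality c : Criticality.values()) {
            System.out.println(c + " -> acks=" + acksFor(c));
        }
    }
}
```

Keeping the choice in one method makes the trade-off explicit and easy to review when durability requirements change.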

Code Example

An example of configuring a Kafka producer in Java to use acks=all:

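```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
// Wait for all in-sync replicas to acknowledge each record
props.put("acks", "all");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

Producer<String, String> producer = new KafkaProducer<>(props);
producer.send(new ProducerRecord<>("topicName", "key", "value"));
// close() blocks until previously sent records have completed
producer.close();
```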

Conclusion

Choosing the right producer acknowledgment level depends directly on your application's requirements for data durability and performance. Many applications can tolerate acks=1, which was the default before Kafka 3.0; since Kafka 3.0 the default is acks=all. For critical data where loss is not acceptable, acks=all should be used, keeping in mind the associated performance trade-offs.
