Kafka Exactly-Once Semantics: The Three Pieces You Actually Need

May 5, 2026

Most people who turn on Kafka exactly-once never look at what they actually enabled. In Kafka Streams, a single setting (processing.guarantee=exactly_once_v2) switches on three separate mechanisms at once, and tracing a read-process-write pipeline shows why all three are needed.

Start with the producer. With enable.idempotence=true (the default since Kafka 3.0), the producer asks the broker for a producer ID and starts tagging every record with that ID plus a per-partition sequence number. When the producer retries because of a timeout, the broker can see that sequence number 47 already landed and acknowledges the retry without appending it again. This solves exactly one problem: a network blip turning one logical send into two physical writes on a single partition. It does not span partitions. It does not span topics. It is purely a per-producer-session, per-partition guard.
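To make the dedup rule concrete, here is a minimal sketch of the broker-side check as a toy model. It is not Kafka's actual implementation: the `PartitionLog` class and its return strings are made up for illustration, and the real broker tracks batches and producer epochs rather than single records.

```python
class PartitionLog:
    """Toy model of one partition's dedup state (hypothetical, not broker code)."""

    def __init__(self):
        self.records = []
        self.last_seq = {}  # producer_id -> highest sequence number appended

    def append(self, producer_id, seq, value):
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:
            return "duplicate"      # already appended: acknowledge, do not write again
        if seq != last + 1:
            return "out_of_order"   # a gap means an earlier record is missing
        self.records.append(value)
        self.last_seq[producer_id] = seq
        return "appended"


log = PartitionLog()
print(log.append(producer_id=7, seq=0, value="order-1"))   # appended
print(log.append(producer_id=7, seq=0, value="order-1"))   # duplicate (retry after a timeout)
print(log.append(producer_id=99, seq=0, value="order-1"))  # appended (different producer ID)
print(log.records)                                         # ['order-1', 'order-1']
```

The third call is the limit of the guard: the same payload under a different producer ID is a brand-new record as far as this partition is concerned.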

Now consider a stream processor. It reads from an input topic, transforms the records, writes to an output topic, and commits its consumer offsets back to Kafka. The output write and the offset commit are independent state changes, and nothing ties them together. If the processor crashes after writing output but before committing the offset, the next run consumes the same input again and emits duplicate output. Idempotent producers do not help here, because the second run gets a new producer ID and a fresh sequence space, so the broker treats its writes as new records.
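The crash window is easy to reproduce in a toy model. The sketch below uses plain Python lists in place of topics and a hypothetical `run_processor` function. Nothing here talks to Kafka; it only shows the ordering problem.

```python
def run_processor(input_records, committed_offset, output, crash_before_commit=False):
    """Process from committed_offset onward and return the new committed offset."""
    offset = committed_offset
    for record in input_records[committed_offset:]:
        output.append(record.upper())   # state change 1: write output
        if crash_before_commit:
            return committed_offset     # crash: the offset never advanced
        offset += 1                     # state change 2: commit the offset
    return offset


input_topic = ["a", "b"]
output_topic = []

# First run crashes after writing "A" but before committing offset 1.
committed = run_processor(input_topic, 0, output_topic, crash_before_commit=True)
# Second run resumes from the last committed offset, which is still 0.
committed = run_processor(input_topic, committed, output_topic)

print(output_topic)  # ['A', 'A', 'B']
print(committed)     # 2
```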

That is where transactions come in. The processor declares a transactional.id, calls initTransactions once at startup, opens a transaction, writes to the output partitions, and calls sendOffsetsToTransaction to include the consumer offsets in the same atomic unit. On commit, the transaction coordinator first records the decision in its own transaction log, then writes COMMIT markers to every output partition and to the affected partitions of the internal __consumer_offsets topic. Once the decision is logged the markers are guaranteed to follow, so either all the output records and the offset advance become visible together, or none of them do.
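Here is the same kind of crash with output and offsets tied into one unit. This is a toy model under stated assumptions: `ToyCluster` and `ToyTransaction` are hypothetical classes, the method names only loosely mirror the Java client's sendOffsetsToTransaction and commitTransaction, and real Kafka writes records to the log first and controls visibility with markers rather than buffering them.

```python
class ToyCluster:
    """Stand-in for the output topic plus __consumer_offsets (not a real client)."""

    def __init__(self):
        self.visible_output = []
        self.committed_offset = 0


class ToyTransaction:
    def __init__(self, cluster):
        self.cluster = cluster
        self.records = []
        self.offset = None

    def send(self, record):
        self.records.append(record)

    def send_offsets_to_transaction(self, next_offset):
        self.offset = next_offset

    def commit(self):
        # Output records and the offset advance become visible in the same step.
        self.cluster.visible_output.extend(self.records)
        if self.offset is not None:
            self.cluster.committed_offset = self.offset


def process_batch(cluster, input_records, crash_before_commit=False):
    txn = ToyTransaction(cluster)
    for record in input_records[cluster.committed_offset:]:
        txn.send(record.upper())
    txn.send_offsets_to_transaction(len(input_records))
    if crash_before_commit:
        return  # transaction abandoned: nothing it wrote is ever exposed
    txn.commit()


cluster = ToyCluster()
input_topic = ["a", "b"]
process_batch(cluster, input_topic, crash_before_commit=True)
process_batch(cluster, input_topic)

print(cluster.visible_output)    # ['A', 'B']
print(cluster.committed_offset)  # 2
```

The crashed attempt leaves no trace, so the rerun produces each output exactly once.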

The third piece is the consumer on the downstream side. Set isolation.level=read_committed or none of this matters. A read_uncommitted consumer happily reads records from aborted transactions and from open transactions that will later abort. The markers exist, the consumer ignores them, and the guarantees evaporate.
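A rough sketch of what the two isolation levels see on the same partition. The log layout (tuples of kind, transaction ID, value) and the `read` function are invented for illustration, and non-transactional records are left out to keep it short.

```python
def read(log, isolation_level):
    """Entries are ("data", txn_id, value) or ("marker", txn_id, "COMMIT" / "ABORT")."""
    if isolation_level == "read_uncommitted":
        return [value for kind, _, value in log if kind == "data"]

    outcome = {txn_id: value for kind, txn_id, value in log if kind == "marker"}
    visible = []
    for kind, txn_id, value in log:
        if kind != "data":
            continue
        if txn_id not in outcome:
            break                      # open transaction: do not read past it
        if outcome[txn_id] == "COMMIT":
            visible.append(value)      # aborted records are skipped
    return visible


log = [
    ("data", "t1", "A"),
    ("marker", "t1", "ABORT"),    # first attempt was aborted
    ("data", "t2", "A"),
    ("marker", "t2", "COMMIT"),   # retry committed
    ("data", "t3", "B"),          # transaction still open
]

print(read(log, "read_uncommitted"))  # ['A', 'A', 'B']
print(read(log, "read_committed"))    # ['A']
```

The duplicate `A` from the aborted attempt is sitting in the log either way. Only the read_committed path filters it out.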

The production failure mode worth knowing is a processor whose slow downstream call holds a transaction open past transaction.timeout.ms. The coordinator aborts the transaction and bumps the producer epoch, so the stalled producer is fenced on its next call and the processor has to restart. The records were written to the log but the ABORT marker hides them from read_committed consumers. Fine. Except your downstream service already saw the side effect from the slow call. Kafka EOS only covers Kafka. Anything you wrote to Postgres or called over HTTP inside the transaction is on you, and a retry will replay it.
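A small sketch of that replay, again as a toy model. The `handle` function, the "charge" call, and the durations are all hypothetical. The 60,000 ms value matches the producer's documented default for transaction.timeout.ms, but treat the rest as illustration only.

```python
external_calls = []   # stands in for Postgres writes or HTTP calls
visible_output = []   # what a read_committed consumer would eventually see


def handle(record, call_duration_ms, transaction_timeout_ms):
    external_calls.append(f"charge {record}")    # side effect happens right away
    if call_duration_ms > transaction_timeout_ms:
        return False                             # transaction aborted: output stays hidden
    visible_output.append(f"{record} charged")   # transaction commits
    return True


# The first attempt is slow and times out, so the processor retries the record.
if not handle("order-1", call_duration_ms=90_000, transaction_timeout_ms=60_000):
    handle("order-1", call_duration_ms=200, transaction_timeout_ms=60_000)

print(external_calls)  # ['charge order-1', 'charge order-1']
print(visible_output)  # ['order-1 charged']
```

Kafka's view is clean, with one committed output record. The external system was called twice.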

The mental model: the idempotent producer is local to one producer session and one partition, transactions are atomic across Kafka partitions and offset commits, and read_committed is the contract on the read side. Strip any of the three and the pipeline has a duplicate path. Most "exactly-once" outages I have seen were not Kafka bugs. They were a downstream consumer still on the default read_uncommitted, or an external sink the transaction never covered.
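As a checklist, the three pieces map onto three configuration keys. The sketch below writes them as plain Python dictionaries with a hypothetical `missing_pieces` helper. The key names are the standard Kafka property names, the transactional.id value is made up, and the helper treats an unset key as missing regardless of what a given client defaults to.

```python
producer_config = {
    "enable.idempotence": "true",
    "transactional.id": "orders-processor-0",  # hypothetical ID, stable across restarts
}

downstream_consumer_config = {
    "isolation.level": "read_committed",
}


def missing_pieces(producer_cfg, consumer_cfg):
    """Return which of the three pieces are not explicitly configured."""
    missing = []
    if producer_cfg.get("enable.idempotence") != "true":
        missing.append("idempotent producer")
    if not producer_cfg.get("transactional.id"):
        missing.append("transactions")
    if consumer_cfg.get("isolation.level") != "read_committed":
        missing.append("read_committed consumer")
    return missing


print(missing_pieces(producer_config, downstream_consumer_config))  # []
print(missing_pieces(producer_config, {}))  # ['read_committed consumer']
```

None of this covers an external sink. That check has to happen outside Kafka configuration.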

Key takeaway

Kafka exactly-once is three features stacked: idempotent writes per partition, transactions that span partitions and offset commits, and `read_committed` consumers that honor the markers. Transactions already require idempotence, so the layer that usually goes missing is the read side, and without it you still have a duplicate path.

Originally posted on LinkedIn.