Exactly-Once Stream Processing: How Checkpoints and Idempotency Work Together
April 30, 2026
Exactly-once is the most overloaded term in stream processing. People hear it and assume the framework gives them a magic guarantee. It does not. What it gives you is two ingredients that, used together, produce the effect of exactly-once. Used alone, neither one is enough.
The first ingredient is the checkpoint.
A stream processor reads from a source, transforms data, and writes to a sink. While it runs, it accumulates state: window aggregates, join buffers, deduplication tables, the offset it has consumed up to. Periodically, the framework snapshots that state to durable storage. Flink calls it a checkpoint. Kafka Streams calls it a changelog plus committed offset. The shape varies. The idea is the same: write down everything you would need to resume from this exact moment.
When the processor crashes, it restores state from the most recent checkpoint and rewinds the source to the offset stored in that checkpoint. It reprocesses the records between the checkpoint and the moment of failure. The internal state ends up exactly where it would have been with no crash.
That handles state. It does not handle output.
If your sink is a Postgres table and your transformation already wrote row X to it before the crash, replaying from the checkpoint will write row X again. Two rows. At-least-once, not exactly-once.
That is where the second ingredient comes in: idempotent writes.
An idempotent sink either deduplicates internally or accepts a key from the processor that lets it ignore duplicates. Concrete patterns:
- Kafka producers send records with a producer ID and sequence number. The broker rejects duplicates from the same producer.
- A SQL sink uses an
INSERT ... ON CONFLICT DO NOTHINGkeyed on the source offset. - A blob store uses the checkpoint ID as the filename, so a retry overwrites the same object instead of creating a new one.
Now the failure modes are easy to reason about.
Checkpointing without idempotent sinks gives you at-least-once. State is correct on recovery, but downstream consumers see duplicate writes.
Idempotent sinks without checkpointing gives you at-most-once at best, and corrupted state at worst, because internal aggregates are not restored across restart.
Neither gives you nothing. Both, together, give you the effect of exactly-once: the source data is processed once as far as anyone downstream can tell, even when the system has crashed and restarted ten times along the way.
When a vendor says "exactly-once," ask which half they are providing and which half you are still on the hook for. Usually it is the second half.
Exactly-once is not a single dial. It is checkpointing for replayable input plus idempotent sinks for replayable output. Skip either half and you fall back to at-least-once or at-most-once.
Originally posted on LinkedIn. View original.