Delivery Semantics

Topics Covered

Delivery semantics — at most once, at least once, effectively once

At most once — fire and forget

At least once — retry until acknowledged

Effectively once — at least once plus idempotency

Choosing the right semantic

Broker-specific implementations

The general rule

Common pitfalls

End-to-end example: order processing pipeline

Observability for delivery semantics

Delivery semantics across the tech stack

The cost of each semantic

One-liner summary

Every distributed system eventually faces this question: what happens when a message between two services gets lost, duplicated, or delayed?

The answer is not "make sure it doesn't happen." Networks are unreliable by definition — packets drop, connections reset, servers crash mid-processing. Even within a single data center, network failures happen multiple times per day at scale. Delivery semantics describe what guarantee a system provides in the face of these inevitable failures — and what trade-offs it accepts to provide that guarantee.

Why does this matter? Because the choice of delivery semantic shapes your entire system architecture.

An at-most-once system needs no dedup store, no idempotency keys, no transaction coordination — it is simple and fast. An effectively-once system needs all of those things, adding infrastructure, complexity, and latency. Choosing the right semantic for each message flow is one of the most impactful architectural decisions you make.

There are three delivery guarantees, and each trades something for something else:

  • At most once — deliver 0 or 1 times. Fast, simple, but messages can be permanently lost.
  • At least once — deliver 1 or more times. No loss, but duplicates are possible.
  • Effectively once — the final effect is as if each message were processed exactly once. Built on top of at-least-once by adding idempotency or deduplication.

Notice that "exactly once" is not on this list. True exactly-once delivery is impossible in a distributed system — you cannot distinguish between "the consumer processed the message but the ack was lost" and "the consumer never received the message." Both look the same from the producer's perspective: no ack arrived. The best you can do is effectively once — deliver at least once, then make duplicates harmless.

Some vendors market "exactly-once" delivery (Kafka Streams, Pulsar), but read the fine print. These systems achieve effectively-once within their own ecosystem by combining idempotent producers, transactional consumers, and internal dedup. The moment a message crosses a system boundary (Kafka to a REST API, Kafka to a database), the exactly-once guarantee breaks down and the consuming application must provide its own idempotency layer.

Delivery Semantics Comparison

At most once — fire and forget

The producer sends a message and moves on. No acknowledgment, no retry. If the network drops the message or the consumer crashes before processing, the message is gone forever. There is no mechanism to detect or recover the loss.

Why use it? Speed and simplicity. Sensor telemetry streaming 10,000 readings per second can afford to lose a few data points — retrying them would create backpressure that slows the entire pipeline. High-volume logging is another example: losing 0.01% of log lines is acceptable when the alternative is blocking the application thread while waiting for an acknowledgment.

The key question for at-most-once is: "If this message is lost, what is the business impact?" If the answer is "negligible" or "self-correcting" (the next message carries the same information), at-most-once is the right choice.

How brokers implement it: Kafka achieves at-most-once when consumers commit the offset before processing. If the consumer crashes after committing but before processing, that message is skipped forever. RabbitMQ achieves it with auto-acknowledgment — the broker considers the message delivered the instant it sends it, regardless of whether the consumer actually handled it.
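The effect of commit ordering can be seen without a broker. This minimal simulation (all names hypothetical) commits the offset before processing, so a crash in between silently skips a message:

```python
# Simulate commit-BEFORE-processing (the at-most-once mode) with an in-memory log.
# A crash between the commit and the processing loses that message for good.

def run_consumer(messages, start_offset, crash_before, processed):
    """Consume from start_offset; simulate a crash before processing offset crash_before."""
    offset = start_offset
    while offset < len(messages):
        committed = offset + 1              # 1. commit the offset first ...
        if offset == crash_before:
            return committed                # 2. ... then crash before processing it
        processed.append(messages[offset])  # 3. processing happens after the commit
        offset = committed
    return offset

messages = ["m0", "m1", "m2", "m3"]
processed = []
committed = run_consumer(messages, 0, crash_before=2, processed=processed)
# The restarted consumer resumes from the committed offset, so m2 is skipped forever.
run_consumer(messages, committed, crash_before=None, processed=processed)
assert processed == ["m0", "m1", "m3"]  # m2 was lost: at-most-once
```

With commit-after-processing, the restart would instead resume at the uncommitted offset and reprocess m2, trading loss for a possible duplicate (at-least-once).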

At-most-once outside messaging. UDP-based protocols are inherently at-most-once — fire a datagram, no acknowledgment. DNS queries over UDP are a common example: the client sends a lookup, and if the response does not arrive, it retries to a different server rather than the same one, accepting that the first query was lost. Even HTTP services can behave as at-most-once when the client sends a request and ignores failures. A metrics collection endpoint that accepts POST /metrics and returns 202 Accepted is effectively at-most-once from the client's perspective if the client does not retry on timeout.

The spectrum of loss tolerance. Not all data within the same system has the same loss tolerance. A ride-sharing app might use at-most-once for driver location updates (stale by 2 seconds is fine, lost updates are quickly replaced by the next ping) but at-least-once for trip completion events (losing a trip means the rider is never charged). Designing per-flow delivery semantics — rather than one global setting — is a sign of mature system architecture.

When loss is actually dangerous: At-most-once is tempting for its simplicity, but be careful about hidden consequences. A logging pipeline that drops 0.01% of log lines is fine — until the dropped log line is the one that explains why your payment service charged a customer twice. For audit trails, compliance logging, or debugging critical paths, at-least-once is the safer default even if throughput is slightly lower.

Real-world at-most-once systems:

  • StatsD/UDP metrics — applications fire UDP datagrams to a StatsD server. No ack, no retry. At high volume (50K metrics/sec), the loss rate is typically under 0.1%, which is invisible in time-series dashboards that aggregate over 10-second windows.
  • DNS over UDP — the resolver sends a query and sets a timeout. If no response arrives, it queries a different nameserver. The original query is abandoned (at-most-once to that server). The resolver's retry to a different server provides system-level reliability despite per-server at-most-once semantics.
  • Fire-and-forget notifications — "Your package is out for delivery" push notifications. If one is lost, the user checks the tracking page manually. The cost of loss (minor inconvenience) is much lower than the cost of guaranteed delivery infrastructure.
  • Real-time game state updates — multiplayer games send position updates 30-60 times per second. A lost frame is replaced by the next one 16ms later. Retrying stale position data is worse than skipping it.

The common thread: in every at-most-once scenario, the value of the individual message is low relative to the cost of guaranteed delivery. When a message is worth cents (a sensor reading, a log line, a game frame), the infrastructure to guarantee delivery costs dollars per message-flow. The math does not work.
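The StatsD pattern above fits in a few lines. This is a minimal sketch, assuming a StatsD-style collector listening on UDP port 8125 (the address and metric name are illustrative); the send returns immediately and nothing ever detects a dropped datagram:

```python
import socket

def send_metric(sock, addr, name, value):
    """StatsD-style counter: '<name>:<value>|c' fired as a single UDP datagram."""
    sock.sendto(f"{name}:{value}|c".encode(), addr)  # returns at once; no ack, no retry

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# If this datagram is dropped in transit, nothing in the sender ever finds out.
send_metric(sock, ("127.0.0.1", 8125), "orders.created", 1)
```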

Key Insight

The reason 'exactly once' is in quotes throughout distributed systems literature is that true exactly-once delivery is impossible over an unreliable network (this follows from the Two Generals Problem). What systems actually provide is 'effectively once' — at-least-once delivery combined with application-level idempotency to make duplicates harmless.


At least once — retry until acknowledged

The producer sends a message, waits for an acknowledgment from the broker or consumer, and retries if the ack does not arrive within a timeout. The message is guaranteed to be delivered at least once, but may be delivered multiple times because the producer cannot distinguish "message lost" from "ack lost."

The duplicate problem: Imagine a payment service processes a charge and sends the ack, but the ack is lost in transit. The producer retries, and the payment service processes the charge again. The customer is billed twice. This is the fundamental trade-off: you eliminated message loss at the cost of possible duplicates.

How duplicates happen in practice. The most common cause is not network packet loss — it is timeouts during slow processing. The consumer receives a message, starts a 5-second database transaction, but the broker's ack timeout is 3 seconds. The broker assumes the message was lost and re-delivers it to another consumer (or the same consumer on reconnect). Now two consumers are processing the same message simultaneously. This is why ack timeouts must be set longer than the maximum processing time, with margin for garbage collection pauses and network jitter.

Visibility timeout in SQS. Amazon SQS uses a different model called visibility timeout. When a consumer receives a message, the message becomes invisible to other consumers for a configurable duration (default 30 seconds). If the consumer does not delete the message within that window, it becomes visible again and another consumer can pick it up. This is at-least-once delivery built into the queue — the consumer must explicitly delete the message after successful processing. If the processing takes longer than the visibility timeout, another consumer will receive the same message, creating a duplicate.
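The visibility-timeout model can be sketched with an in-memory stand-in for SQS (class and method names are illustrative, not the real AWS API):

```python
import time

class VisibilityQueue:
    """In-memory sketch of an SQS-style queue with a visibility timeout."""
    def __init__(self, visibility=30.0):
        self.visibility = visibility
        self.messages = {}  # msg_id -> (body, invisible_until)

    def send(self, msg_id, body):
        self.messages[msg_id] = (body, 0.0)

    def receive(self, now=None):
        now = time.monotonic() if now is None else now
        for msg_id, (body, invisible_until) in self.messages.items():
            if now >= invisible_until:
                # Hide the message instead of removing it; if the consumer never
                # deletes it, it reappears -- the source of at-least-once duplicates.
                self.messages[msg_id] = (body, now + self.visibility)
                return msg_id, body
        return None

    def delete(self, msg_id):
        # The consumer must delete after successful processing, or it comes back.
        self.messages.pop(msg_id, None)

q = VisibilityQueue(visibility=30.0)
q.send("m1", "charge order 42")
first = q.receive(now=0.0)     # consumer A receives m1; hidden until t=30
stalled = q.receive(now=10.0)  # still hidden: returns None
dup = q.receive(now=31.0)      # timeout expired: consumer B gets the SAME message
assert first == dup == ("m1", "charge order 42") and stalled is None
```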

The idempotency imperative. In any at-least-once system, the question is not "will duplicates happen?" but "when duplicates happen, will they cause damage?" Design every consumer with the assumption that it will process the same message at least twice over its lifetime. If the consumer's operation is naturally idempotent, you are done. If not, you need a dedup mechanism. There is no third option.

Ordering and at-least-once. Retries can cause messages to arrive out of order. If Message A times out and is retried, Message B (sent after A) may arrive before the retry of A. For systems where ordering matters (event sourcing, financial ledgers), this requires either: (1) a single-partition guarantee (Kafka partitions preserve order within a partition), (2) sequence numbers that the consumer uses to detect and reorder out-of-sequence messages, or (3) idempotent operations where order does not matter (SET value = X is order-independent).
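Option (2), sequence numbers with consumer-side reordering, might look like this sketch (names hypothetical):

```python
class Reorderer:
    """Release messages in sequence order, buffering gaps and dropping late duplicates."""
    def __init__(self):
        self.next_seq = 0
        self.pending = {}  # seq -> payload, held until its turn

    def accept(self, seq, payload):
        if seq < self.next_seq:
            return []  # duplicate of an already-delivered message (e.g. a retry): drop
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready

r = Reorderer()
out = []
out += r.accept(1, "B")  # B overtook the retry of A: buffered, nothing released yet
out += r.accept(0, "A")  # the retry of A arrives: releases A, then the buffered B
out += r.accept(0, "A")  # a second, late retry of A: recognized and dropped
assert out == ["A", "B"]
```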

How brokers implement it: Kafka's default mode — consumers process the message, then commit the offset. If the consumer crashes after processing but before committing, Kafka re-delivers the message on restart. RabbitMQ with manual acknowledgments works the same way — the broker re-queues unacknowledged messages.

The retry window matters. How long should the producer wait before retrying? Too short and you create duplicates from slow-but-successful deliveries. Too long and the system stalls. Most systems use exponential backoff with jitter: wait 100ms, then 200ms, then 400ms, each with a random offset to avoid thundering herds when a broker comes back online and all producers retry simultaneously.
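A minimal sketch of exponential backoff with full jitter (the base and cap values are illustrative):

```python
import random

def backoff_delay(attempt, base=0.1, cap=30.0, rng=random.random):
    """Full jitter: wait a uniform random time in [0, min(cap, base * 2**attempt))."""
    upper = min(cap, base * (2 ** attempt))
    return rng() * upper

# Attempt 0 waits up to 100ms, attempt 1 up to 200ms, attempt 2 up to 400ms, ...
# The randomness spreads retries out so producers do not stampede a recovering broker.
delays = [backoff_delay(n) for n in range(5)]
assert all(0.0 <= d <= 0.1 * 2 ** n for n, d in enumerate(delays))
```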

Retry budgets and circuit breakers. Unlimited retries can overload a recovering service. Production systems use retry budgets (maximum N retries per message, maximum M total retries per second across all messages) and circuit breakers (stop retrying entirely after K consecutive failures, wait for a cooldown period, then try one probe request). If the probe succeeds, resume retries. If it fails, extend the cooldown. This prevents a failing consumer from generating exponentially growing retry traffic that makes recovery harder.

Poison messages. Some messages consistently fail processing (malformed data, schema mismatch, application bug). Without a maximum retry count, these messages cycle through the retry loop forever, consuming resources. After N failed attempts, move the message to a dead letter queue (DLQ) for manual investigation. Kafka does not have built-in DLQ support — implement it in the consumer. RabbitMQ supports DLQ natively via dead letter exchanges (DLX).
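The retry-then-DLQ flow can be sketched with in-memory queues (names and the retry limit are illustrative):

```python
from collections import deque

def consume(queue, dlq, process, max_retries=3):
    attempts = {}  # msg_id -> failed attempts so far
    while queue:
        msg_id, body = queue.popleft()
        try:
            process(body)
        except Exception:
            attempts[msg_id] = attempts.get(msg_id, 0) + 1
            if attempts[msg_id] >= max_retries:
                dlq.append((msg_id, body))    # give up: park for manual investigation
            else:
                queue.append((msg_id, body))  # requeue for another attempt

def process(body):
    if body == "malformed":
        raise ValueError("schema mismatch")  # a poison message fails every time

queue = deque([("1", "ok"), ("2", "malformed")])
dlq = []
consume(queue, dlq, process)
assert dlq == [("2", "malformed")]  # parked after 3 failed attempts; the loop terminated
```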

At-least-once in HTTP (idempotency keys). The same pattern applies outside message brokers. When Stripe processes a payment, the client includes an Idempotency-Key header. If the request times out and the client retries with the same key, Stripe returns the original response instead of processing a second charge. This is at-least-once delivery (the client retries) combined with server-side idempotency — effectively-once semantics over HTTP.

 
# HTTP example: Stripe-style idempotency
POST /api/charges
Idempotency-Key: order_abc_123
Content-Type: application/json

{"amount": 2500, "currency": "usd", "customer": "cus_xyz"}
# Server checks: has order_abc_123 been processed?
# If yes: return cached response (no new charge)
# If no: process charge, cache response keyed by order_abc_123
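The server-side check described in those comments might look like this sketch (class and field names are hypothetical; real providers like Stripe also persist the cache durably and scope keys per API client):

```python
class ChargeService:
    """Toy idempotency-key handling: cache the first response, replay it on retries."""
    def __init__(self):
        self.responses = {}  # idempotency key -> cached response
        self.charges = []    # the side effect we must not duplicate

    def create_charge(self, idempotency_key, amount):
        if idempotency_key in self.responses:
            return self.responses[idempotency_key]  # retry: replay the first result
        charge = {"id": f"ch_{len(self.charges) + 1}", "amount": amount}
        self.charges.append(charge)                 # the real side effect, exactly once
        self.responses[idempotency_key] = charge
        return charge

svc = ChargeService()
first = svc.create_charge("order_abc_123", 2500)
retry = svc.create_charge("order_abc_123", 2500)  # the timed-out client retries
assert retry == first and len(svc.charges) == 1   # still exactly one charge
```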

Effectively once — at least once plus idempotency

This is not a delivery-layer guarantee — it is an application-level pattern built on top of at-least-once delivery. The infrastructure retries messages freely, and the consumer ensures that processing the same message twice produces the same outcome as processing it once. The term "effectively once" (rather than "exactly once") emphasizes this distinction: the message may be delivered multiple times, but the effect is as if it were processed once.

Idempotency Deduplication Flow

Two implementation patterns:

1. Idempotent operations. Design the consumer so the operation is naturally idempotent. SET balance = 500 is idempotent — running it twice leaves the balance at 500. SET balance = balance + 100 is not — running it twice adds 200. When possible, use absolute values instead of deltas.

Examples of naturally idempotent operations:

  • INSERT ... ON CONFLICT DO NOTHING — inserting the same record twice is harmless
  • UPDATE users SET status = 'active' WHERE id = 123 — setting a value to a constant
  • DELETE FROM sessions WHERE id = 'abc' — deleting an already-deleted row is a no-op
  • PUT /api/users/123 — replacing the entire resource with the same payload

Examples of non-idempotent operations (require dedup protection):

  • UPDATE accounts SET balance = balance - 50 — each replay deducts again
  • INSERT INTO events (...) without a unique constraint — each replay creates a new row
  • Sending an email — each replay sends another copy
  • Calling an external payment API without an idempotency key

2. Deduplication store. Assign each message a unique ID (order_id, event_id, UUID). Before processing, the consumer checks a dedup store (Redis SET, database unique constraint). If the ID exists, skip processing. If not, process and record the ID with a TTL.

Choosing between the two patterns: Use idempotent operations when the domain allows it — upserts (INSERT ... ON CONFLICT UPDATE), absolute state transitions (SET status = 'shipped'), and conditional writes (UPDATE WHERE version = N). Use a dedup store when the operation is inherently non-idempotent — sending an email, calling an external API, or publishing an event downstream. The dedup store is more general but adds a dependency (Redis or a database) and requires careful TTL sizing.

Dedup store design considerations:

  • TTL sizing: The TTL must be longer than the maximum retry window. If Kafka retains messages for 7 days and a consumer might replay from any point in that window, the dedup TTL should be at least 7 days. Shorter TTLs save memory but create a window where old retries slip through.
  • Redis vs. database: Redis SETs are fast (sub-millisecond lookups) but volatile — a Redis restart loses the dedup state and duplicates can slip through until the SET is rebuilt. A database unique constraint (INSERT INTO processed_events (event_id) ON CONFLICT DO NOTHING) is durable but adds a few milliseconds per check. For most systems, Redis with periodic persistence (RDB snapshots) is the right balance.
  • Compound dedup keys: Sometimes a message ID alone is not enough. A dedup key of (event_id, consumer_id) ensures that two different consumers can independently process the same event without interfering with each other's dedup stores.
  • Atomicity of process-and-record: The dedup check, the business logic, and the dedup record insert should ideally be atomic. If the consumer processes the message and crashes before recording the ID in the dedup store, the next delivery will process it again (the dedup store does not know about the first processing). Wrapping the business logic and dedup store write in a database transaction solves this. If the dedup store is Redis (not transactional with your database), accept a small window of possible duplicates during consumer crashes.
 
# Pseudocode: dedup-guarded consumer
def handle(message):
    key = "processed:" + message.id   # one key per message, so each ID expires independently
    if redis.exists(key):
        return ACK  # already processed, skip
    process(message)
    redis.set(key, 1, ex=86400)  # record with a 24h TTL (must exceed the retry window)
    return ACK

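When the dedup record and the business data live in the same database, the process-and-record step can be made genuinely atomic with a transaction plus a unique constraint. A sketch using SQLite (table names hypothetical):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
db.execute("CREATE TABLE ledger (event_id TEXT, delta INTEGER)")

def handle(event_id, delta):
    try:
        with db:  # one transaction: both inserts commit together, or neither does
            db.execute("INSERT INTO processed_events VALUES (?)", (event_id,))
            db.execute("INSERT INTO ledger VALUES (?, ?)", (event_id, delta))
        return "processed"
    except sqlite3.IntegrityError:
        return "duplicate"  # the primary key fired: already handled, skip safely

assert handle("evt_1", -50) == "processed"
assert handle("evt_1", -50) == "duplicate"  # redelivery cannot double-write the ledger
```

The same idea works in PostgreSQL with INSERT ... ON CONFLICT DO NOTHING inside the business transaction.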
Interview Tip

In interviews, default to at-least-once with idempotent consumers. This is the standard production pattern for payment processing, order management, and any system where losing a message is unacceptable. Interviewers expect you to explain the dedup mechanism (usually a Redis SET or database unique constraint on the message ID) rather than hand-waving 'exactly once delivery.'


Choosing the right semantic

Use case                                Semantic                Why
Sensor telemetry, logging               At most once            Loss tolerable, throughput matters
Order processing, notifications         At least once           Loss unacceptable, consumers are idempotent
Payment charges, inventory decrements   Effectively once        Duplicates cause real damage (double-charge)
Email/SMS notifications                 At least once + dedup   Users notice duplicate messages
Analytics event tracking                At least once           Duplicates inflate counts slightly, usually acceptable
Database replication (WAL streaming)    Effectively once        Replaying the same WAL entry must be idempotent
Cache invalidation                      At least once           Invalidating an already-invalid cache entry is harmless

Broker-specific implementations

Understanding how major message brokers implement each semantic is essential for making the right infrastructure choice.

Kafka has the most mature effectively-once support:

  • Idempotent producers (enable.idempotence=true): the producer assigns a sequence number to each message. The broker detects and discards duplicates from producer retries. This costs almost nothing in performance and should be enabled by default.
  • Transactions: wrap a read-process-write cycle in a transaction. The consumer reads from an input topic, processes, writes to an output topic, and commits the consumer offset — all atomically. If any step fails, everything rolls back. This is Kafka's version of effectively-once for stream processing pipelines (Kafka Streams and Flink use this under the hood).
  • Consumer groups: the offset commit is the boundary. If you commit before processing, you get at-most-once. If you commit after processing, you get at-least-once. Transactions give you effectively-once by making the offset commit and the output write atomic.
  • Consumer rebalancing: when a consumer joins or leaves a group, Kafka reassigns partitions. The new consumer resumes from the last committed offset, potentially reprocessing messages that the old consumer handled but did not commit. This is another source of at-least-once duplicates, distinct from crash recovery.

RabbitMQ does not have built-in effectively-once:

  • Publisher confirms: the broker acknowledges that it received and persisted the message. This prevents message loss between producer and broker (but not between broker and consumer).
  • Manual acknowledgments: the consumer explicitly acks after processing. If the consumer crashes before acking, the broker re-delivers to another consumer.
  • Effectively-once requires a dedup store on the consumer side. RabbitMQ's message_id header can serve as the dedup key if the publisher sets it consistently.

Redis Streams provides at-least-once with consumer groups:

  • Consumers call XREADGROUP to receive messages and XACK to acknowledge. Unacknowledged messages can be claimed by other consumers via XCLAIM after a timeout.
  • No built-in effectively-once — use application-level idempotency.

The general rule

Start with at-least-once and make your consumers idempotent. Only upgrade to full effectively-once (transactions, dedup stores) when the cost of a duplicate exceeds the cost of the extra infrastructure. For most systems, at-least-once with idempotent consumers is the sweet spot: simple, reliable, and cheap.


Common pitfalls

1. Confusing delivery and processing. A message can be delivered exactly once by the broker but still processed multiple times if the consumer has internal retries. Delivery semantics describe the broker-to-consumer guarantee. Processing semantics are the application's responsibility.

2. Assuming idempotency is free. Making an operation idempotent often requires schema changes (adding a processed_event_id column), infrastructure additions (a Redis dedup store), or application logic changes (converting deltas to absolute values). Budget time for this during design.

3. Forgetting about consumer restarts. When a consumer restarts, it resumes from its last committed offset. If the consumer processed messages 1-100 but only committed offset 90, messages 91-100 will be redelivered. This is at-least-once by design, not a bug — but it means your consumer must be prepared for redelivery of any message since the last committed offset.

4. Using at-most-once for critical data by accident. Auto-ack in RabbitMQ and commit-before-process in Kafka look like the "simple" option. But if a consumer crashes after receiving a message, that message is gone forever. For order processing, payment, or any business-critical flow, this silent data loss is unacceptable. Always use manual acknowledgments for important data.

5. Not testing the failure paths. It is easy to verify that messages flow correctly when nothing fails. The hard part is testing what happens when the consumer crashes mid-processing, when the broker restarts, when the network partitions between producer and broker. Chaos engineering tools (Toxiproxy, Chaos Monkey) can inject these failures in staging environments. Without testing failure paths, you only know your system works under ideal conditions — which is when you least need it to work.


End-to-end example: order processing pipeline

To see how delivery semantics interact in a real system, trace an order through a typical e-commerce pipeline:

  1. User places order (HTTP POST to Order Service). The Order Service writes the order to its database and publishes an order.created event to Kafka. The Kafka producer uses acks=all (waits for all in-sync replicas to confirm) and enable.idempotence=true (producer-side dedup). This ensures the event reaches Kafka durably.
  2. Payment Service consumes order.created from Kafka. It charges the customer's credit card via Stripe, using the order_id as the Stripe idempotency key. If Kafka re-delivers the event (consumer crash before offset commit), the second charge attempt hits Stripe's idempotency layer and returns the original charge result. The payment is effectively-once.
  3. Inventory Service consumes the same order.created event. It decrements stock using UPDATE products SET stock = stock - 1 WHERE id = ? AND stock > 0. This is NOT idempotent — processing the event twice decrements stock twice. The service uses a Redis dedup store keyed by order_id to prevent double-decrement. Effectively-once via dedup.
  4. Notification Service consumes order.created and sends a confirmation email. Sending email is inherently non-idempotent (you cannot un-send an email). A dedup store prevents duplicate sends, but the TTL must cover the entire retry window. If the dedup store loses the key (Redis restart), the customer might get a duplicate email — annoying but not catastrophic. The team accepts at-least-once for notifications.
  5. Analytics Service consumes order.created and increments counters. Duplicate events inflate metrics slightly. The team uses at-least-once with no dedup because perfect accuracy is not required for dashboards — a 0.1% overcount in daily order volume is invisible in trend charts.
  6. Shipping Service consumes a payment.succeeded event from the Payment Service. It creates a shipping label via a third-party API (FedEx, UPS). Creating a label is non-idempotent and costs money — each API call creates a new label. The service uses a dedup store keyed by order_id AND checks the shipping API's own idempotency features (most carriers support idempotency keys). Double defense: dedup store prevents the API call, and the carrier's idempotency prevents a duplicate label even if the dedup store fails.

Each consumer in the pipeline independently chooses the delivery semantic that matches its domain risk. This per-consumer design — where each service owns its own idempotency guarantees rather than relying on a single global guarantee — is the standard pattern in production event-driven architectures.

Notice that the same order.created event flows to six different consumers, each with a different delivery semantic. The Kafka topic does not dictate the semantic — each consumer's processing logic does. This is the key architectural insight: delivery semantics are a consumer-side concern, not a broker-side concern.

The producer's responsibility is to get the event into Kafka durably (acks=all, idempotent producer). From there, each consumer independently decides how to handle duplicates based on its own domain requirements. This separation of concerns is what makes event-driven architectures composable — you can add a new consumer to the topic without modifying existing consumers or the producer.


Observability for delivery semantics

You cannot verify your delivery guarantees without metrics:

  • Consumer lag — the gap between the latest offset in a Kafka partition and the last committed offset by each consumer group. Rising lag means the consumer cannot keep up, increasing the risk of at-least-once redelivery if the consumer crashes with a large uncommitted window.
  • Dedup hit rate — percentage of messages where the dedup store returned "already processed." A consistently zero hit rate means either your system is perfectly healthy (no retries) or your dedup store is broken. A suddenly high hit rate indicates a retry storm — investigate why messages are being redelivered.
  • Processing time per message — if this exceeds the broker's ack timeout, the broker will redeliver to another consumer while the first is still processing. This is the most common cause of unexpected duplicates.
  • Dead letter queue depth — messages that failed processing after all retries. These are the messages that your delivery semantics could not handle automatically. Someone (usually an on-call engineer or a reconciliation job) must investigate and reprocess them manually.

Delivery semantics across the tech stack

Delivery semantics are not limited to message brokers. The same concepts appear throughout distributed systems:

HTTP APIs. Consider a client calling POST /api/orders with a 5-second timeout. If the server takes 6 seconds to respond, the client assumes failure and retries. But the first request succeeded — the server just responded late. Without an idempotency key, the second request creates a duplicate order. This is at-least-once delivery over HTTP.

Database replication. PostgreSQL streaming replication sends WAL entries from leader to followers. If the follower crashes and restarts, it resumes from its last flushed position. Some WAL entries may be replayed. This is at-least-once — but WAL replay is designed to be idempotent (replaying the same WAL entry twice produces the same result), so the effective semantic is effectively-once.

Event sourcing. In an event-sourced system, the event store is append-only and the source of truth. Projections (read models) consume events and build materialized views. If a projection crashes and restarts, it replays events from its last processed position. Duplicates are harmless because projections are rebuilt from scratch — the same events produce the same view. This is effectively-once by construction.
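A toy projection shows why replay is harmless: rebuilding from the same event log is deterministic (the event shapes here are illustrative):

```python
def rebuild_view(events):
    """Fold order events into an orders-per-customer view, starting from scratch."""
    view = {}
    for e in events:
        if e["type"] == "order.created":
            view[e["customer"]] = view.get(e["customer"], 0) + 1
    return view

events = [
    {"type": "order.created", "customer": "alice"},
    {"type": "order.created", "customer": "bob"},
    {"type": "order.created", "customer": "alice"},
]
# A crash-and-rebuild produces an identical view: replaying is effectively-once.
assert rebuild_view(events) == rebuild_view(events) == {"alice": 2, "bob": 1}
```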

gRPC and RPC frameworks. gRPC supports client-side retries with configurable retry policies. When a call times out, the framework retries automatically. The server must be idempotent for retry-safe methods. gRPC distinguishes between idempotent methods (safe to retry) and non-idempotent methods (must not retry) via service configuration. The retry policy specifies which status codes trigger retries (UNAVAILABLE, DEADLINE_EXCEEDED), the maximum number of retries, and the backoff interval.
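A retry policy of this shape is declared in the gRPC service config. This JSON fragment is a sketch (the service name is hypothetical):

```json
{
  "methodConfig": [{
    "name": [{"service": "orders.OrderService"}],
    "retryPolicy": {
      "maxAttempts": 4,
      "initialBackoff": "0.1s",
      "maxBackoff": "1s",
      "backoffMultiplier": 2,
      "retryableStatusCodes": ["UNAVAILABLE", "DEADLINE_EXCEEDED"]
    }
  }]
}
```

Only methods that are safe to retry should receive a retryPolicy; non-idempotent methods are left out so the framework never retries them automatically.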

Webhook delivery. Third-party integrations (Stripe, GitHub, Twilio) deliver events via HTTP webhooks. These are at-least-once by design — the provider retries on timeout or 5xx responses. Webhook consumers must be idempotent because the same event can arrive multiple times over hours or days as the provider retries with exponential backoff. Most webhook APIs include an event ID for deduplication.

Microservice choreography. In an event-driven microservice architecture, every service-to-service communication is a message delivery with a delivery semantic. The Saga pattern coordinates distributed transactions by chaining events across services — each step must be idempotent because compensating transactions (rollbacks) may replay events. A well-designed saga is effectively-once at the business logic level, built on at-least-once delivery at the infrastructure level.


The cost of each semantic

  • At most once — infrastructure cost: minimal (no ack, no retry, no dedup). Application complexity: none. Latency overhead: none. Data risk: message loss.
  • At least once — infrastructure cost: moderate (retry logic, ack tracking). Application complexity: moderate (must handle duplicates or be idempotent). Latency overhead: retry latency on failure. Data risk: duplicate processing.
  • Effectively once — infrastructure cost: high (dedup store, transactions, or idempotency keys). Application complexity: high (dedup key generation, TTL management, testing). Latency overhead: dedup check on every message. Data risk: dedup window edge cases.

The progression from at-most-once to effectively-once is a progression of increasing infrastructure cost and application complexity. Each step up adds a new moving part. The engineering judgment is deciding where each message flow sits on this spectrum — not defaulting everything to the "safest" option, which would over-engineer simple flows and under-resource critical ones.

A practical guideline for a typical SaaS application:

  • At most once for telemetry, metrics, and non-critical notifications (roughly 60% of message flows by volume)
  • At least once with idempotent consumers for core business events — order created, user signed up, subscription changed (roughly 35% of flows)
  • Effectively once with dedup stores for financial transactions, inventory changes, and external API calls (roughly 5% of flows, but the most critical 5%)

This distribution means most of your system is simple (no dedup infrastructure), and the complexity is concentrated where it matters most. Document which semantic each message flow uses, and include the rationale in your architecture decision records.


One-liner summary

At most once trades reliability for speed — messages may be lost.

At least once trades simplicity for reliability — messages may duplicate.

Effectively once trades infrastructure cost for correctness — application-level idempotency makes duplicates harmless.

Every message flow in your system should make an explicit, documented choice among these three. The worst outcome is an implicit choice — a system that happens to be at-most-once because nobody configured acknowledgments, discovered only when critical data goes missing in production.