Distributed Messaging: Queues vs. Logs for Beginners
Both message queues and event logs move data from producers to consumers, but they disagree on a fundamental question: what happens to a message after a consumer reads it? A message queue (RabbitMQ, SQS) delivers each message to exactly one consumer, then deletes it after acknowledgment. Think of it as a to-do list: once a task is checked off, it is gone. Multiple consumers on the same queue compete for messages. If two services need the same message, the broker must copy it into separate queues.
An event log (Kafka, Redpanda) appends messages to a durable, ordered log and never deletes them on read. Consumers track their own position (offset) and can rewind to replay old events. Multiple consumer groups read the same log independently without affecting each other.
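The two models can be sketched as toy in-memory brokers. This is an illustrative simulation only; the class and method names (`ToyQueue`, `ToyLog`) are invented here and do not correspond to any real broker API:

```python
class ToyQueue:
    """Queue model: a message is removed once a consumer acknowledges it."""
    def __init__(self):
        self.messages = []

    def publish(self, msg):
        self.messages.append(msg)

    def consume(self):
        # Deliver the oldest message; the caller must ack to delete it.
        return self.messages[0] if self.messages else None

    def ack(self):
        self.messages.pop(0)   # gone for everyone, forever


class ToyLog:
    """Log model: reads never delete anything; each consumer group
    tracks its own offset and can rewind to replay old events."""
    def __init__(self):
        self.entries = []
        self.offsets = {}      # group name -> next offset to read

    def append(self, msg):
        self.entries.append(msg)

    def read(self, group):
        pos = self.offsets.get(group, 0)
        if pos >= len(self.entries):
            return None
        self.offsets[group] = pos + 1
        return self.entries[pos]

    def rewind(self, group, offset=0):
        self.offsets[group] = offset


q = ToyQueue()
q.publish("task-1")
q.consume()
q.ack()                        # task-1 is checked off the to-do list

log = ToyLog()
log.append("event-1")
log.read("analytics")          # "event-1"
log.read("billing")            # "event-1" -- independent group, same stream
log.rewind("analytics")
log.read("analytics")          # "event-1" again -- replay
```

Note that `ToyLog` stores the offset per group, not per message: the broker's only bookkeeping is "where is each group," which is exactly what makes independent fan-out cheap.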

This distinction drives every downstream decision: scaling model, replay capability, failure recovery, and coupling between services.
Queue model strengths:
- Simple work distribution (job queues, task dispatching)
- Built-in load balancing across consumers
- Message-level routing with exchange patterns (topic, fanout, headers)
Log model strengths:
- Event replay for debugging and reprocessing
- Multiple independent consumers from one stream
- A natural audit trail
- Time-travel to any point in the retained window
The queue model treats messages as tasks to be completed. The log model treats messages as facts to be recorded. This philosophical difference is why you use RabbitMQ for job dispatch (send email, resize image) and Kafka for event streaming (user clicked, order placed). Choosing the wrong model creates friction at every layer of the architecture.
A single Kafka partition is a strictly ordered, append-only log served by one broker. It is also the unit of parallelism: you cannot have more active consumers in a group than partitions. Understanding partitions is understanding Kafka's scaling model.
Why not just one big log? A single partition on one broker tops out at roughly 50-100 MB/s write throughput (disk-bound). If your stream produces 500 MB/s, you need at least 5-10 partitions spread across multiple brokers to handle the write load. On the read side, each partition can only be consumed by one consumer per group, so more partitions also mean more parallel consumers.

Partition key determines ordering. When a producer sends a message with a key (e.g., user_id), Kafka hashes the key to determine the target partition. All messages with the same key land on the same partition, guaranteeing ordering within that key. Messages across different partitions have no ordering guarantee.
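Kafka's default partitioner hashes the record key (with murmur2) modulo the partition count. The sketch below substitutes MD5 purely to illustrate the property that matters, deterministic key-to-partition mapping; it is not Kafka's actual hash:

```python
import hashlib

NUM_PARTITIONS = 6

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in for Kafka's murmur2-based default partitioner: any stable
    # hash gives the same guarantee -- same key, same partition, always.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Every event for user-42 lands on one partition, so its events stay ordered.
assert partition_for("user-42") == partition_for("user-42")

# Events for different keys may land anywhere; no cross-key ordering exists.
print(partition_for("user-42"), partition_for("user-99"))
```
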
How many partitions? Start with max(expected_throughput / per_partition_throughput, expected_consumer_count). A common starting point is 6-12 partitions for a moderate topic. Over-partitioning wastes broker metadata and increases rebalance time. Under-partitioning caps your consumer parallelism.
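The sizing formula above, as a quick calculation (the 50 MB/s per-partition figure is the rough planning number from this section, not a measured value):

```python
import math

def starting_partitions(expected_mb_s: float,
                        per_partition_mb_s: float,
                        expected_consumers: int) -> int:
    """max(throughput-driven count, consumer-parallelism-driven count)."""
    by_throughput = math.ceil(expected_mb_s / per_partition_mb_s)
    return max(by_throughput, expected_consumers)

# A 500 MB/s stream at ~50 MB/s per partition, with 8 planned consumers:
print(starting_partitions(500, 50, 8))   # -> 10 (throughput dominates)

# A small stream but 6 consumers: parallelism dominates.
print(starting_partitions(100, 50, 6))   # -> 6
```
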
In interviews, always mention partition key choice when discussing Kafka. A bad partition key (e.g., country_code with 80% of traffic from one country) creates a hot partition that bottlenecks the entire consumer group. The ideal key distributes load evenly while grouping related events that need ordering (e.g., user_id, order_id).
Kafka's consumer group is the mechanism that gives you both competing-consumer parallelism (like a queue) and independent multi-subscriber fan-out (like pub/sub) from the same topic.
Within a group: partitions are divided among consumers. Each message is processed by exactly one consumer in the group. This is the queue-like behavior.
Across groups: each group gets its own independent copy of every message. Group A and Group B both read the full stream without interfering with each other. This is the pub/sub behavior.
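Both behaviors fall out of one rule: partitions are divided within a group, and every group gets all partitions. A minimal round-robin assignment sketch (Kafka's real range/sticky assignors are more sophisticated; this only illustrates the shape of the result):

```python
from collections import defaultdict

def assign_partitions(partitions, consumers):
    """Round-robin assignment of partitions within one consumer group.
    Illustrative only -- not Kafka's actual assignor algorithms."""
    assignment = defaultdict(list)
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return dict(assignment)

partitions = [0, 1, 2, 3]

# Within a group: partitions are split, so each message reaches one consumer.
group_a = assign_partitions(partitions, ["a1", "a2"])

# Across groups: group B independently gets every partition as well.
group_b = assign_partitions(partitions, ["b1"])

print(group_a)   # {'a1': [0, 2], 'a2': [1, 3]}
print(group_b)   # {'b1': [0, 1, 2, 3]}
```
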

Rebalancing happens when consumers join or leave a group. The group coordinator redistributes partition ownership. During a rebalance, consumption pauses briefly. Frequent rebalances (from flapping consumers, long GC pauses, or aggressive session timeouts) degrade throughput.
Static group membership (setting group.instance.id) prevents rebalancing when a consumer restarts within the session timeout. The consumer rejoins and reclaims its previous partitions without triggering a full rebalance. This is critical for stateful consumers that maintain local caches or aggregations.
With the default eager rebalance protocol, a consumer group rebalance pauses consumption for the entire group, not just the affected consumer: if your group has 20 consumers and one crashes, all 20 stop consuming until partitions are reassigned. (The cooperative incremental protocol, available since Kafka 2.4, only pauses the partitions that actually move.) Keep session.timeout.ms and heartbeat.interval.ms tuned to minimize unnecessary rebalances while still detecting genuine failures quickly.
The way a consumer tracks progress determines the delivery guarantee. In Kafka, this means offset management. In RabbitMQ, this means acknowledgment mode. Getting this wrong is how you lose messages or process them twice.
Kafka offset tracking
Each consumer group maintains a committed offset per partition: the position to resume from, i.e., one past the last successfully processed message. On restart, the consumer resumes from the committed offset.
Auto-commit (default): Kafka periodically commits the current offset (every 5 seconds by default). If the consumer crashes between auto-commits, it restarts from the last committed offset and reprocesses messages. This is at-least-once delivery.
Manual commit: The application explicitly commits after processing. This gives precise control but requires careful placement:
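The placement question can be shown with a plain-Python simulation (no real broker or client API; the `run` function and its crash hook are invented here to make the two orderings concrete):

```python
def run(messages, committed, commit_first, crash_at=None):
    """Simulated consumer run: process messages from the committed offset
    onward. `crash_at` aborts the run between the commit and process steps
    at that offset, in whichever order those steps execute."""
    processed = []
    offset = committed
    while offset < len(messages):
        if commit_first:
            committed = offset + 1              # commit, then process
            if offset == crash_at:
                return processed, committed     # crashed before processing: lost
            processed.append(messages[offset])
        else:
            processed.append(messages[offset])  # process, then commit
            if offset == crash_at:
                return processed, committed     # crashed before commit: redelivered
            committed = offset + 1
        offset += 1
    return processed, committed

msgs = ["m0", "m1", "m2"]

# Commit AFTER processing: a crash at m1 re-delivers m1 on restart (at-least-once).
done1, pos = run(msgs, 0, commit_first=False, crash_at=1)
done2, _ = run(msgs, pos, commit_first=False)
print(done1 + done2)   # ['m0', 'm1', 'm1', 'm2'] -- m1 processed twice

# Commit BEFORE processing: a crash at m1 skips it on restart (at-most-once).
lost1, pos = run(msgs, 0, commit_first=True, crash_at=1)
lost2, _ = run(msgs, pos, commit_first=True)
print(lost1 + lost2)   # ['m0', 'm2'] -- m1 lost
```

Commit-after-processing risks duplicates; commit-before-processing risks loss. If your handler is not idempotent, neither placement alone is enough and you need deduplication downstream.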
Consumer lag is the gap between the latest produced offset and the consumer's committed offset. Monitoring lag tells you if a consumer is falling behind. High lag means messages are waiting longer to be processed, which can cause staleness in downstream systems.
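Lag is simple arithmetic per partition, summed across the topic (the offset numbers below are made up for illustration):

```python
def consumer_lag(latest_offsets, committed_offsets):
    """Total messages produced but not yet processed by the group."""
    return sum(latest_offsets[p] - committed_offsets.get(p, 0)
               for p in latest_offsets)

latest    = {0: 1500, 1: 1480, 2: 1520}   # newest offset per partition
committed = {0: 1500, 1: 1200, 2: 1510}   # group's committed positions

print(consumer_lag(latest, committed))    # -> 290 (0 + 280 + 10)
```

Note the skew: partition 1 carries nearly all the lag, which is the signature of a hot partition or a stuck consumer rather than a uniformly overloaded group.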
RabbitMQ acknowledgments
RabbitMQ uses per-message acknowledgments instead of offsets:
- Auto-ack: Broker considers the message delivered as soon as it sends it. If the consumer crashes, the message is lost. This is at-most-once.
- Manual ack: Consumer explicitly acknowledges after processing. Unacknowledged messages are redelivered to another consumer. This is at-least-once.
- nack + requeue: Consumer explicitly rejects a message and tells the broker to re-queue it for another attempt.
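These three modes can be simulated with a toy broker. This sketch models the semantics only; the class and method names are invented here and are not the real RabbitMQ or pika API:

```python
class ToyRabbitQueue:
    """Toy model of RabbitMQ delivery and acknowledgment semantics."""
    def __init__(self):
        self.ready = []        # messages waiting for delivery
        self.unacked = {}      # delivery_tag -> message in flight
        self.next_tag = 0

    def publish(self, msg):
        self.ready.append(msg)

    def deliver(self, auto_ack=False):
        if not self.ready:
            return None, None
        msg = self.ready.pop(0)
        if auto_ack:
            return None, msg              # broker forgets it immediately
        self.next_tag += 1
        self.unacked[self.next_tag] = msg
        return self.next_tag, msg

    def ack(self, tag):
        del self.unacked[tag]             # now it is deleted for good

    def nack_requeue(self, tag):
        self.ready.insert(0, self.unacked.pop(tag))

    def consumer_crashed(self):
        # Broker requeues everything the dead consumer never acked.
        self.ready = list(self.unacked.values()) + self.ready
        self.unacked.clear()


q = ToyRabbitQueue()

# Auto-ack: if the consumer dies right after this, job-1 is simply lost.
q.publish("job-1")
_, msg = q.deliver(auto_ack=True)

# Manual ack: a crash before acking puts job-2 back for redelivery.
q.publish("job-2")
tag, msg = q.deliver()
q.consumer_crashed()
tag, msg = q.deliver()     # job-2 again, on the surviving consumer
q.ack(tag)
```
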
The critical difference: Kafka consumers can replay old messages by resetting offsets. RabbitMQ consumers cannot. Once acknowledged, the message is deleted from the queue.
Each system optimizes for different access patterns. The right choice depends on your workload, not on which system is "best."
| Dimension | Kafka | RabbitMQ | Redis Streams |
| --- | --- | --- | --- |
| Model | Append-only log | Message queue | Append-only stream |
| Ordering | Per-partition | Per-queue | Per-stream |
| Retention | Configurable (hours to forever) | Until acknowledged | Configurable (MAXLEN or MINID) |
| Replay | Yes (offset reset) | No (messages deleted after ack) | Yes (read from any ID) |
| Consumer groups | Built-in (partition-based) | Competing consumers (queue-based) | Built-in (XREADGROUP) |
| Throughput | Very high (100K+ msg/sec per partition) | Moderate (10-50K msg/sec per queue) | High (limited by single Redis node) |
| Scaling | Add brokers + partitions | Add queues + consumers | Cluster mode or client-side sharding |
| Best for | Event streaming, audit logs, high-volume pipelines | Task dispatch, request-reply, complex routing | Lightweight streaming when Redis is already in the stack |
When to choose each
Kafka when you need event replay, high throughput, multiple independent consumers from one stream, or long retention. Common uses: clickstream, change data capture, event sourcing, log aggregation.
RabbitMQ when you need sophisticated routing (topic exchanges, headers, priority queues), request-reply patterns, or per-message TTL and dead-letter routing. Common uses: task queues, RPC over messages, workflow orchestration.
Redis Streams when you already run Redis and need lightweight pub/sub with consumer groups and persistence. Throughput is capped by the single-threaded Redis model (or cluster sharding), so it suits moderate-volume streams, not Kafka-scale pipelines. Common uses: activity feeds, notification streams, lightweight event buses in microservices that already use Redis for caching.
In interviews, do not say "Kafka is better than RabbitMQ." Instead, match the system to the access pattern. If the interviewer describes task dispatch with complex routing, RabbitMQ is the right answer. If they describe event streaming with replay, Kafka is the right answer. Showing that you understand the trade-offs is more impressive than picking a favorite.