Distributed Messaging: Queues vs. Logs for Beginners
Both message queues and event logs move data from producers to consumers, but they disagree on a fundamental question: what happens to a message after a consumer reads it? A message queue (RabbitMQ, SQS) delivers each message to exactly one consumer, then deletes it after acknowledgment. Think of it as a to-do list: once a task is checked off, it is gone. Multiple consumers on the same queue compete for messages. If two services need the same message, the broker must copy it into separate queues.
An event log (Kafka, Redpanda) appends messages to a durable, ordered log and never deletes them on read. Consumers track their own position (offset) and can rewind to replay old events. Multiple consumer groups read the same log independently without affecting each other.
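The two models can be sketched as toy in-memory brokers. This is an illustrative simulation only; the class and method names (`ToyQueue`, `ToyLog`) are invented here and do not correspond to any real broker API:

```python
class ToyQueue:
    """Queue model: a message is removed once a consumer acknowledges it."""
    def __init__(self):
        self.messages = []

    def publish(self, msg):
        self.messages.append(msg)

    def consume(self):
        # Deliver the oldest message; the caller must ack to delete it.
        return self.messages[0] if self.messages else None

    def ack(self):
        self.messages.pop(0)   # gone for everyone, forever


class ToyLog:
    """Log model: reads never delete anything; each consumer group
    tracks its own offset and can rewind to replay old events."""
    def __init__(self):
        self.entries = []
        self.offsets = {}      # group name -> next offset to read

    def append(self, msg):
        self.entries.append(msg)

    def read(self, group):
        pos = self.offsets.get(group, 0)
        if pos >= len(self.entries):
            return None
        self.offsets[group] = pos + 1
        return self.entries[pos]

    def rewind(self, group, offset=0):
        self.offsets[group] = offset


q = ToyQueue()
q.publish("task-1")
q.consume()
q.ack()                        # task-1 is checked off the to-do list

log = ToyLog()
log.append("event-1")
log.read("analytics")          # "event-1"
log.read("billing")            # "event-1" -- independent group, same stream
log.rewind("analytics")
log.read("analytics")          # "event-1" again -- replay
```

Note that `ToyLog` stores the offset per group, not per message: the broker's only bookkeeping is "where is each group," which is exactly what makes independent fan-out cheap.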

This distinction drives every downstream decision: scaling model, replay capability, failure recovery, and coupling between services.
Queue model strengths:
- Simple work distribution (job queues, task dispatching)
- Built-in load balancing across consumers
- Message-level routing with exchange patterns (topic, fanout, headers)
Log model strengths:
- Event replay for debugging and reprocessing
- Multiple independent consumers from one stream
- A natural audit trail
- Time-travel to any point in the retained window
The queue model treats messages as tasks to be completed. The log model treats messages as facts to be recorded. This philosophical difference is why you use RabbitMQ for job dispatch (send email, resize image) and Kafka for event streaming (user clicked, order placed). Choosing the wrong model creates friction at every layer of the architecture.
A single Kafka partition is a strictly ordered, append-only log served by one broker. It is also the unit of parallelism: you cannot have more active consumers in a group than partitions. Understanding partitions is understanding Kafka's scaling model.
Why not just one big log? A single partition on one broker tops out at roughly 50-100 MB/s write throughput (disk-bound). If your stream produces 500 MB/s, you need at least 5-10 partitions spread across multiple brokers to handle the write load. On the read side, each partition can only be consumed by one consumer per group, so more partitions also mean more parallel consumers.

Partition key determines ordering. When a producer sends a message with a key (e.g., user_id), Kafka hashes the key to determine the target partition. All messages with the same key land on the same partition, guaranteeing ordering within that key. Messages across different partitions have no ordering guarantee.
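Kafka's default partitioner hashes the record key (with murmur2) modulo the partition count. The sketch below substitutes MD5 purely to illustrate the property that matters, deterministic key-to-partition mapping; it is not Kafka's actual hash:

```python
import hashlib

NUM_PARTITIONS = 6

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in for Kafka's murmur2-based default partitioner: any stable
    # hash gives the same guarantee -- same key, same partition, always.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Every event for user-42 lands on one partition, so its events stay ordered.
assert partition_for("user-42") == partition_for("user-42")

# Events for different keys may land anywhere; no cross-key ordering exists.
print(partition_for("user-42"), partition_for("user-99"))
```
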
How many partitions? Start with max(expected_throughput / per_partition_throughput, expected_consumer_count). A common starting point is 6-12 partitions for a moderate topic. Over-partitioning wastes broker metadata and increases rebalance time. Under-partitioning caps your consumer parallelism.
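The sizing formula above, as a quick calculation (the 50 MB/s per-partition figure is the rough planning number from this section, not a measured value):

```python
import math

def starting_partitions(expected_mb_s: float,
                        per_partition_mb_s: float,
                        expected_consumers: int) -> int:
    """max(throughput-driven count, consumer-parallelism-driven count)."""
    by_throughput = math.ceil(expected_mb_s / per_partition_mb_s)
    return max(by_throughput, expected_consumers)

# A 500 MB/s stream at ~50 MB/s per partition, with 8 planned consumers:
print(starting_partitions(500, 50, 8))   # -> 10 (throughput dominates)

# A small stream but 6 consumers: parallelism dominates.
print(starting_partitions(100, 50, 6))   # -> 6
```
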
In interviews, always mention partition key choice when discussing Kafka. A bad partition key (e.g., country_code with 80% of traffic from one country) creates a hot partition that bottlenecks the entire consumer group. The ideal key distributes load evenly while grouping related events that need ordering (e.g., user_id, order_id).
Kafka's consumer group is the mechanism that gives you both competing-consumer parallelism (like a queue) and independent multi-subscriber fan-out (like pub/sub) from the same topic.
Within a group: partitions are divided among consumers. Each message is processed by exactly one consumer in the group. This is the queue-like behavior.
Across groups: each group gets its own independent copy of every message. Group A and Group B both read the full stream without interfering with each other. This is the pub/sub behavior.
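Both behaviors fall out of one rule: partitions are divided within a group, and every group gets all partitions. A minimal round-robin assignment sketch (Kafka's real range/sticky assignors are more sophisticated; this only illustrates the shape of the result):

```python
from collections import defaultdict

def assign_partitions(partitions, consumers):
    """Round-robin assignment of partitions within one consumer group.
    Illustrative only -- not Kafka's actual assignor algorithms."""
    assignment = defaultdict(list)
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return dict(assignment)

partitions = [0, 1, 2, 3]

# Within a group: partitions are split, so each message reaches one consumer.
group_a = assign_partitions(partitions, ["a1", "a2"])

# Across groups: group B independently gets every partition as well.
group_b = assign_partitions(partitions, ["b1"])

print(group_a)   # {'a1': [0, 2], 'a2': [1, 3]}
print(group_b)   # {'b1': [0, 1, 2, 3]}
```
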

Rebalancing happens when consumers join or leave a group. The group coordinator redistributes partition ownership. During a rebalance, consumption pauses briefly. Frequent rebalances (from flapping consumers, long GC pauses, or aggressive session timeouts) degrade throughput.
Static group membership (setting group.instance.id) prevents rebalancing when a consumer restarts within the session timeout. The consumer rejoins and reclaims its previous partitions without triggering a full rebalance. This is critical for stateful consumers that maintain local caches or aggregations.
With the default eager rebalance protocol, a consumer group rebalance pauses consumption for the entire group, not just the affected consumer: if your group has 20 consumers and one crashes, all 20 stop consuming until partitions are reassigned. (The cooperative incremental protocol, available since Kafka 2.4, only pauses the partitions that actually move.) Keep session.timeout.ms and heartbeat.interval.ms tuned to minimize unnecessary rebalances while still detecting genuine failures quickly.
The way a consumer tracks progress determines the delivery guarantee. In Kafka, this means offset management. In RabbitMQ, this means acknowledgment mode. Getting this wrong is how you lose messages or process them twice.
Kafka offset tracking
Each consumer group maintains a committed offset per partition: the position to resume from, i.e., one past the last successfully processed message. On restart, the consumer resumes from the committed offset.
Auto-commit (default): Kafka periodically commits the current offset (every 5 seconds by default). If the consumer crashes between auto-commits, it restarts from the last committed offset and reprocesses messages. This is at-least-once delivery.
Manual commit: The application explicitly commits after processing. This gives precise control but requires careful placement:
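The placement question can be shown with a plain-Python simulation (no real broker or client API; the `run` function and its crash hook are invented here to make the two orderings concrete):

```python
def run(messages, committed, commit_first, crash_at=None):
    """Simulated consumer run: process messages from the committed offset
    onward. `crash_at` aborts the run between the commit and process steps
    at that offset, in whichever order those steps execute."""
    processed = []
    offset = committed
    while offset < len(messages):
        if commit_first:
            committed = offset + 1              # commit, then process
            if offset == crash_at:
                return processed, committed     # crashed before processing: lost
            processed.append(messages[offset])
        else:
            processed.append(messages[offset])  # process, then commit
            if offset == crash_at:
                return processed, committed     # crashed before commit: redelivered
            committed = offset + 1
        offset += 1
    return processed, committed

msgs = ["m0", "m1", "m2"]

# Commit AFTER processing: a crash at m1 re-delivers m1 on restart (at-least-once).
done1, pos = run(msgs, 0, commit_first=False, crash_at=1)
done2, _ = run(msgs, pos, commit_first=False)
print(done1 + done2)   # ['m0', 'm1', 'm1', 'm2'] -- m1 processed twice

# Commit BEFORE processing: a crash at m1 skips it on restart (at-most-once).
lost1, pos = run(msgs, 0, commit_first=True, crash_at=1)
lost2, _ = run(msgs, pos, commit_first=True)
print(lost1 + lost2)   # ['m0', 'm2'] -- m1 lost
```

Commit-after-processing risks duplicates; commit-before-processing risks loss. If your handler is not idempotent, neither placement alone is enough and you need deduplication downstream.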
Consumer lag is the gap between the latest produced offset and the consumer's committed offset. Monitoring lag tells you if a consumer is falling behind. High lag means messages are waiting longer to be processed, which can cause staleness in downstream systems.
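Lag is simple arithmetic per partition, summed across the topic (the offset numbers below are made up for illustration):

```python
def consumer_lag(latest_offsets, committed_offsets):
    """Total messages produced but not yet processed by the group."""
    return sum(latest_offsets[p] - committed_offsets.get(p, 0)
               for p in latest_offsets)

latest    = {0: 1500, 1: 1480, 2: 1520}   # newest offset per partition
committed = {0: 1500, 1: 1200, 2: 1510}   # group's committed positions

print(consumer_lag(latest, committed))    # -> 290 (0 + 280 + 10)
```

Note the skew: partition 1 carries nearly all the lag, which is the signature of a hot partition or a stuck consumer rather than a uniformly overloaded group.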
RabbitMQ acknowledgments
RabbitMQ uses per-message acknowledgments instead of offsets:
- Auto-ack: Broker considers the message delivered as soon as it sends it. If the consumer crashes, the message is lost. This is at-most-once.
- Manual ack: Consumer explicitly acknowledges after processing. Unacknowledged messages are redelivered to another consumer. This is at-least-once.
- nack + requeue: Consumer explicitly rejects a message and tells the broker to re-queue it for another attempt.
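These three modes can be simulated with a toy broker. This sketch models the semantics only; the class and method names are invented here and are not the real RabbitMQ or pika API:

```python
class ToyRabbitQueue:
    """Toy model of RabbitMQ delivery and acknowledgment semantics."""
    def __init__(self):
        self.ready = []        # messages waiting for delivery
        self.unacked = {}      # delivery_tag -> message in flight
        self.next_tag = 0

    def publish(self, msg):
        self.ready.append(msg)

    def deliver(self, auto_ack=False):
        if not self.ready:
            return None, None
        msg = self.ready.pop(0)
        if auto_ack:
            return None, msg              # broker forgets it immediately
        self.next_tag += 1
        self.unacked[self.next_tag] = msg
        return self.next_tag, msg

    def ack(self, tag):
        del self.unacked[tag]             # now it is deleted for good

    def nack_requeue(self, tag):
        self.ready.insert(0, self.unacked.pop(tag))

    def consumer_crashed(self):
        # Broker requeues everything the dead consumer never acked.
        self.ready = list(self.unacked.values()) + self.ready
        self.unacked.clear()


q = ToyRabbitQueue()

# Auto-ack: if the consumer dies right after this, job-1 is simply lost.
q.publish("job-1")
_, msg = q.deliver(auto_ack=True)

# Manual ack: a crash before acking puts job-2 back for redelivery.
q.publish("job-2")
tag, msg = q.deliver()
q.consumer_crashed()
tag, msg = q.deliver()     # job-2 again, on the surviving consumer
q.ack(tag)
```
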
The critical difference: Kafka consumers can replay old messages by resetting offsets. RabbitMQ consumers cannot. Once acknowledged, the message is deleted from the queue.
Each system optimizes for different access patterns. The right choice depends on your workload, not on which system is "best."
| Dimension | Kafka | RabbitMQ | Redis Streams |
| --- | --- | --- | --- |
| Model | Append-only log | Message queue | Append-only stream |
| Ordering | Per-partition | Per-queue | Per-stream |
| Retention | Configurable (hours to forever) | Until acknowledged | Configurable (MAXLEN or MINID) |
| Replay | Yes (offset reset) | No (messages deleted after ack) | Yes (read from any ID) |
| Consumer groups | Built-in (partition-based) | Competing consumers (queue-based) | Built-in (XREADGROUP) |
| Throughput | Very high (100K+ msg/sec per partition) | Moderate (10-50K msg/sec per queue) | High (limited by single Redis node) |
| Scaling | Add brokers + partitions | Add queues + consumers | Cluster mode or client-side sharding |
| Best for | Event streaming, audit logs, high-volume pipelines | Task dispatch, request-reply, complex routing | Lightweight streaming when Redis is already in the stack |
When to choose each
Kafka when you need event replay, high throughput, multiple independent consumers from one stream, or long retention. Common uses: clickstream, change data capture, event sourcing, log aggregation.
RabbitMQ when you need sophisticated routing (topic exchanges, headers, priority queues), request-reply patterns, or per-message TTL and dead-letter routing. Common uses: task queues, RPC over messages, workflow orchestration.
Redis Streams when you already run Redis and need lightweight pub/sub with consumer groups and persistence. Throughput is capped by the single-threaded Redis model (or cluster sharding), so it suits moderate-volume streams, not Kafka-scale pipelines. Common uses: activity feeds, notification streams, lightweight event buses in microservices that already use Redis for caching.
In interviews, do not say "Kafka is better than RabbitMQ." Instead, match the system to the access pattern. If the interviewer describes task dispatch with complex routing, RabbitMQ is the right answer. If they describe event streaming with replay, Kafka is the right answer. Showing that you understand the trade-offs is more impressive than picking a favorite.