Kafka - Multiple consumers (only one active) on same group/topic
Apache Kafka is a distributed event streaming platform that many organizations use to handle real-time data feeds. Consumers and consumer groups are among its foundational concepts and are central to scalable, reliable data processing. A useful consequence of Kafka's design is that several consumers in the same group can subscribe to the same topic while only one of them actively reads at a time. This happens when the topic has a single partition, because a partition is assigned to at most one consumer in a group.
Understanding Kafka Consumers and Consumer Groups
In Kafka, a consumer is a client that reads data from topics. A topic is a named stream of records, split into partitions for scalability and distributed operation. A partition is an ordered log within a topic where records are stored, and it is the unit of parallelism in Kafka.
Consumer groups let a set of consumers coordinate consumption of a topic so that each partition is read by only one consumer in the group at a time. If there are as many consumer instances as partitions, each consumer reads from exactly one partition. If there are more consumers than partitions, the extra consumers receive no assignment and sit idle.
One Active Consumer in a Consumer Group
Sometimes you need high availability and fast failover without parallel consumption by several consumers. This is often described as a standby consumer, which becomes active only when the primary fails. Failover is not instantaneous: it takes at least as long as the group needs to detect the failure and rebalance. Let's explore how to achieve this setup.
Configuration and Setup
Here's a basic outline of setting up one active consumer with others as standbys in a Kafka consumer group:
- Configure all consumers with the same group.id so that they join the same consumer group.
- Subscribe all of them to the same topic. For exactly one active consumer, the topic should have a single partition; with more partitions, Kafka spreads them across the consumers and several become active.
- Set appropriate session timeouts and heartbeat intervals. These settings in Kafka's consumer configs (session.timeout.ms and heartbeat.interval.ms) allow the consumer group to quickly detect a failure in the active consumer.
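The settings in the list above can be sketched as a small helper that builds the properties shared by the active consumer and its standbys. This is a minimal sketch under stated assumptions: the class name StandbyConsumerConfig, the broker address localhost:9092, and the 9000 ms timeout are illustrative and not from the original setup. The heartbeat interval is derived as one third of the session timeout, which follows the general guidance in the Kafka consumer configuration documentation. The chosen session timeout must also fall within the broker's group.min.session.timeout.ms and group.max.session.timeout.ms bounds.

```java
import java.util.Properties;

public class StandbyConsumerConfig {

    /**
     * Builds consumer properties shared by the active consumer and its standbys.
     * The heartbeat interval is set to one third of the session timeout so that
     * several heartbeats can be missed before the member is declared dead.
     */
    public static Properties build(String bootstrapServers, String groupId, int sessionTimeoutMs) {
        if (sessionTimeoutMs < 3) {
            throw new IllegalArgumentException("sessionTimeoutMs too small: " + sessionTimeoutMs);
        }
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId); // same value on every instance
        props.setProperty("session.timeout.ms", Integer.toString(sessionTimeoutMs));
        props.setProperty("heartbeat.interval.ms", Integer.toString(sessionTimeoutMs / 3));
        return props;
    }

    public static void main(String[] args) {
        // Assumed values for illustration only.
        Properties props = build("localhost:9092", "exampleGroup", 9000);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

A shorter session timeout speeds up failure detection but makes the group more sensitive to brief pauses, such as long garbage collections or network hiccups, which can cause unnecessary rebalances.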
When the active consumer fails (it stops sending heartbeats in time because of a crash, network issues, or overload), Kafka's consumer group protocol triggers a rebalance. During the rebalance, the partition is reassigned to one of the standby consumers, which becomes the new active consumer and resumes from the last offset committed by the previous one.
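The rebalance behavior can be illustrated with a toy simulation. This is not Kafka's actual group protocol or any of its built-in assignors: the class StandbyFailoverSimulation and its round-robin assign method are hypothetical and only model the rule that each partition belongs to one group member at a time. With a single partition and three members, one member owns the partition and the others hold nothing; removing the owner and reassigning hands the partition to a standby.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StandbyFailoverSimulation {

    /** Assigns partitions 0..partitionCount-1 round-robin over the live members (toy model). */
    public static Map<String, List<Integer>> assign(List<String> members, int partitionCount) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String member : members) {
            assignment.put(member, new ArrayList<>());
        }
        if (members.isEmpty()) {
            return assignment;
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = members.get(partition % members.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    /** Returns the members that own at least one partition, in membership order. */
    public static List<String> activeMembers(Map<String, List<Integer>> assignment) {
        List<String> active = new ArrayList<>();
        assignment.forEach((member, partitions) -> {
            if (!partitions.isEmpty()) {
                active.add(member);
            }
        });
        return active;
    }

    public static void main(String[] args) {
        List<String> members = new ArrayList<>(List.of("consumer-A", "consumer-B", "consumer-C"));
        System.out.println("before failure:  " + assign(members, 1));

        // The coordinator stops receiving heartbeats from consumer-A and drops it.
        members.remove("consumer-A");
        System.out.println("after rebalance: " + assign(members, 1));
    }
}
```

In this model consumer-A owns partition 0 before the failure and consumer-B owns it afterwards. In a real cluster, which standby receives the partition depends on the configured assignor and the member IDs, so it should not be relied upon.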
Example Implementation
Here's a simple example in Java that demonstrates how consumers can be configured. Note that actual failover and consumer status management logic needs more sophisticated handling based on specific requirements.
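A minimal sketch of such a consumer, assuming the kafka-clients library is on the classpath, a broker reachable at localhost:9092, and string keys and values. The broker address, timeout values, class name, and the choice to commit offsets manually after processing are assumptions made for illustration.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ExampleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "exampleGroup");            // same on every instance
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "10000");   // assumed failure-detection window
        props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "3000"); // roughly one third of the timeout
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");   // commit only after processing

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("topicName"));
            while (true) {
                // Standby instances also poll; they simply receive no records
                // while they hold no partition assignment.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            record.partition(), record.offset(), record.key(), record.value());
                }
                if (!records.isEmpty()) {
                    // A standby that takes over resumes from this committed position.
                    consumer.commitSync();
                }
            }
        }
    }
}
```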
This consumer joins the group exampleGroup and subscribes to the topic topicName. If you run multiple instances with the same configuration against a single-partition topic, Kafka's group coordination assigns the partition to only one of them, so only that instance receives and processes messages. The others stay idle until the active one fails.
Summary Table of Key Points
| Aspect | Description |
| --- | --- |
| Consumer | Client that reads data from a Kafka topic. |
| Topic | Stream of data split across partitions. |
| Partition | Unit of parallelism; within a group, each can be read by one consumer at a time. |
| Consumer group | Ensures that each partition is processed by only one consumer in the group. |
| Failover | Standby consumers take over if the active consumer fails. |
| Key configurations | group.id, session.timeout.ms, heartbeat.interval.ms |
Considerations and Best Practices
- Monitoring and Alerts: Implement monitoring to quickly identify and address consumer failures.
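- Testing Failovers: Regularly test failover scenarios to verify that a standby takes over and that no data is lost.
- Offset Management: Manage offsets carefully to avoid data loss or duplication during failovers. Records processed but not yet committed by the failed consumer are delivered again to its replacement.
- Scalability: Consider the number of partitions relative to the consumer count. Consumers beyond the partition count stay idle, and adding partitions turns standbys into additional active consumers.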
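The offset-management point can be made concrete with a small duplicate filter. After a failover, the new consumer resumes from the last committed offset, so records that were processed but not yet committed are delivered again. The sketch below skips records whose offset was already handled. The class OffsetDeduplicator is hypothetical, and its in-memory map is an assumption that stands in for durable state: in a real failover the replacement is a different process, so the last processed offset would have to be stored somewhere both can reach, for example alongside the processing results.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OffsetDeduplicator {
    // Highest offset already processed, keyed by partition number.
    // In-memory only; a real deployment would need durable, shared storage.
    private final Map<Integer, Long> lastProcessed = new HashMap<>();

    /**
     * Returns true if the record is new and should be processed.
     * Records at or below the last processed offset of their partition are skipped.
     */
    public boolean shouldProcess(int partition, long offset) {
        Long last = lastProcessed.get(partition);
        if (last != null && offset <= last) {
            return false;
        }
        lastProcessed.put(partition, offset);
        return true;
    }

    public static void main(String[] args) {
        OffsetDeduplicator dedup = new OffsetDeduplicator();
        // Offsets 1 and 2 arrive twice, as they would after a failover
        // in which only offset 0 had been committed.
        long[] delivered = {0, 1, 2, 1, 2, 3};
        List<Long> processed = new ArrayList<>();
        for (long offset : delivered) {
            if (dedup.shouldProcess(0, offset)) {
                processed.add(offset);
            }
        }
        System.out.println("processed offsets: " + processed);
    }
}
```

This relies on offsets within a partition being delivered in increasing order, which holds for a single consumer reading one partition.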
This setup keeps message processing going through consumer failures and avoids the operational overhead of running multiple active consumers, which can complicate message ordering and offset management.

