Apache Kafka
Consumer Groups
Topic Management
Kafka Consumers
Distributed Systems

Kafka - Multiple consumers (only one active) on same group/topic

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Apache Kafka is a distributed event streaming platform used by many organizations to handle their real-time data feeds. Among its foundational concepts is that of consumers and consumer groups which play a vital role in the scalability and reliability of data processing. An interesting aspect of Kafka's design is the ability to have multiple consumers on the same topic, particularly within the same group, with only one consumer actively reading at any time.

Understanding Kafka Consumers and Consumer Groups

In Kafka, a consumer is an entity that reads data from topics. Conversely, a topic is a specific stream of data partitioned for scalability and distributed operation. A partition is a division within a topic where data is stored and is inherently a unit of parallelism in Kafka.

Consumer groups are critical as they allow a group of consumers to coordinate the consumption of data from a topic, ensuring that each partition is read by only one consumer from the group at a time. This means that if you have as many consumer instances as partitions, each consumer reads from one exclusive partition.

One Active Consumer in a Consumer Group

Sometimes, there's a need to ensure high availability and instantaneous failover without parallel consumption from different consumers. This is often termed as having a standby consumer which only becomes active when the primary fails. Let's explore how we can achieve this setup.

Configuration and Setup

Here's a basic outline of setting up one active consumer with others as standbys in a Kafka Consumer group:

  1. Configure all consumers to the same group. This ensures that they are part of the same consumer group logic.
  2. Utilize the same topic with the appropriate partitions.
  3. Set appropriate session timeouts and heartbeat intervals. These settings in Kafka's consumer configs (session.timeout.ms and heartbeat.interval.ms) allow the consumer group to quickly detect a failure in the active consumer.

When the active consumer fails (fails to send heartbeats in time due to a crash, network issues, or being overwhelmed), Kafka’s consumer group protocol triggers a rebalance. During this rebalance, one of the standby consumers is elected as the new active consumer and resumes consumption from where the last active consumer had left off.

Example Implementation

Here's a simple example in Java that demonstrates how consumers can be configured. Note that actual failover and consumer status management logic needs more sophisticated handling based on specific requirements.

java
1Properties props = new Properties();
2props.put("bootstrap.servers", "localhost:9092");
3props.put("group.id", "exampleGroup");
4props.put("enable.auto.commit", "false");
5props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
6props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
7
8KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
9consumer.subscribe(Arrays.asList("topicName"));
10
11while (true) {
12    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
13    for (ConsumerRecord<String, String> record : records) {
14        processRecord(record);
15    }
16
17    // Commit offsets after processing
18    consumer.commitSync();
19}

This consumer effectively joins the group exampleGroup and subscribes to the topic topicName. If you run multiple instances of this code with the same configuration, only one consumer actively receives and processes messages due to the group coordination by Kafka unless the active one fails.

Summary Table of Key Points

AspectDescription
ConsumerEntity that reads data from a Kafka topic.
TopicStream of data split across partitions.
PartitionUnit of parallelism; each can be read by one consumer at a time.
Consumer GroupsEnsures that each partition is processed by only one consumer in the group.
FailoverSetup for standby consumers to take over if active consumer fails.
Key Configurationsgroup.id, session.timeout.ms, heartbeat.interval.ms

Considerations and Best Practices

  • Monitoring and Alerts: Implement monitoring to quickly identify and address consumer failures.
  • Testing Failovers: Regularly test failover scenarios to ensure consumer election and data integrity.
  • Offset Management: Manage offsets carefully to avoid data loss or duplication during failovers.
  • Scalability: Consider the number of partitions relative to consumer count for optimal performance.

This setup not only ensures message processing continuity but also simplifies the operational overhead of maintaining multiple active consumers which might otherwise lead to complexity in message order handling and offset management.


Course illustration
Course illustration

All Rights Reserved.