Kafka: Consumer API vs Streams API

Introduction

Choosing between Kafka's Consumer API and Streams API is a design decision about control versus abstraction. Teams that pick too early, based on short examples, often end up rewriting once state management, reprocessing, and scaling become real requirements. A durable choice starts with the shape of the workload and how much lifecycle control the team needs.

Core Sections

1. When Consumer API is the better fit

The Consumer API is lower-level and gives direct control over polling, batching, offset commit timing, retry behavior, and custom dead-letter routing. This is valuable when you need:

  • Non-standard retry logic
  • Fine-grained control over commit boundaries
  • Integration with external transactional systems
  • Custom partition assignment behavior

The tradeoff is engineering overhead. State handling, rebalance handling (for example, committing offsets when partitions are revoked), and operational safeguards are yours to build and maintain.

java
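import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "billing");
// Disable auto-commit so offsets advance only after processing succeeds.
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

// try-with-resources closes the consumer so it leaves the group cleanly on exit.
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(List.of("orders"));
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, String> record : records) {
            process(record.value()); // application-defined handler
        }
        // Commit only after the whole batch is processed (at-least-once delivery).
        consumer.commitSync();
    }
}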
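
The custom retry and dead-letter routing mentioned above can be sketched without any Kafka dependency. This is a minimal illustration in plain Java, not a definitive implementation: the `RetryRouter` class, its `maxAttempts` limit, and the in-memory list standing in for a dead-letter topic are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: retries a record handler a bounded number of times,
// then routes the record to a dead-letter sink instead of blocking the partition.
final class RetryRouter {
    private final int maxAttempts;
    private final List<String> deadLetters = new ArrayList<>(); // stand-in for a dead-letter topic

    RetryRouter(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
    }

    // Returns true if the handler succeeded, false if the record was dead-lettered.
    boolean handle(String value, Consumer<String> handler) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(value);
                return true;
            } catch (RuntimeException e) {
                // Swallow and retry; a real service would also log and back off here.
            }
        }
        deadLetters.add(value);
        return false;
    }

    List<String> deadLetters() {
        return List.copyOf(deadLetters);
    }
}
```

In a poll loop, each record would pass through `handle` before the offset commit, so a permanently failing record ends up in the dead-letter sink rather than stalling every record behind it.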

2. When Streams API is the better fit

Kafka Streams is a higher-level library for building stream processing topologies. It manages local state stores, the changelog topics that back them, and state recovery after a restart. It is usually a better fit when your workload is primarily transformations, joins, windowed aggregations, and event-time logic.

Instead of writing a manual poll loop plus state code, you define processing steps and let Streams manage partitions, scaling, and state restoration.

java
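import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> input = builder.stream("orders");

input.filter((k, v) -> v.contains("PAID"))
     .mapValues(v -> v.toLowerCase()) // explicit lambda keeps the mapValues overload unambiguous
     .to("orders-paid");

Properties streamProps = new Properties();
streamProps.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-pipeline");
streamProps.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// Default serdes must match the KStream<String, String> types declared above.
streamProps.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
streamProps.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

KafkaStreams streams = new KafkaStreams(builder.build(), streamProps);
streams.start();
// Close on JVM shutdown so state is flushed and the instance leaves the group.
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));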
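
To make "windowed aggregations and event-time logic" more concrete, here is a plain-Java model of what a tumbling-window count does conceptually for a single key: each event lands in the window whose start is its timestamp rounded down to a multiple of the window size. This is a sketch for intuition only. `TumblingWindowCounter` is a hypothetical class, not part of the Kafka Streams API, and it assumes non-negative timestamps in milliseconds.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of a tumbling-window count keyed by event time.
// Illustrative only: this is not the Kafka Streams API.
final class TumblingWindowCounter {
    // Start of the window that contains the timestamp (non-negative timestamps assumed).
    static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - (timestampMs % windowSizeMs);
    }

    // Counts events per window start, regardless of the order in which they arrive.
    static Map<Long, Long> count(List<Long> eventTimestampsMs, long windowSizeMs) {
        Map<Long, Long> counts = new TreeMap<>();
        for (long ts : eventTimestampsMs) {
            counts.merge(windowStart(ts, windowSizeMs), 1L, Long::sum);
        }
        return counts;
    }
}
```

The map of counts is exactly the kind of state you would have to persist and restore yourself in a hand-written consumer, which is the work Streams takes over.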

3. Failure semantics and operational implications

Both APIs can support reliable production systems, but their failure modes differ in how much code you own. With the Consumer API, errors in state persistence and commit timing are your responsibility. With Streams, most lifecycle behavior is standardized, but you must understand internal topics, state restoration cost, and partition planning.
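
Why commit timing matters is easiest to see in a small simulation. The sketch below is plain Java with no Kafka client involved. It assumes a hypothetical `CommitTimingDemo` class, per-record commits (a simplification of batch commits), and a single crash that strikes between the two steps of handling one record.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of how commit timing interacts with a crash.
// Offsets are list indexes; "committed" is the offset a restart resumes from.
final class CommitTimingDemo {
    // Returns every record whose side effect was applied, across the crash and the restart.
    static List<String> run(List<String> records, int crashIndex, boolean commitBeforeProcess) {
        List<String> applied = new ArrayList<>();
        int committed = 0;
        boolean crashed = false;
        int position = 0;
        while (position < records.size()) {
            // Step 1: whichever action the strategy performs first.
            if (commitBeforeProcess) {
                committed = position + 1;
            } else {
                applied.add(records.get(position));
            }
            // The single crash strikes between step 1 and step 2.
            if (!crashed && position == crashIndex) {
                crashed = true;
                position = committed; // a restart resumes from the last committed offset
                continue;
            }
            // Step 2: the remaining action.
            if (commitBeforeProcess) {
                applied.add(records.get(position));
            } else {
                committed = position + 1;
            }
            position++;
        }
        return applied;
    }
}
```

With records `a, b, c` and a crash at index 1, committing first yields `[a, c]` (record `b` is lost), while processing first yields `[a, b, b, c]` (record `b` is applied twice). The second outcome is why at-least-once consumers usually need idempotent side effects.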

A good decision test is this:

  • If most complexity is business logic transformation, choose Streams first.
  • If most complexity is custom integration and lifecycle control, choose Consumer API first.

4. Migration and hybrid patterns

You can combine both in one architecture. For example, use Streams for normalization and enrichment, then a Consumer API service for side-effect-heavy integration with external systems. This keeps transformation logic concise and isolates custom delivery concerns where they belong.

5. Testing strategy before production cutover

Before committing to one API, run the same workload through both approaches in staging for a short period. Track end-to-end latency, rebalance behavior, and recovery time after process restarts. This comparison often reveals hidden costs such as long state restoration in Streams or commit bugs in Consumer-based loops.

A practical rollout plan is to start with one partition and a shadow output topic, then increase traffic only after offset lag, error rate, and throughput remain stable across peak traffic windows.
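
One way to make that rollout rule mechanical is a small gate that checks every sampled window before traffic is increased. The sketch below is an assumption-laden illustration: `RolloutGate`, its parameter names, and whatever thresholds you pass in are hypothetical, and the samples would come from your own metrics system.

```java
// Hypothetical rollout gate; the thresholds and parameter names are illustrative assumptions.
final class RolloutGate {
    // Lag for one partition: offsets written to the log but not yet committed by the group.
    static long lag(long logEndOffset, long committedOffset) {
        return Math.max(0L, logEndOffset - committedOffset);
    }

    // Returns true only if every sampled window stays within all three limits.
    // The arrays hold one sample per observation window and must have equal length.
    static boolean stable(long[] lags, double[] errorRates, double[] throughputs,
                          long maxLag, double maxErrorRate, double minThroughput) {
        if (lags.length != errorRates.length || lags.length != throughputs.length) {
            throw new IllegalArgumentException("sample arrays must have equal length");
        }
        if (lags.length == 0) {
            return false; // no evidence yet, so do not widen the rollout
        }
        for (int i = 0; i < lags.length; i++) {
            if (lags[i] > maxLag || errorRates[i] > maxErrorRate || throughputs[i] < minThroughput) {
                return false;
            }
        }
        return true;
    }
}
```

Requiring every window to pass, rather than an average, keeps one bad peak period from being hidden by quiet ones.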

Common Pitfalls

  • Choosing Consumer API for simple transforms and building unnecessary state-management code.
  • Choosing Streams without understanding repartitioning and internal topic growth.
  • Committing offsets before processing completes, then losing messages when the process crashes.
  • Ignoring key design, then discovering skewed partitions and unstable throughput.
  • Treating local tests as sufficient proof of rebalance and recovery behavior.
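
The key-design pitfall can be checked before deployment by replaying a sample of keys through a partitioner and measuring how uneven the result is. The sketch below is a rough, dependency-free approximation. `PartitionSkew` is a hypothetical class, and it uses `String.hashCode()` as a stand-in for Kafka's default partitioner, which hashes the serialized key with murmur2, so actual partition numbers will differ even though the skew pattern for hot keys is similar.

```java
import java.util.List;

// Hypothetical skew check for a sample of record keys.
final class PartitionSkew {
    // Stand-in partitioner: String.hashCode() is an assumption made to avoid a Kafka dependency.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Number of sampled records that would land on each partition.
    static long[] countsPerPartition(List<String> keys, int numPartitions) {
        long[] counts = new long[numPartitions];
        for (String key : keys) {
            counts[partitionFor(key, numPartitions)]++;
        }
        return counts;
    }

    // Ratio of the busiest partition's load to the mean load; 1.0 means perfectly even.
    static double skew(long[] counts) {
        long total = 0;
        long max = 0;
        for (long c : counts) {
            total += c;
            max = Math.max(max, c);
        }
        if (total == 0) {
            return 1.0; // an empty sample has no skew to report
        }
        double mean = (double) total / counts.length;
        return max / mean;
    }
}
```

A single hot key sent to four partitions produces a skew of 4.0, because one partition carries the entire load while the mean is a quarter of it.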

Summary

  • Consumer API gives maximal control and maximal responsibility.
  • Streams API gives faster delivery for transformation-centric pipelines.
  • Offset handling, state behavior, and partition strategy should drive the decision.
  • Hybrid designs often combine the strengths of both APIs.
  • Operational testing for rebalance and recovery should be mandatory before release.
