Kafka Consumer
max.poll.records
Configuration Optimization
Data Streaming
Distributed Systems

Impact of reducing max.poll.records in Kafka Consumer configuration

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Reducing max.poll.records in a Kafka consumer decreases the number of records returned per poll() call, which makes the consumer more responsive to rebalances and commits at the cost of lower throughput. The default value is 500. Lowering it is the primary lever for preventing poll() timeout violations when per-record processing is slow, but it must be tuned alongside max.poll.interval.ms, fetch.min.bytes, and fetch.max.wait.ms to avoid unintended side effects.

What max.poll.records Controls

Each time your consumer calls poll(), the Kafka client returns at most max.poll.records records from the internal prefetch buffer. This does not control how much data is fetched from the broker per network request (that is fetch.max.bytes and max.partition.fetch.bytes). It only limits how many records are handed to your application code in a single poll() invocation.

java
1Properties props = new Properties();
2props.put("bootstrap.servers", "broker1:9092,broker2:9092");
3props.put("group.id", "order-processor");
4props.put("max.poll.records", "50");  // Reduced from default 500
5props.put("max.poll.interval.ms", "300000");
6props.put("enable.auto.commit", "false");
7props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
8props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
9
10KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
11consumer.subscribe(Collections.singletonList("orders"));
12
13while (true) {
14    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
15    for (ConsumerRecord<String, String> record : records) {
16        processOrder(record);  // Slow: 50-200ms per record
17    }
18    consumer.commitSync();
19}

In this example, with max.poll.records=50 and processing at 200ms per record, the worst-case processing time per poll() is 10 seconds. With the default of 500, it would be 100 seconds, which would exceed max.poll.interval.ms (default 300 seconds in older versions, 5 minutes in newer ones) and trigger a rebalance.

Effects of Reducing max.poll.records

Faster Poll Loop Cycles

Smaller batches mean each poll() loop completes faster. This keeps the consumer "alive" from the coordinator's perspective, because the consumer sends heartbeats and calls poll() more frequently.

text
1Default (500 records, 100ms each):
2  poll() -> process 500 records -> 50 seconds -> commit -> poll()
3  
4Reduced (50 records, 100ms each):
5  poll() -> process 50 records -> 5 seconds -> commit -> poll()

More Frequent Commits

If you commit offsets after each poll() cycle (the common pattern), reducing max.poll.records means commits happen more frequently. This reduces the amount of data that needs to be reprocessed after a crash.

java
1// With max.poll.records=50, this commits every 50 records
2// With max.poll.records=500, this commits every 500 records
3while (true) {
4    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
5    for (ConsumerRecord<String, String> record : records) {
6        process(record);
7    }
8    consumer.commitSync();  // Commit frequency tied to batch size
9}

Lower Throughput

Each poll() call has fixed overhead: network round-trip, deserialization setup, and coordinator communication. With smaller batches, this fixed cost is amortized over fewer records, reducing overall throughput.

Better Rebalance Responsiveness

During a consumer group rebalance, the consumer needs to finish its current poll() processing before it can participate. With a smaller batch, the consumer finishes faster and responds to rebalance requests more quickly, reducing the "stop the world" pause for the entire consumer group.

Trade-off Analysis

MetricHigher max.poll.recordsLower max.poll.records
ThroughputHigher (better amortization)Lower (more overhead per record)
Latency per batchHigherLower
Rebalance responsivenessSlowerFaster
Commit frequencyLess frequentMore frequent
Reprocessing after crashMore records replayedFewer records replayed
Risk of poll timeoutHigherLower
Network overhead ratioLowerHigher

max.poll.records does not exist in isolation. These parameters interact with it:

java
1// How long the coordinator waits for a poll() before kicking the consumer
2props.put("max.poll.interval.ms", "300000");  // 5 minutes
3
4// Minimum data the broker accumulates before responding to a fetch
5props.put("fetch.min.bytes", "1");  // Default: 1 byte (respond immediately)
6
7// Maximum time the broker waits to accumulate fetch.min.bytes
8props.put("fetch.max.wait.ms", "500");  // Default: 500ms
9
10// Maximum data per partition per fetch request
11props.put("max.partition.fetch.bytes", "1048576");  // Default: 1MB
12
13// Maximum data per fetch request across all partitions
14props.put("fetch.max.bytes", "52428800");  // Default: 50MB

A common mistake is reducing max.poll.records without adjusting fetch.min.bytes. If the broker waits to accumulate 1MB of data before responding, but you only process 50 records per poll, the prefetch buffer fills up and the excess data sits in memory waiting for subsequent poll() calls. This increases memory pressure without improving responsiveness.

Configuration Interaction Diagram

text
1Broker                          Consumer JVM
2  |                               |
3  |  fetch request                |
4  |<------------------------------|  max.partition.fetch.bytes
5  |                               |  fetch.max.bytes
6  |  fetch response (N records)   |
7  |------------------------------>|
8  |                               |  [prefetch buffer]
9  |                               |     |
10  |                               |     | poll()
11  |                               |     | max.poll.records = M
12  |                               |     |
13  |                               |  [application gets min(N, M) records]
14  |                               |
15  |                               |  process + commit
16  |                               |  (must complete within max.poll.interval.ms)

Sizing Guide

Use this formula to estimate the right value:

text
1max.poll.records = max.poll.interval.ms / worst_case_processing_time_per_record
2
3Example:
4  max.poll.interval.ms = 300,000 ms (5 minutes)
5  worst_case_processing_time = 200 ms per record
6  Safety factor = 2x
7
8  max.poll.records = 300,000 / 200 / 2 = 750
9
10  Since default is 500, the default is fine in this case.
11  But if processing takes 1000ms per record:
12  max.poll.records = 300,000 / 1000 / 2 = 150

The safety factor accounts for GC pauses, network latency spikes, and downstream service slowdowns. A 2x factor is a reasonable starting point.

Spring Kafka Configuration

In Spring Boot applications, set max.poll.records through application properties:

yaml
1spring:
2  kafka:
3    consumer:
4      max-poll-records: 100
5      properties:
6        max.poll.interval.ms: 300000
7        fetch.min.bytes: 1

Or programmatically through the consumer factory:

java
1@Bean
2public ConsumerFactory<String, String> consumerFactory() {
3    Map<String, Object> props = new HashMap<>();
4    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
5    props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processor");
6    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 100);
7    props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300000);
8    return new DefaultKafkaConsumerFactory<>(props);
9}

Monitoring After the Change

After reducing max.poll.records, monitor these metrics to verify the change had the intended effect:

  • consumer_lag: Should remain stable or decrease. If lag increases, throughput dropped too much.
  • records-consumed-rate: Total records per second. Expect a small decrease.
  • poll-rate: Number of poll() calls per second. Should increase proportionally.
  • commit-rate: Should increase, confirming more frequent commits.
  • rebalance count: Should decrease if poll timeouts were causing unnecessary rebalances.
bash
# Check consumer lag with kafka-consumer-groups
kafka-consumer-groups.sh --bootstrap-server broker1:9092 \
  --group order-processor --describe

Common Pitfalls

  • Reducing max.poll.records without measuring per-record processing time first. The right value depends on how long your processing takes, not on an arbitrary number.
  • Forgetting that max.poll.records limits records per poll(), not records per fetch from the broker. Data is still prefetched in larger batches and buffered in memory.
  • Setting max.poll.records very low (e.g., 1) for "safety." This destroys throughput because every record incurs the full overhead of a poll() cycle.
  • Not adjusting fetch.min.bytes alongside max.poll.records. A large fetch.min.bytes causes unnecessary buffering when only a small number of records are consumed per poll.
  • Ignoring max.poll.interval.ms. If the real problem is that processing takes too long, reducing max.poll.records is a symptom fix. Consider async processing, a larger max.poll.interval.ms, or moving heavy work to a separate thread pool.
  • Assuming the change is free. More frequent poll() calls mean more coordinator communication, more commit requests, and higher broker-side overhead.

Summary

  • max.poll.records controls how many records your application receives per poll() call, not how much data is fetched from the broker.
  • Reducing it makes the consumer more responsive to rebalances and commits more frequently, at the cost of lower throughput.
  • Size it based on the formula: max.poll.interval.ms / worst_case_processing_time / safety_factor.
  • Always tune it alongside max.poll.interval.ms, fetch.min.bytes, and fetch.max.wait.ms to avoid unintended side effects.
  • Monitor consumer lag, poll rate, and rebalance count after making the change to verify it had the desired effect.

Course illustration
Course illustration

All Rights Reserved.