Difference between session.timeout.ms and max.poll.interval.ms for Kafka >= 0.10.1
Introduction
session.timeout.ms and max.poll.interval.ms both affect consumer liveness in Kafka, but they protect against different failure modes. The short version is that session.timeout.ms is about staying alive in the group through heartbeats, while max.poll.interval.ms is about making progress by calling poll() often enough.
session.timeout.ms Watches Heartbeats
Kafka consumers in a group must keep heartbeating to the group coordinator. If the broker does not receive heartbeats within session.timeout.ms, it assumes the consumer is gone and triggers a rebalance.
That means session.timeout.ms is mainly about membership liveness:
- process crashed
- machine disconnected
- network stalled long enough to miss heartbeats
- consumer thread stopped heartbeating
It is typically paired with heartbeat.interval.ms, which is usually set lower so the consumer sends multiple heartbeats during one session window.
A smaller session timeout detects dead consumers faster, but it also makes the group more sensitive to transient pauses or network jitter.
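The pairing of the two membership settings can be sketched as below. This is a minimal illustration that uses only java.util.Properties; the class and method names are hypothetical, and the "at most one third of the session timeout" check is a commonly cited guideline rather than a hard rule.

```java
import java.util.Properties;

final class HeartbeatSettings {
    // Builds the two membership-related settings as plain string properties.
    static Properties membershipProps(int sessionTimeoutMs, int heartbeatIntervalMs) {
        Properties props = new Properties();
        props.setProperty("session.timeout.ms", Integer.toString(sessionTimeoutMs));
        props.setProperty("heartbeat.interval.ms", Integer.toString(heartbeatIntervalMs));
        return props;
    }

    // Guideline assumed for this sketch: the heartbeat interval should be no
    // more than one third of the session timeout, so several heartbeats can
    // be missed before the session expires.
    static boolean leavesRoomForMissedHeartbeats(Properties props) {
        int session = Integer.parseInt(props.getProperty("session.timeout.ms"));
        int heartbeat = Integer.parseInt(props.getProperty("heartbeat.interval.ms"));
        return heartbeat * 3 <= session;
    }
}
```

With a 30-second session and a 10-second heartbeat the check passes; with a 10-second session and a 5-second heartbeat it does not, because a single delayed heartbeat leaves very little margin.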
max.poll.interval.ms Watches Application Progress
Starting with Kafka 0.10.1 (KIP-62), the consumer sends heartbeats from a background thread, so heartbeating is decoupled from record processing. A consumer can therefore still appear alive from a heartbeat perspective while your application is actually stuck processing a batch and not calling poll() again.
That is what max.poll.interval.ms is for. It places an upper bound on the time between poll() calls.
If process(record) takes so long that the next poll() does not happen before max.poll.interval.ms expires, Kafka treats the consumer as stuck and begins removing it from the group. In other words, the consumer may still be alive as a process but dead in terms of useful progress.
Why They Are Not Interchangeable
These two settings solve different problems:
- session.timeout.ms answers "is this consumer still heartbeating?"
- max.poll.interval.ms answers "is this consumer still returning to poll() and participating normally?"
That difference matters when processing is slow. Raising only session.timeout.ms does not fix a consumer that spends too long between polls. Likewise, raising only max.poll.interval.ms does not help if the process or network stops heartbeats entirely.
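The distinction can be made concrete with a toy classifier. This is a deliberately simplified model, not how the Kafka client or coordinator is implemented: the class, enum, and parameter names are hypothetical, and the real protocol involves more states than these three.

```java
final class LivenessModel {
    enum Verdict { HEALTHY, MISSED_HEARTBEATS, POLL_TOO_SLOW }

    // Toy model: each timeout is checked against its own clock.
    // Raising one limit never changes the outcome of the other check.
    static Verdict classify(long msSinceLastHeartbeat, long msSinceLastPoll,
                            long sessionTimeoutMs, long maxPollIntervalMs) {
        if (msSinceLastHeartbeat > sessionTimeoutMs) {
            return Verdict.MISSED_HEARTBEATS; // membership liveness failed
        }
        if (msSinceLastPoll > maxPollIntervalMs) {
            return Verdict.POLL_TOO_SLOW;     // heartbeats fine, no progress
        }
        return Verdict.HEALTHY;
    }
}
```

In this model a consumer that heartbeated one second ago but last polled 400 seconds ago is classified as POLL_TOO_SLOW with a 300-second poll interval, no matter how large the session timeout is.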
A typical configuration for heavy processing might look like this:
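As a minimal sketch, assuming a Java consumer configured through java.util.Properties: the class name is hypothetical and every value is an illustrative assumption rather than a recommendation, with the poll interval set to five minutes to match the figure discussed next.

```java
import java.util.Properties;

final class HeavyProcessingConfig {
    // Illustrative values only; tune them against measured processing times.
    static Properties build() {
        Properties props = new Properties();
        // Membership liveness: heartbeat-based.
        props.setProperty("session.timeout.ms", "30000");
        props.setProperty("heartbeat.interval.ms", "10000");
        // Processing liveness: upper bound between poll() calls (5 minutes).
        props.setProperty("max.poll.interval.ms", "300000");
        // Smaller batches keep each poll-to-poll cycle short.
        props.setProperty("max.poll.records", "100");
        return props;
    }
}
```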
If processing still exceeds five minutes, you should consider reducing max.poll.records, moving long work off the poll thread, or decoupling ingestion from downstream processing rather than simply increasing timeouts forever.
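The relationship between batch size and the poll interval is simple arithmetic, which the following sketch makes explicit. The helper name, the worst-case-per-record estimate, and the safety factor are all assumptions introduced for illustration; they are not Kafka settings.

```java
final class PollBudget {
    // Largest batch size whose worst-case processing time stays within
    // safetyFactor * maxPollIntervalMs. Never returns less than 1.
    static int maxRecordsFor(long maxPollIntervalMs, long worstCaseMsPerRecord,
                             double safetyFactor) {
        if (maxPollIntervalMs <= 0 || worstCaseMsPerRecord <= 0
                || safetyFactor <= 0 || safetyFactor > 1) {
            throw new IllegalArgumentException("arguments out of range");
        }
        long budgetMs = (long) (maxPollIntervalMs * safetyFactor);
        long records = budgetMs / worstCaseMsPerRecord;
        return (int) Math.max(1, Math.min(records, Integer.MAX_VALUE));
    }
}
```

For example, with a 300000 ms poll interval, a worst case of 2000 ms per record, and a safety factor of 0.5, the budget is 150000 ms and the result is 75 records. A result of 1 where a single record already exceeds the budget is a sign that batch size alone cannot fix the problem and the work should move off the poll thread.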
Practical Tuning Advice
If rebalances happen because the consumer crashes or loses connectivity, look at session.timeout.ms. If rebalances happen during long-running processing, look at max.poll.interval.ms.
In many systems, the better fix is architectural:
- poll smaller batches
- hand work to another thread pool
- store records durably and process them outside the poll loop
Timeout tuning can help, but it should not hide a design that blocks the poll loop for too long.
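The thread-pool handoff mentioned above could look roughly like the following. This is a stdlib-only sketch under stated assumptions: records are modelled as plain strings instead of Kafka record objects, the class and method names are hypothetical, and offset management is left out entirely. In a real consumer, offsets should only be committed once the corresponding work has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

final class OffThreadProcessing {
    // Called from the poll thread: submits every record and returns at once,
    // so the loop is free to call poll() again while the work runs.
    static <R> List<Future<R>> submitBatch(ExecutorService pool, List<String> batch,
                                           Function<String, R> handler) {
        List<Future<R>> futures = new ArrayList<>();
        for (String record : batch) {
            Callable<R> task = () -> handler.apply(record);
            futures.add(pool.submit(task));
        }
        return futures;
    }

    // Collects results in submission order; intended for the point where
    // completion must be known (for example, before committing offsets).
    static <R> List<R> awaitAll(List<Future<R>> futures) {
        List<R> results = new ArrayList<>();
        for (Future<R> future : futures) {
            try {
                results.add(future.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("record handler failed", e.getCause());
            }
        }
        return results;
    }
}
```

The design choice here is that submitBatch never blocks the poll thread. Calling awaitAll directly inside the poll loop would reintroduce the blocking this pattern is meant to avoid, so completion is better checked between polls or on a separate thread.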
Common Pitfalls
- Assuming both settings control the same timeout.
- Increasing session.timeout.ms when the real problem is slow processing between polls.
- Increasing max.poll.interval.ms while still leaving huge batches on the poll thread.
- Forgetting to tune max.poll.records alongside processing time.
- Treating rebalances as only a broker-side issue instead of a consumer design issue.
Summary
- session.timeout.ms is the heartbeat-based group membership timeout.
- max.poll.interval.ms is the maximum allowed delay between poll() calls.
- One protects against dead consumers, the other against stuck consumers.
- Slow processing is usually a max.poll.interval.ms problem, not a session.timeout.ms problem.
- Good tuning often includes smaller batches or moving work off the poll thread.

