Apache Kafka - linger.ms and batch.size settings

Apache Kafka is a distributed streaming platform that lets its users publish and subscribe to streams of records, store records in a fault-tolerant way, and process streams as they occur. It is widely used in streaming data architectures to provide real-time analytics. Two producer configuration settings, linger.ms and batch.size, significantly affect the performance and efficiency of Kafka producers.

Understanding linger.ms and batch.size

1. batch.size

The batch.size configuration sets the maximum size, in bytes, of a single batch of records. The producer keeps a separate batch for each partition it is writing to, and it tries to fit as many records as possible into each batch up to this limit. Batching has two benefits: it reduces the number of requests sent to the broker, which can significantly improve I/O throughput, and, when compression is enabled through compression.type, it lets the producer compress whole batches rather than individual records, which usually shrinks the data sent over the network.

Example: If batch.size is set to 16384 bytes (16 KB, the Java producer's default), then up to 16 KB of records for a given partition can be batched together before they are sent to the broker.
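The arithmetic behind this example can be sketched in a few lines of Python. This is only a rough estimate, not how the producer measures batches: the function name records_per_batch and the overhead_bytes parameter are hypothetical, and the sketch assumes every record has the same serialized size and ignores the batch header and compression.

```python
def records_per_batch(batch_size_bytes: int, record_bytes: int,
                      overhead_bytes: int = 0) -> int:
    """Estimate how many equally sized records fit into one batch.

    overhead_bytes is an assumed fixed per-record cost (key, headers,
    framing); it is an illustration, not a value taken from Kafka.
    """
    per_record = record_bytes + overhead_bytes
    if per_record <= 0:
        raise ValueError("record size plus overhead must be positive")
    # A record larger than batch.size is still sent, in a batch of its own,
    # so the estimate never drops below one record per batch.
    return max(1, batch_size_bytes // per_record)


if __name__ == "__main__":
    for size in (512, 1000, 20000):
        print(size, "byte records ->", records_per_batch(16384, size), "per batch")
```

With a 16384-byte batch, 512-byte records give an estimate of 32 records per batch, while a 20000-byte record ends up alone in its batch.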

2. linger.ms

The linger.ms setting controls how long the producer waits before sending a batch that has not yet reached batch.size. The delay gives more records time to arrive, so batches tend to be fuller when they go out. A batch is sent as soon as it is full or linger.ms has elapsed, whichever comes first. Setting linger.ms above 0 can add up to that much latency per record, but it can improve throughput and network efficiency.

Example: Setting linger.ms to 5 means the producer waits up to 5 milliseconds for additional records before sending a batch that is not yet full.
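To make the "full or timed out, whichever comes first" rule concrete, here is a minimal simulation for a single partition. It is a simplified model written for illustration, not the producer's actual implementation: the function simulate_batches and its tuple format are hypothetical, records are given as (arrival time in ms, size in bytes), and a batch counts as full when the next record would not fit.

```python
from typing import List, Tuple

Record = Tuple[int, int]       # (arrival_ms, size_bytes)
Batch = Tuple[int, int, str]   # (send_ms, record_count, reason)


def simulate_batches(records: List[Record], batch_size: int,
                     linger_ms: int) -> List[Batch]:
    """Simplified single-partition model of batch.size and linger.ms.

    records must be sorted by arrival time. The real producer also
    batches records that pile up while a request is in flight; that
    effect is not modelled here.
    """
    sent: List[Batch] = []
    opened_at = 0
    count = 0
    used = 0
    for arrival_ms, size in records:
        # The open batch lingered long enough before this record arrived.
        if count and arrival_ms - opened_at >= linger_ms:
            sent.append((opened_at + linger_ms, count, "linger"))
            count = used = 0
        # The record does not fit: send the open batch as full.
        if count and used + size > batch_size:
            sent.append((arrival_ms, count, "full"))
            count = used = 0
        if count == 0:
            opened_at = arrival_ms
        count += 1
        used += size
    if count:
        # The last batch goes out once its linger time is up.
        sent.append((opened_at + linger_ms, count, "linger"))
    return sent


if __name__ == "__main__":
    arrivals = [(0, 6000), (1, 6000), (2, 6000), (20, 6000)]
    for batch in simulate_batches(arrivals, batch_size=16384, linger_ms=5):
        print(batch)
```

In this run the first two records leave together at 2 ms because the third would overflow the batch, while the later records leave on their own once their 5 ms linger time expires.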

Optimizing Performance with linger.ms and batch.size

Choosing the right values for these two settings helps balance latency and throughput. In high-throughput environments where a few milliseconds of added latency are acceptable, increasing batch.size and linger.ms can yield better performance. For latency-sensitive applications, keeping linger.ms low (or at zero) means records are sent as soon as the producer is able to send them, at the cost of some throughput and, because more and smaller requests are made, higher operational costs.

Best Practices and Trade-offs

  • High Throughput: Increase batch.size and linger.ms to maximize each batch's payload, reducing the number of send requests.
  • Low Latency: Reduce linger.ms to improve producer latency. Keep a moderate batch.size to avoid overly frequent network calls.
  • Balanced Approach: Set a moderate linger.ms and batch.size to balance between throughput and latency.
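The three approaches above could be captured as named configuration profiles. The sketch below uses plain Python dictionaries keyed by the Java producer's property names. The numeric values, the PROFILES table, the producer_config helper, and the localhost:9092 address are all illustrative assumptions rather than recommendations, and a given Python client library may spell these options differently.

```python
from typing import Dict, Union

ConfigValue = Union[str, int]

# Illustrative values only; suitable numbers depend on the workload.
PROFILES: Dict[str, Dict[str, int]] = {
    "high_throughput": {"batch.size": 131072, "linger.ms": 50},
    "low_latency": {"batch.size": 16384, "linger.ms": 0},
    "balanced": {"batch.size": 32768, "linger.ms": 5},
}


def producer_config(profile: str,
                    bootstrap_servers: str = "localhost:9092") -> Dict[str, ConfigValue]:
    """Build a producer configuration dict for one of the named profiles."""
    if profile not in PROFILES:
        raise KeyError(f"unknown profile: {profile}")
    config: Dict[str, ConfigValue] = {"bootstrap.servers": bootstrap_servers}
    config.update(PROFILES[profile])
    return config


if __name__ == "__main__":
    for name in PROFILES:
        print(name, producer_config(name))
```

Keeping the profiles in one place makes it easier to switch between them during load tests and compare the measured latency and throughput.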

Technical Impact of Configurations

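The interaction between linger.ms and batch.size strongly influences memory usage, network utilization, and latency. A higher batch.size also increases memory pressure on the producer when many messages are being sent, because the producer buffers a batch for each partition it is writing to. Total buffering is capped by the producer's buffer.memory setting; when that buffer is full, send() blocks for up to max.block.ms and then fails with an exception. Producers therefore need to handle backpressure and size their memory appropriately to avoid out-of-memory errors.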
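A back-of-the-envelope check of that memory pressure is sketched below. It compares one open batch per active partition against the producer's total buffer, which the Java producer caps with buffer.memory (32 MB by default). The helper names are hypothetical, and the figure is a rough estimate only, since more than one batch per partition can be queued when the sender falls behind.

```python
DEFAULT_BUFFER_MEMORY = 32 * 1024 * 1024  # Java producer default for buffer.memory


def open_batch_bytes(batch_size: int, active_partitions: int) -> int:
    """Bytes needed for one full batch per partition being written to."""
    if batch_size < 0 or active_partitions < 0:
        raise ValueError("batch_size and active_partitions must be non-negative")
    return batch_size * active_partitions


def buffer_headroom(batch_size: int, active_partitions: int,
                    buffer_memory: int = DEFAULT_BUFFER_MEMORY) -> int:
    """Remaining buffer after one batch per partition; negative means too small."""
    return buffer_memory - open_batch_bytes(batch_size, active_partitions)


if __name__ == "__main__":
    for batch_size, partitions in ((16384, 1000), (1024 * 1024, 100)):
        headroom = buffer_headroom(batch_size, partitions)
        status = "fits" if headroom >= 0 else "exceeds buffer.memory"
        print(f"batch.size={batch_size}, partitions={partitions}: {status}")
```

Under these assumptions, 16 KB batches across 1000 partitions fit within the default buffer, whereas 1 MB batches across 100 partitions do not, which suggests raising buffer.memory or lowering batch.size.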

Summary Table

| Configuration | Description | Impact |
|---------------|-------------|--------|
| batch.size | Maximum size in bytes of a batch of records. | Influences memory use and network IO. Higher values mean fewer, larger requests. |
| linger.ms | Time to wait before sending a batch to broker, even if the batch.size is not reached. | Affects message latency and throughput. A higher value increases the batch size, improving throughput but adding latency. |

Conclusion

Effective tuning of linger.ms and batch.size requires understanding the trade-offs between latency, throughput, and overhead costs. A high-throughput logging system and a low-latency transaction system call for different values, so these parameters should be adjusted to the specific workload. Testing and monitoring are critical to understanding the impact of these settings in real-world scenarios.

