Kafka message codec - compress and decompress
Apache Kafka, a popular distributed streaming platform, is designed to handle real-time data feeds with high throughput and low latency. Compression is one of the main tools for reducing network usage and improving performance in a Kafka deployment. This article covers how Kafka supports message compression and decompression, the technical details, and an implementation example.
Why Compress Messages in Kafka?
Compression in Kafka has two main benefits: it reduces the amount of data sent over the network between producers, brokers, and consumers, which saves bandwidth, and it reduces the disk space used on the brokers. Kafka also compresses at the batch level: a set of messages is compressed together rather than one at a time, which usually yields a better compression ratio because redundancy across messages can be exploited.
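To illustrate why batch-level compression helps, here is a minimal sketch that compares compressing messages one by one with compressing them together. It uses the JDK's built-in GZIP streams rather than Kafka's own code path, and the class name BatchCompressionDemo, its helper methods, and the JSON-like sample payloads are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

class BatchCompressionDemo {

    // GZIP-compress a byte array and return the compressed bytes.
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Hypothetical sample payloads: small JSON-like events that share field names.
    static List<byte[]> sampleMessages(int count) {
        List<byte[]> messages = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String json = "{\"userId\":" + i + ",\"event\":\"page_view\",\"page\":\"/home\"}";
            messages.add(json.getBytes(StandardCharsets.UTF_8));
        }
        return messages;
    }

    // Total size when every message is compressed on its own.
    static int compressedIndividually(List<byte[]> messages) {
        int total = 0;
        for (byte[] message : messages) {
            total += gzip(message).length;
        }
        return total;
    }

    // Size when all messages are concatenated and compressed once.
    static int compressedAsBatch(List<byte[]> messages) {
        ByteArrayOutputStream batch = new ByteArrayOutputStream();
        for (byte[] message : messages) {
            batch.write(message, 0, message.length);
        }
        return gzip(batch.toByteArray()).length;
    }

    public static void main(String[] args) {
        List<byte[]> messages = sampleMessages(100);
        System.out.println("individually: " + compressedIndividually(messages) + " bytes");
        System.out.println("as one batch: " + compressedAsBatch(messages) + " bytes");
    }
}
```

For small, similar messages like these, the batch result is much smaller, because each individually compressed message pays the fixed GZIP header and trailer cost and cannot reuse patterns seen in its neighbours.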
Supported Compression Codecs
Kafka currently supports multiple compression codecs:
- None (no compression)
- GZIP
- Snappy
- LZ4
- ZSTD (since Kafka 2.1)
Compression Implementation
When Kafka producers send messages to brokers, they have the option to apply compression at the message batch level. Here is an outline of how data is typically compressed and decompressed:
- Compression at Producer:
  - Producers collect messages destined for the same partition into a message set, known in newer versions of Kafka as a "record batch."
  - The entire record batch is compressed with the configured codec, reducing the size of the data sent to the Kafka brokers.
- Storage at Broker:
  - With the default broker and topic setting (compression.type=producer), the broker writes the batch to its log in the compressed form it received, without recompressing it. This saves CPU, disk I/O, and storage.
- Decompression at Consumer:
  - Consumers receive the compressed batch and decompress it automatically. The codec is recorded in the batch header, so no consumer-side compression setting is needed.
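The three stages above can be sketched as a simple round trip. This is a conceptual model only, not Kafka's wire format: the class RecordBatchRoundTrip and its methods are hypothetical, records are framed with newlines as a simplification (so they must be non-empty and must not contain newlines), and the JDK's GZIP streams stand in for the configured codec.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class RecordBatchRoundTrip {

    // Producer side: frame the records (newline-separated here, a simplification)
    // and compress the whole batch in one pass.
    static byte[] compressBatch(List<String> records) {
        byte[] raw = String.join("\n", records).getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Broker side: the bytes are kept exactly as received; nothing is recompressed.
    static byte[] storeOnBroker(byte[] compressedBatch) {
        return compressedBatch.clone();
    }

    // Consumer side: decompress the batch and split it back into records.
    static List<String> decompressBatch(byte[] compressedBatch) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressedBatch))) {
            String text = new String(gz.readAllBytes(), StandardCharsets.UTF_8);
            return Arrays.asList(text.split("\n"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("order-1", "order-2", "order-3");
        byte[] stored = storeOnBroker(compressBatch(records));
        System.out.println(decompressBatch(stored));
    }
}
```

In a real deployment none of this code is written by the application: the producer and consumer client libraries perform the compression and decompression internally.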
Example: Using Compression in Kafka Producer
Let's see how to configure a Kafka producer with compression using Java:
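A minimal sketch of such a configuration follows. It uses only java.util.Properties so that it runs without the Kafka client library on the classpath. The class name CompressedProducerConfig, the build helper, and the localhost:9092 address are assumptions for illustration. The property keys are the standard producer configuration names, and compression.type is the key behind ProducerConfig.COMPRESSION_TYPE_CONFIG.

```java
import java.util.Properties;

class CompressedProducerConfig {

    // Builds producer settings with compression enabled.
    // "compression.type" is the value of ProducerConfig.COMPRESSION_TYPE_CONFIG
    // in the Kafka Java client; plain strings are used here to avoid the dependency.
    static Properties build(String bootstrapServers, String codec) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("compression.type", codec);
        return props;
    }

    public static void main(String[] args) {
        // Assumed broker address for a local test cluster.
        Properties props = build("localhost:9092", "gzip");
        // With kafka-clients on the classpath, these properties would be passed to
        // new KafkaProducer<String, String>(props).
        System.out.println("compression.type=" + props.getProperty("compression.type"));
    }
}
```

Because compression is applied per batch, the producer settings batch.size and linger.ms, which influence how many records end up in a batch, also affect how well the data compresses.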
The COMPRESSION_TYPE_CONFIG property (the compression.type producer setting) selects the codec. To use GZIP, set it to "gzip"; you can use "snappy", "lz4", or "zstd" instead, depending on your requirements.
Performance Implications
Using compression can significantly increase Kafka’s throughput and reduce the data footprint at the cost of increased CPU usage for compression and decompression processes. The choice of compression codec can impact both the compression ratio and the computational requirements.
Compression Codec Comparison
| Codec | Compression Ratio | Compression Speed | CPU Usage | Compatibility |
| None | 1:1 | Fastest | Very Low | All |
| GZIP | High | Slow | High | All |
| Snappy | Medium | Fast | Low | All |
| LZ4 | Medium-High | Fast | Medium | All |
| ZSTD | Very High | Medium | Medium-High | Kafka 2.1+ |
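The ratings in the table are qualitative, and the actual trade-off depends on the data, so it is worth measuring on a representative payload. The sketch below shows the general approach using the JDK's Deflater at its fastest and strongest levels as a stand-in, since Snappy, LZ4, and ZSTD require third-party libraries. The class CompressionTradeoffDemo, its methods, and the synthetic payload are assumptions for illustration, and the single-run timing is only indicative.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class CompressionTradeoffDemo {

    // Compress the input at the given Deflater level and return the output size in bytes.
    static int deflatedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    // Hypothetical payload: many similar JSON-like records, as in a typical record batch.
    static byte[] samplePayload(int records) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < records; i++) {
            sb.append("{\"orderId\":").append(i)
              .append(",\"status\":\"CREATED\",\"currency\":\"USD\"}\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] payload = samplePayload(10_000);
        int[] levels = {Deflater.BEST_SPEED, Deflater.BEST_COMPRESSION};
        for (int level : levels) {
            long start = System.nanoTime();
            int size = deflatedSize(payload, level);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.printf("level=%d size=%d bytes ratio=%.1f:1 time=%d us%n",
                    level, size, (double) payload.length / size, micros);
        }
    }
}
```

A careful comparison would use real production messages, warm up the JVM, repeat each measurement several times, and test the actual codecs through the Kafka client.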
Conclusion
Effective use of compression in Kafka can lead to better utilization of network and storage resources, although it requires careful consideration of the trade-offs between speed, CPU usage, and compression effectiveness. By analyzing the specific needs and constraints of an application, users can choose the appropriate compression codec and optimize their Kafka deployment for both performance and cost.

