Kafka: Setting the Compression Type at the Producer vs. the Topic
Apache Kafka is a popular, distributed event streaming platform capable of handling trillions of events a day. Its design provides high throughput, built-in partitioning, replication, and inherent fault-tolerance, which makes it a good fit for large-scale message and event-driven applications. In such systems, efficient data handling and transmission are crucial, and one way to achieve efficiency is through the use of data compression. Compression in Kafka can be controlled either at the producer level or at the topic level, each having its own advantages and implications.
Compression in Kafka
Compression is a technique used to reduce the size of messages that are stored and transmitted over the network. In Kafka, compression is applied to a batch of messages (a record batch, historically called a message set) rather than to each message individually. This is more efficient because messages in the same batch often share structure and repeated values, so compressing them together typically yields a better compression ratio than compressing each one on its own.
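The effect of batching on the compression ratio can be seen with a small experiment. The sketch below uses GZIP from java.util.zip as a stand-in for Kafka's codecs and a hypothetical set of JSON page-view events; it compares the total size when each message is compressed on its own with the size when the same messages are compressed together as one batch. Kafka's real record-batch format adds its own headers, so actual numbers will differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

public class BatchCompressionDemo {

    // GZIP-compresses the input and returns the compressed bytes.
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Hypothetical JSON events that share field names, as many real event streams do.
    static List<String> sampleMessages(int count) {
        List<String> messages = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            messages.add("{\"userId\":" + i + ",\"event\":\"page_view\",\"page\":\"/products/" + (i % 10) + "\"}");
        }
        return messages;
    }

    // Total compressed size when every message is compressed on its own.
    static int individuallyCompressedSize(List<String> messages) {
        int total = 0;
        for (String message : messages) {
            total += gzip(message.getBytes(StandardCharsets.UTF_8)).length;
        }
        return total;
    }

    // Compressed size when all messages are compressed together as one batch.
    static int batchCompressedSize(List<String> messages) {
        byte[] batch = String.join("\n", messages).getBytes(StandardCharsets.UTF_8);
        return gzip(batch).length;
    }

    public static void main(String[] args) {
        List<String> messages = sampleMessages(100);
        System.out.println("Compressed individually: " + individuallyCompressedSize(messages) + " bytes");
        System.out.println("Compressed as one batch: " + batchCompressedSize(messages) + " bytes");
    }
}
```

Because the messages share field names and structure, the batch compresses far better: the codec can reuse byte sequences repeated across messages, and the fixed per-stream GZIP overhead is paid once rather than once per message.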
Kafka supports multiple compression codecs:
- None: No compression.
- GZIP: A high compression ratio, but typically the most CPU-intensive and slowest of the supported codecs.
- LZ4: Very fast compression and decompression, with a lower compression ratio than GZIP.
- Snappy: Optimized for speed with a reasonable compression ratio, which makes it a good choice when fast compression and decompression matter more than size.
- ZSTD (Zstandard): Compression ratios comparable to GZIP at considerably higher speed; supported since Kafka 2.1.
Producer-Level Compression
Compression can be enabled at the producer level by setting the compression.type property, either in the producer's configuration file or programmatically through the client API. The producer default is none, so batches are sent uncompressed unless this property is set.
How Producer-Level Compression Works
When compression is set at the producer level, the producer compresses each batch of messages with the specified codec before sending it to the Kafka brokers. This reduces the amount of data transferred over the network, and as long as the topic keeps the producer's codec (the default topic setting, producer), the brokers store the batches in compressed form, which also reduces disk usage. Consumers decompress the batches when they read them.
Example of Producer-Level Compression
In Java, this means adding compression.type to the Properties object that is passed to the KafkaProducer constructor.
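A minimal sketch, assuming a broker at the placeholder address localhost:9092 and String keys and values. It uses the literal configuration key names rather than the ProducerConfig constants so that it compiles with only the JDK, and the codec-name check is a local convenience added for illustration, not part of the Kafka client. The linger.ms and batch.size values are illustrative, not recommendations.

```java
import java.util.Properties;
import java.util.Set;

public class CompressedProducerConfig {

    // Codec names the producer's compression.type accepts.
    static final Set<String> PRODUCER_CODECS = Set.of("none", "gzip", "snappy", "lz4", "zstd");

    // Builds producer settings with the given codec. The codec check is a local
    // convenience for this sketch; the Kafka client performs its own validation.
    static Properties producerProperties(String bootstrapServers, String compressionType) {
        if (!PRODUCER_CODECS.contains(compressionType)) {
            throw new IllegalArgumentException("Unsupported compression.type: " + compressionType);
        }
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Every batch this producer sends is compressed with this codec.
        props.put("compression.type", compressionType);
        // Illustrative batching settings: waiting up to 20 ms and allowing 64 KiB batches
        // gives the codec more data to work with per batch.
        props.put("linger.ms", "20");
        props.put("batch.size", "65536");
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" is a placeholder broker address.
        Properties props = producerProperties("localhost:9092", "zstd");
        // With kafka-clients on the classpath, these properties would be passed to
        // new org.apache.kafka.clients.producer.KafkaProducer<String, String>(props).
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Because compression works per batch, settings that produce larger batches (a small linger.ms delay, a larger batch.size) generally improve the compression ratio, at the cost of a little extra latency.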
Topic-Level Compression
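Alternatively, compression can be managed at the topic level through the topic's own compression.type setting. Its default value, producer, tells the broker to keep whatever codec the producer used. Setting it to a specific codec (gzip, snappy, lz4, zstd, or uncompressed) makes that codec take precedence over the producer's choice for the data stored in the topic.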
How Topic-Level Compression Works
The topic-level setting determines how batches are stored on the brokers. If a producer sends a batch compressed with a different codec from the one configured for the topic (or sends it uncompressed), the broker decompresses the batch and recompresses it with the topic's codec before writing it to the log. This recompression costs broker CPU, so matching the producer and topic codecs avoids it.
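The broker's decision can be summarized with a simplified model. The class and method names below are hypothetical, and the model deliberately ignores other cases such as message-format conversion for older clients; it only captures how the topic's compression.type interacts with the codec chosen by the producer. It uses a Java record, so it needs Java 16 or later.

```java
public class TopicCompressionModel {

    // Result of the model: the codec the batch is stored with, and whether the broker
    // had to decompress the producer's batch and rewrite it in another format.
    record StoredBatch(String codec, boolean rewritten) {}

    // producerCodec: the producer's compression.type (none, gzip, snappy, lz4, zstd).
    // topicSetting: the topic's compression.type (producer, uncompressed, gzip, snappy, lz4, zstd).
    static StoredBatch store(String producerCodec, String topicSetting) {
        if (topicSetting.equals("producer")) {
            // Default topic setting: keep the batch exactly as the producer compressed it.
            return new StoredBatch(producerCodec, false);
        }
        // The topic spells "no compression" as "uncompressed"; the producer spells it "none".
        String target = topicSetting.equals("uncompressed") ? "none" : topicSetting;
        // A mismatch forces the broker to decompress the batch and rewrite it in the topic's format.
        return new StoredBatch(target, !target.equals(producerCodec));
    }

    public static void main(String[] args) {
        System.out.println(store("lz4", "producer"));
        System.out.println(store("lz4", "zstd"));
        System.out.println(store("zstd", "zstd"));
        System.out.println(store("gzip", "uncompressed"));
    }
}
```

For example, store("lz4", "zstd") reports that the batch ends up as zstd and had to be rewritten, whereas store("zstd", "zstd") and store("lz4", "producer") keep the producer's batch as it arrived.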
Example of Topic-Level Compression
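This setting is typically applied when the topic is created, using either the Kafka command line tools or the administration client. With the command line tools, a topic that stores its data with ZSTD compression can be created by passing --config compression.type=zstd to kafka-topics.sh --create, alongside the usual options such as --topic, --partitions, --replication-factor, and --bootstrap-server.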
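The same setting can be supplied through the administration client. The sketch below only builds the topic-level config map with the JDK; the commented lines show how it would be attached to a NewTopic and submitted through an Admin instance when the kafka-clients library is available. The topic name events, the partition count, and the replication factor are placeholders. Note that the topic-level setting spells "no compression" as uncompressed rather than none.

```java
import java.util.Map;
import java.util.Set;

public class ZstdTopicConfig {

    // Values the topic-level compression.type accepts.
    static final Set<String> TOPIC_CODECS =
            Set.of("producer", "uncompressed", "gzip", "snappy", "lz4", "zstd");

    // Builds the per-topic config overrides for a new topic (local check for this sketch).
    static Map<String, String> topicConfigs(String compressionType) {
        if (!TOPIC_CODECS.contains(compressionType)) {
            throw new IllegalArgumentException("Unsupported topic compression.type: " + compressionType);
        }
        return Map.of("compression.type", compressionType);
    }

    public static void main(String[] args) {
        Map<String, String> configs = topicConfigs("zstd");
        // With kafka-clients on the classpath, the map would be attached to the new topic, e.g.:
        //   NewTopic topic = new NewTopic("events", 3, (short) 1).configs(configs);
        //   admin.createTopics(List.of(topic)).all().get();
        // where admin is an org.apache.kafka.clients.admin.Admin instance and
        // "events", 3 partitions, and replication factor 1 are placeholder values.
        System.out.println(configs);
    }
}
```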
Comparing Producer vs Topic-Level Compression
Here is a summary table highlighting the differences and implications of setting compression at the producer level versus the topic level:
| Aspect | Producer-Level Compression | Topic-Level Compression |
|---|---|---|
| Scope | Affects only messages sent by the specific producer | Applies to all messages stored in the topic, regardless of the producer |
| Network Efficiency | Higher, as batches are compressed before they leave the producer | No effect on producer-to-broker traffic unless producers also compress; batches stored compressed still save bandwidth during replication and consumer fetches |
| Storage Efficiency | Batches are stored as the producer compressed them (with the default topic setting, producer), so savings depend on each producer's codec | Predictable, as every batch is stored with the topic's codec, even from producers that send uncompressed data |
| Flexibility in Compression | High (different producers can use different compression types) | Low (one setting per topic, affecting all messages) |
| Broker CPU Cost | Low, as brokers store batches without recompressing them; consumers decompress each batch once | Higher when the producer's codec differs from the topic's, because the broker must decompress and recompress each batch; consumers still decompress once |
Conclusion
Choosing where to apply compression in Kafka — at the producer or at the topic level — depends largely on the specific requirements of your architecture, such as network bandwidth limits, storage costs, and producer/consumer configurations. Properly managing compression can lead to significant performance and cost benefits in a Kafka environment.

