Kafka: Setting the Compression Type at the Producer vs. the Topic

Apache Kafka is a popular, distributed event streaming platform capable of handling trillions of events a day. Its design provides high throughput, built-in partitioning, replication, and inherent fault-tolerance, which makes it a good fit for large-scale message and event-driven applications. In such systems, efficient data handling and transmission are crucial, and one way to achieve efficiency is through the use of data compression. Compression in Kafka can be controlled either at the producer level or at the topic level, each having its own advantages and implications.

Compression in Kafka

Compression is a technique used to reduce the size of the messages to be stored and transmitted over the network. In Kafka, compression happens at the message set level (a batch of messages) rather than at the individual message level. This approach is more efficient because compressing a batch of messages together typically provides a better compression ratio compared to compressing each message individually.
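The effect of batching on compression ratio can be illustrated outside Kafka with a minimal sketch. It uses GZIP from the Java standard library as a stand-in and simply concatenates messages to form a "batch", which is an assumption for illustration and not Kafka's actual record batch format. The sample messages and all class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

final class BatchCompressionDemo {

    // Returns the GZIP-compressed size of the given bytes.
    static int gzipSize(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    // Hypothetical sample payloads: small, similar JSON-like events.
    static List<byte[]> sampleMessages(int count) {
        List<byte[]> messages = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String event = "{\"userId\":" + i + ",\"action\":\"page_view\",\"page\":\"/home\"}";
            messages.add(event.getBytes(StandardCharsets.UTF_8));
        }
        return messages;
    }

    // Total size when every message is compressed on its own.
    static int individuallyCompressedSize(List<byte[]> messages) {
        int total = 0;
        for (byte[] message : messages) {
            total += gzipSize(message);
        }
        return total;
    }

    // Size when all messages are concatenated and compressed together.
    static int batchCompressedSize(List<byte[]> messages) {
        ByteArrayOutputStream batch = new ByteArrayOutputStream();
        for (byte[] message : messages) {
            batch.write(message, 0, message.length);
        }
        return gzipSize(batch.toByteArray());
    }

    public static void main(String[] args) {
        List<byte[]> messages = sampleMessages(100);
        System.out.println("individual: " + individuallyCompressedSize(messages) + " bytes");
        System.out.println("batched:    " + batchCompressedSize(messages) + " bytes");
    }
}
```

The batched result is smaller because the codec can exploit repetition across messages and pays its fixed per-stream overhead only once.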

Kafka supports multiple compression codecs:

  • None: No compression.
  • GZIP: A high compression ratio at the cost of more CPU time and slower compression.
  • LZ4: Very fast compression and decompression, with a lower compression ratio than GZIP.
  • Snappy: Optimized for speed with a reasonable compression ratio, making it well suited to data that requires fast compression and decompression.
  • ZSTD (Zstandard): Compression ratios comparable to or better than GZIP at noticeably higher speed. Supported since Kafka 2.1.

Producer-Level Compression

Compression can be specified at the producer level in the Kafka producer configuration. This is done by setting the compression.type property in the producer's configuration file or through the API.

How Producer-Level Compression Works

When compression is set at the producer level, every batch of messages produced by this producer will be compressed using the specified codec before being sent to the Kafka brokers. This not only reduces the amount of data transferred over the network but also decreases the storage required on the Kafka brokers.

Example of Producer-Level Compression

Here is an example configuration snippet for a Kafka producer using Java:

java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("compression.type", "gzip");  // Compress every batch from this producer with GZIP

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
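Because the producer compresses whole batches, the batching settings batch.size and linger.ms affect how much data the codec sees at once. The following is a minimal sketch of a helper that builds producer properties for a chosen codec. The class name, method name, and the batch.size and linger.ms values are hypothetical and only illustrative, not tuning recommendations; the returned Properties would be passed to a KafkaProducer as in the example above.

```java
import java.util.Properties;
import java.util.Set;

final class ProducerCompressionConfig {

    // Codec names accepted by the producer's compression.type setting.
    static final Set<String> CODECS = Set.of("none", "gzip", "snappy", "lz4", "zstd");

    // Builds producer properties for the given codec.
    // The batch.size and linger.ms values are illustrative assumptions.
    static Properties build(String bootstrapServers, String codec) {
        if (!CODECS.contains(codec)) {
            throw new IllegalArgumentException("Unsupported compression.type: " + codec);
        }
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("compression.type", codec);
        // Larger batches give the codec more data to work with per batch.
        props.setProperty("batch.size", "65536");
        // Wait briefly so that more records can accumulate into a batch.
        props.setProperty("linger.ms", "20");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("localhost:9092", "zstd");
        System.out.println(props.getProperty("compression.type"));
    }
}
```

Rejecting unknown codec names in the helper surfaces a typo before any producer is constructed.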

Topic-Level Compression

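Alternatively, compression can be managed at the topic level through the topic's own compression.type setting. Its default value, producer, tells the broker to keep whatever codec the producer used. When it is set to a specific codec instead, that codec takes precedence over the producer's choice for the data stored in the topic.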

How Topic-Level Compression Works

The topic-level compression setting applies when the Kafka brokers store messages. Regardless of the producer's compression configuration, a broker recompresses a message batch before storing it if the batch's codec differs from the topic's compression setting. This recompression costs broker CPU, so it is cheapest when producers already use the topic's codec.
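The broker-side decision can be sketched as a small model. This is a simplified illustration, not the broker's actual implementation: it only considers the topic's compression.type value (producer, uncompressed, or a codec name) and the codec of the incoming batch, and it ignores other reasons a broker might rewrite a batch. The class and method names are hypothetical.

```java
final class TopicCompressionModel {

    // Returns the codec a batch ends up stored with, given the topic's
    // compression.type and the codec the producer used for the batch.
    static String storedCodec(String topicCompressionType, String batchCodec) {
        if ("producer".equals(topicCompressionType)) {
            return batchCodec;            // keep the producer's codec as-is
        }
        if ("uncompressed".equals(topicCompressionType)) {
            return "none";                // store batches without compression
        }
        return topicCompressionType;      // a specific codec such as "zstd"
    }

    // True when the broker would have to rewrite the batch with another codec.
    static boolean brokerMustRecompress(String topicCompressionType, String batchCodec) {
        return !storedCodec(topicCompressionType, batchCodec).equals(batchCodec);
    }

    public static void main(String[] args) {
        System.out.println(brokerMustRecompress("producer", "gzip"));
        System.out.println(brokerMustRecompress("zstd", "gzip"));
        System.out.println(brokerMustRecompress("zstd", "zstd"));
    }
}
```

In this model, recompression is avoided either by leaving the topic at producer or by having every producer use the codec configured on the topic.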

Example of Topic-Level Compression

This configuration is typically set during topic creation using the Kafka command line tools or administration client. Here’s how you can create a topic with ZSTD compression:

bash
kafka-topics --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 3 --topic example-topic --config compression.type=zstd

Comparing Producer vs Topic-Level Compression

Here is a summary table highlighting the differences and implications of setting compression at the producer level versus the topic level:

| Aspect | Producer-Level Compression | Topic-Level Compression |
|---|---|---|
| Scope | Affects only messages sent by the specific producer | Applies to all messages stored in the topic, regardless of the producer |
| Network Efficiency | High, as batches are compressed before transmission | Depends on producer settings; the topic setting alone does not reduce producer-to-broker traffic |
| Storage Efficiency | Depends on each producer's codec, as batches are stored the way they arrive | Consistent, as the broker recompresses batches to the topic's codec when they differ |
| Flexibility in Compression | High (different producers can use different compression types) | Low (one setting per topic, affecting all messages) |
| Decompression and Recompression Cost | Consumers decompress each batch once; no extra broker work | Consumers still decompress once, but the broker spends extra CPU whenever it must recompress a batch |

Conclusion

Choosing where to apply compression in Kafka — at the producer or at the topic level — depends largely on the specific requirements of your architecture, such as network bandwidth limits, storage costs, and producer/consumer configurations. Properly managing compression can lead to significant performance and cost benefits in a Kafka environment.
