Kafka
Avro
Message Production
Message Consumption
Non-Confluent Components

Producing and Consuming Avro messages from Kafka without Confluent components

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Apache Kafka is a widely-used distributed streaming platform which specializes in handling real-time data feeds. Apache Avro, on the other hand, is a serialization framework that can be used to encode the data structure in a way that is compact, fast and interoperable with other languages like Java, Python, and C. This dual integration can be efficiently utilized without relying on Confluent components, which is beneficial for those who prefer an open-source or are restrained by Confluent's licensing model.

Understanding Apache Avro

Avro uses a JSON format to define the structure of data (schema) which is used to serialize and deserialize the data. A typical Avro data schema defines a type for each record field, each type can be a simple type like a string or int or a complex type like an array or map.

How Avro Works with Kafka

When integrating Avro with Kafka data streams, a key consideration is how Avro data is serialized and deserialized efficiently for real-time processing. The usual setup involves:

  1. Defining the Avro schema.
  2. Serializing the producer messages to bytes using Avro.
  3. Sending these bytes into Kafka topics.
  4. Deserializing the messages from Kafka using the same Avro schema.

Producing Avro Messages

Kafka Producer Configuration

To serialize data, the producer application must be configured to use an Avro serializer. Here's a simplified Java example using Apache Kafka and Apache Avro libraries directly:

java
1Properties props = new Properties();
2props.put("bootstrap.servers", "localhost:9092");
3props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
4props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
5
6KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props);
7
8Schema schema = new Schema.Parser().parse(new File("path/to/your-schema.avsc"));
9GenericRecord avroRecord = new GenericData.Record(schema);
10avroRecord.put("name", "John Doe");
11avroRecord.put("age", 25);
12
13ProducerRecord<String, GenericRecord> record = new ProducerRecord<>("your_topic", avroRecord);
14producer.send(record);
15producer.close();

In this example, you define the schema using the .avsc file then serialize the record into Avro format. However, notice that the serializer mentioned uses Confluent's serializers, which we want to avoid.

Instead, you can use the Apache Avro's native serializers, AvroSerializer, by possibly configuring a custom implementation or using available non-Confluent libraries.

Consuming Avro Messages

Consuming messages in Avro format from Kafka is a reverse process of producing them:

java
1Properties properties = new Properties();
2properties.put("bootstrap.servers", "localhost:9092");
3properties.put("group.id", "test-group");
4properties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
5properties.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
6
7KafkaConsumer<String, GenericRecord> consumer = new KafkaConsumer<>(properties);
8consumer.subscribe(Collections.singletonList("your_topic"));
9
10while (true) {
11    ConsumerRecords<String, GenericRecord> records = consumer.poll(Duration.ofMillis(100));
12    for (ConsumerRecord<String, GenericRecord> record : records) {
13        GenericRecord received = record.value();
14        System.out.println("Received: " + received);
15    }
16}

Note that similar to the producer example, you should replace KafkaAvroDeserializer with the appropriate non-Confluent deserializer.

Use Cases

Avro is highly effective for:

  • Systems requiring high throughput and bandwidth efficiency.
  • Environments with evolving schemas where backward compatibility is crucial.

Challenges

  • Lack of a common schema store: Without Confluent's Schema Registry, managing schema evolution and ensuring compatibility may be complex and might need custom solutions.
  • Increased complexity in serialization and deserialization: Developers need to handle serializers and deserializers on their own.

Summary Table

AspectDescription
UsageCompact serialization in Kafka Streams
ProsEfficient data encoding, interoperability
ConsRequires explicit schema management
ToolsApache Avro, Kafka native libraries
ChallengesManaging schema without Confluent Schema Registry

Overall, while Confluent offers additional components that facilitate the use of Avro with Kafka, using native Apache tools is entirely feasible. This approach benefits those seeking an open-source solution or needing customization beyond what Confluent provides.


Course illustration
Course illustration

All Rights Reserved.