Producing and Consuming Avro messages from Kafka without Confluent components

Kafka

Avro

Message Production

Message Consumption

Non-Confluent Components

Producing and Consuming Avro messages from Kafka without Confluent components

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Start Practicing Learn More

Apache Kafka is a widely-used distributed streaming platform which specializes in handling real-time data feeds. Apache Avro, on the other hand, is a serialization framework that can be used to encode the data structure in a way that is compact, fast and interoperable with other languages like Java, Python, and C. This dual integration can be efficiently utilized without relying on Confluent components, which is beneficial for those who prefer an open-source or are restrained by Confluent's licensing model.

Understanding Apache Avro

Avro uses a JSON format to define the structure of data (schema) which is used to serialize and deserialize the data. A typical Avro data schema defines a type for each record field, each type can be a simple type like a string or int or a complex type like an array or map.

How Avro Works with Kafka

When integrating Avro with Kafka data streams, a key consideration is how Avro data is serialized and deserialized efficiently for real-time processing. The usual setup involves:

Defining the Avro schema.
Serializing the producer messages to bytes using Avro.
Sending these bytes into Kafka topics.
Deserializing the messages from Kafka using the same Avro schema.

Producing Avro Messages

Kafka Producer Configuration

To serialize data, the producer application must be configured to use an Avro serializer. Here's a simplified Java example using Apache Kafka and Apache Avro libraries directly:

java

1Properties props = new Properties();
2props.put("bootstrap.servers", "localhost:9092");
3props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
4props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
5
6KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props);
7
8Schema schema = new Schema.Parser().parse(new File("path/to/your-schema.avsc"));
9GenericRecord avroRecord = new GenericData.Record(schema);
10avroRecord.put("name", "John Doe");
11avroRecord.put("age", 25);
12
13ProducerRecord<String, GenericRecord> record = new ProducerRecord<>("your_topic", avroRecord);
14producer.send(record);
15producer.close();

In this example, you define the schema using the .avsc file then serialize the record into Avro format. However, notice that the serializer mentioned uses Confluent's serializers, which we want to avoid.

Instead, you can use the Apache Avro's native serializers, AvroSerializer, by possibly configuring a custom implementation or using available non-Confluent libraries.

Consuming Avro Messages

Consuming messages in Avro format from Kafka is a reverse process of producing them:

java

1Properties properties = new Properties();
2properties.put("bootstrap.servers", "localhost:9092");
3properties.put("group.id", "test-group");
4properties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
5properties.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
6
7KafkaConsumer<String, GenericRecord> consumer = new KafkaConsumer<>(properties);
8consumer.subscribe(Collections.singletonList("your_topic"));
9
10while (true) {
11    ConsumerRecords<String, GenericRecord> records = consumer.poll(Duration.ofMillis(100));
12    for (ConsumerRecord<String, GenericRecord> record : records) {
13        GenericRecord received = record.value();
14        System.out.println("Received: " + received);
15    }
16}

Note that similar to the producer example, you should replace KafkaAvroDeserializer with the appropriate non-Confluent deserializer.

Use Cases

Avro is highly effective for:

Systems requiring high throughput and bandwidth efficiency.
Environments with evolving schemas where backward compatibility is crucial.

Challenges

Lack of a common schema store: Without Confluent's Schema Registry, managing schema evolution and ensuring compatibility may be complex and might need custom solutions.
Increased complexity in serialization and deserialization: Developers need to handle serializers and deserializers on their own.

Summary Table

Aspect	Description
Usage	Compact serialization in Kafka Streams
Pros	Efficient data encoding, interoperability
Cons	Requires explicit schema management
Tools	Apache Avro, Kafka native libraries
Challenges	Managing schema without Confluent Schema Registry

Overall, while Confluent offers additional components that facilitate the use of Avro with Kafka, using native Apache tools is entirely feasible. This approach benefits those seeking an open-source solution or needing customization beyond what Confluent provides.