Producing and Consuming Avro messages from Kafka without Confluent components
Apache Kafka is a widely used distributed streaming platform that specializes in handling real-time data feeds. Apache Avro is a serialization framework that encodes structured data in a compact, fast binary format and is interoperable across languages such as Java, Python, and C. The two can be combined without relying on Confluent components, which is useful for those who prefer a purely open-source stack or are constrained by Confluent's licensing model.
Understanding Apache Avro
Avro schemas are written in JSON and describe the structure of the data; the same schema is used to serialize and deserialize it. A typical record schema defines a type for each field. Each type can be a simple type such as string or int, or a complex type such as an array or map.
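To make the schema format concrete, here is a minimal sketch that parses a schema with Apache Avro's `Schema.Parser` and lists each field with its type. The `User` record, its namespace, and its fields are hypothetical and chosen only for illustration; the class name `SchemaInspection` is likewise an assumption.

```java
import org.apache.avro.Schema;

import java.util.ArrayList;
import java.util.List;

/** Parses a hypothetical "User" schema and describes its fields. */
class SchemaInspection {

    // Illustrative schema: two simple fields and one complex (array) field.
    static final String USER_SCHEMA_JSON =
        "{"
        + "\"type\": \"record\","
        + "\"name\": \"User\","
        + "\"namespace\": \"com.example\","
        + "\"fields\": ["
        + "  {\"name\": \"name\", \"type\": \"string\"},"
        + "  {\"name\": \"age\", \"type\": \"int\"},"
        + "  {\"name\": \"emails\", \"type\": {\"type\": \"array\", \"items\": \"string\"}}"
        + "]}";

    /** Returns one "fieldName: type" entry per record field, in schema order. */
    static List<String> describeFields(String schemaJson) {
        Schema schema = new Schema.Parser().parse(schemaJson);
        List<String> result = new ArrayList<>();
        for (Schema.Field field : schema.getFields()) {
            // getName() gives the lowercase Avro type name, e.g. "string" or "array".
            result.add(field.name() + ": " + field.schema().getType().getName());
        }
        return result;
    }
}
```

Calling `SchemaInspection.describeFields(SchemaInspection.USER_SCHEMA_JSON)` yields the entries `name: string`, `age: int`, and `emails: array`. In a real project the JSON would normally live in an .avsc file rather than a string constant.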
How Avro Works with Kafka
When integrating Avro with Kafka data streams, a key consideration is how Avro data is serialized and deserialized efficiently for real-time processing. The usual setup involves:
- Defining the Avro schema.
- Serializing the producer messages to bytes using Avro.
- Sending these bytes into Kafka topics.
- Deserializing the messages from Kafka using the same (or a compatible) Avro schema.
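Leaving Kafka aside for a moment, the serialize and deserialize steps in the list above can be sketched with Apache Avro's generic API alone. This is an illustration under assumptions, not a definitive implementation: the `User` schema, the sample values, and the class name `AvroRoundTrip` are all hypothetical.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Round-trips a record through Avro binary encoding without any Kafka involvement. */
class AvroRoundTrip {

    // Step 1: define the schema (hypothetical "User" record).
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"age\",\"type\":\"int\"}]}");

    /** Builds a sample record that conforms to SCHEMA. */
    static GenericRecord sampleUser() {
        GenericRecord user = new GenericData.Record(SCHEMA);
        user.put("name", "Ada");
        user.put("age", 36);
        return user;
    }

    /** Step 2: serialize the record to bytes; these bytes are what a producer would send. */
    static byte[] toBytes(GenericRecord record) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(record.getSchema()).write(record, encoder);
            encoder.flush(); // the encoder buffers, so flush before reading the stream
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Step 4: deserialize bytes back into a record using the same schema. */
    static GenericRecord fromBytes(byte[] bytes, Schema schema) {
        try {
            BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
            return new GenericDatumReader<GenericRecord>(schema).read(null, decoder);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The sample record encodes to just 5 bytes (one length byte plus three characters for the name, and one byte for the age), because the binary encoding carries no field names or type tags. The flip side is that the reader cannot decode the payload without knowing the schema it was written with.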
Producing Avro Messages
Kafka Producer Configuration
To serialize data, the producer application must be configured with an Avro serializer. A common Java setup loads the schema from an .avsc file, builds a record against it, and lets the producer's value serializer turn the record into Avro bytes. Examples of this setup often plug in Confluent's KafkaAvroSerializer for that last step, which is exactly the dependency we want to avoid.
Instead, you can rely on Apache Avro's own encoding classes, such as a DatumWriter paired with a BinaryEncoder, either by wrapping them in a custom implementation of Kafka's Serializer interface or by using an available non-Confluent library.
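A custom serializer of that kind might look like the following sketch, which implements Kafka's `Serializer` interface using only Apache Avro classes. The class name `PlainAvroSerializer` and the helper `sendOne` are hypothetical, and the sketch assumes the producer and consumer agree on the schema out of band, since the payload is plain Avro binary with no schema identifier attached.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Serializer;
import org.apache.kafka.common.serialization.StringSerializer;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.Properties;

/** Hypothetical Kafka serializer that writes plain Avro binary, with no schema ID prefix. */
class PlainAvroSerializer implements Serializer<GenericRecord> {

    private final Schema schema;

    PlainAvroSerializer(Schema schema) {
        this.schema = schema;
    }

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // Nothing to configure: the schema is supplied through the constructor.
    }

    @Override
    public byte[] serialize(String topic, GenericRecord record) {
        if (record == null) {
            return null; // pass null values through unchanged (Kafka tombstones)
        }
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
            encoder.flush();
            return out.toByteArray();
        } catch (IOException | RuntimeException e) {
            throw new SerializationException("Failed to Avro-encode record for topic " + topic, e);
        }
    }

    @Override
    public void close() {
        // No resources to release.
    }

    /** Sends one record; assumes a reachable broker and an existing (or auto-created) topic. */
    static void sendOne(String bootstrapServers, String topic, String key,
                        GenericRecord value, Schema schema) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        // Serializer instances are passed directly because this one needs a schema argument.
        try (KafkaProducer<String, GenericRecord> producer =
                 new KafkaProducer<>(props, new StringSerializer(), new PlainAvroSerializer(schema))) {
            producer.send(new ProducerRecord<>(topic, key, value));
            producer.flush();
        }
    }
}
```

Passing serializer instances to the `KafkaProducer` constructor avoids the need for a public no-argument constructor, which Kafka would require if the class were named in the `value.serializer` property instead.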
Consuming Avro Messages
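Consuming Avro messages from Kafka reverses the producing process: the consumer reads the raw bytes from the topic and decodes them with the Avro schema.
As on the producer side, replace Confluent's KafkaAvroDeserializer with a non-Confluent deserializer, for example a custom implementation of Kafka's Deserializer interface built on Apache Avro's decoding classes.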
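A matching non-Confluent deserializer might look like this sketch. The class name `PlainAvroDeserializer` and the helper `pollOnce` are hypothetical, and the sketch assumes the bytes on the topic are plain Avro binary written with the schema handed to the constructor.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.io.IOException;
import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

/** Hypothetical Kafka deserializer that reads plain Avro binary with a fixed schema. */
class PlainAvroDeserializer implements Deserializer<GenericRecord> {

    private final Schema schema;

    PlainAvroDeserializer(Schema schema) {
        this.schema = schema;
    }

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // Nothing to configure: the schema is supplied through the constructor.
    }

    @Override
    public GenericRecord deserialize(String topic, byte[] data) {
        if (data == null) {
            return null; // null payloads (tombstones) stay null
        }
        try {
            BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(data, null);
            return new GenericDatumReader<GenericRecord>(schema).read(null, decoder);
        } catch (IOException | RuntimeException e) {
            throw new SerializationException("Failed to Avro-decode record from topic " + topic, e);
        }
    }

    @Override
    public void close() {
        // No resources to release.
    }

    /** Polls once and returns the decoded values; assumes a reachable broker. */
    static List<GenericRecord> pollOnce(String bootstrapServers, String groupId,
                                        String topic, Schema schema) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        List<GenericRecord> values = new ArrayList<>();
        try (KafkaConsumer<String, GenericRecord> consumer =
                 new KafkaConsumer<>(props, new StringDeserializer(), new PlainAvroDeserializer(schema))) {
            consumer.subscribe(Collections.singletonList(topic));
            for (ConsumerRecord<String, GenericRecord> rec : consumer.poll(Duration.ofSeconds(5))) {
                values.add(rec.value());
            }
        }
        return values;
    }
}
```

Two practical notes: Avro's generic reader returns string fields as `Utf8` objects, so call `toString()` before comparing them with Java strings, and a single `poll` can come back empty while the consumer is still joining its group, so a real application would poll in a loop.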
Use Cases
Avro is highly effective for:
- Systems requiring high throughput and bandwidth efficiency.
- Environments with evolving schemas where backward compatibility is crucial.
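The schema evolution case can be illustrated with Avro's schema resolution: data written with an older schema is decoded by a reader that supplies both the writer's schema and its own newer one. In the sketch below, the two `User` versions, the `email` field, its default value, and the class name `SchemaEvolutionSketch` are hypothetical.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaCompatibility.SchemaCompatibilityType;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Reads data written with an old schema using a newer, backward-compatible schema. */
class SchemaEvolutionSketch {

    // Version 1 of a hypothetical "User" record.
    static final Schema V1 = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"}]}");

    // Version 2 adds a field WITH a default, so it can still read V1 data.
    static final Schema V2 = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"email\",\"type\":\"string\",\"default\":\"unknown\"}]}");

    /** True if data written with `writer` can be decoded by a consumer using `reader`. */
    static boolean canRead(Schema reader, Schema writer) {
        return SchemaCompatibility.checkReaderWriterCompatibility(reader, writer).getType()
            == SchemaCompatibilityType.COMPATIBLE;
    }

    /** Encodes a V1 record, as an old producer would. */
    static byte[] encodeV1(String name) {
        GenericRecord record = new GenericData.Record(V1);
        record.put("name", name);
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(V1).write(record, encoder);
            encoder.flush();
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Decodes V1-written bytes into the shape of `readerSchema` via schema resolution. */
    static GenericRecord decodeV1As(Schema readerSchema, byte[] bytes) {
        try {
            // The reader needs BOTH schemas: the one used to write and the one it wants to see.
            return new GenericDatumReader<GenericRecord>(V1, readerSchema)
                .read(null, DecoderFactory.get().binaryDecoder(bytes, null));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Decoding `encodeV1("Ada")` with `V2` as the reader schema produces a record whose `email` field holds the default value. Had `email` been added without a default, `canRead(V2, V1)` would report the pair as incompatible, which is the kind of check worth running before deploying a new schema version.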
Challenges
- Lack of a common schema store: Without Confluent's Schema Registry, managing schema evolution and ensuring compatibility may be complex and might need custom solutions.
- Increased complexity in serialization and deserialization: Developers need to handle serializers and deserializers on their own.
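One possible custom solution to the missing schema store is sketched below: schemas are registered in an in-memory map keyed by their 64-bit Avro fingerprint, and each message is framed with that fingerprint so a consumer can look up the writer schema. Everything here is an assumption made for illustration, including the class name `InMemorySchemaStore` and the 8-byte prefix layout, which is neither Confluent's wire format nor an Avro standard. A production setup would need a store shared between producers and consumers.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaNormalization;

import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-process schema store keyed by Avro's 64-bit schema fingerprint. */
class InMemorySchemaStore {

    private final Map<Long, Schema> byFingerprint = new ConcurrentHashMap<>();

    /** Registers a schema and returns the fingerprint that identifies it. */
    long register(Schema schema) {
        long fingerprint = SchemaNormalization.parsingFingerprint64(schema);
        byFingerprint.put(fingerprint, schema);
        return fingerprint;
    }

    /** Looks up the writer schema for a fingerprint taken from a message. */
    Schema lookup(long fingerprint) {
        Schema schema = byFingerprint.get(fingerprint);
        if (schema == null) {
            throw new IllegalArgumentException("Unknown schema fingerprint: " + fingerprint);
        }
        return schema;
    }

    /** Prepends the 8-byte fingerprint to an already Avro-encoded payload. */
    static byte[] frame(long fingerprint, byte[] avroPayload) {
        return ByteBuffer.allocate(Long.BYTES + avroPayload.length)
            .putLong(fingerprint)
            .put(avroPayload)
            .array();
    }

    /** Reads the fingerprint from the first 8 bytes of a framed message. */
    static long fingerprintOf(byte[] framed) {
        return ByteBuffer.wrap(framed).getLong();
    }

    /** Returns the Avro payload that follows the fingerprint prefix. */
    static byte[] payloadOf(byte[] framed) {
        return Arrays.copyOfRange(framed, Long.BYTES, framed.length);
    }
}
```

A serializer would call `frame` on its encoded bytes, and a deserializer would call `fingerprintOf`, `lookup`, and `payloadOf` before decoding. One caveat: the fingerprint is computed over the schema's Parsing Canonical Form, which ignores attributes such as `doc` and field defaults, so two schemas that differ only in those attributes map to the same key.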
Summary Table
| Aspect | Description |
| --- | --- |
| Usage | Compact serialization of Kafka messages |
| Pros | Efficient data encoding, interoperability |
| Cons | Requires explicit schema management |
| Tools | Apache Avro, Kafka native libraries |
| Challenges | Managing schema without Confluent Schema Registry |
Overall, while Confluent offers additional components that facilitate the use of Avro with Kafka, using native Apache tools is entirely feasible. This approach benefits those seeking an open-source solution or needing customization beyond what Confluent provides.

