Deserializing a Kafka Message Without a Schema Registry

Apache Kafka is a distributed event streaming platform capable of handling trillions of events a day. Initially conceived as a messaging queue, Kafka is built on the abstraction of a distributed commit log. Because Kafka typically carries continuous streams of data, processing that data efficiently and accurately is crucial, and serialization and deserialization determine how it is structured and interpreted by the systems that produce and consume events.

Understanding Serialization in Kafka

Serialization is the process of converting an object into a sequence of bytes so that it can be stored or transmitted and later reconstructed. In Kafka, producers serialize each record's key and value before sending them to the brokers, and consumers deserialize them to recover the original content. The brokers never interpret the payload: they store and replicate keys and values as opaque byte arrays, so the producer and consumer must agree on the format.
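
To make this byte-level contract concrete, here is a minimal sketch that uses only the JDK. The Codec<T> interface and Utf8StringCodec class are hypothetical names chosen for illustration; they mirror the shape of Kafka's real Serializer<T>.serialize(String topic, T data) and Deserializer<T>.deserialize(String topic, byte[] data) methods, and the string implementation behaves roughly like Kafka's StringSerializer and StringDeserializer with their default UTF-8 encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical interface mirroring Kafka's Serializer<T>/Deserializer<T>:
// the broker only ever stores the byte[] that passes between the two methods.
interface Codec<T> {
    byte[] serialize(String topic, T data);

    T deserialize(String topic, byte[] data);
}

// Behaves roughly like Kafka's StringSerializer/StringDeserializer with UTF-8.
final class Utf8StringCodec implements Codec<String> {
    @Override
    public byte[] serialize(String topic, String data) {
        // Null values (tombstones) pass through unchanged rather than being encoded
        return data == null ? null : data.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String deserialize(String topic, byte[] data) {
        // The consumer must use the same charset the producer used
        return data == null ? null : new String(data, StandardCharsets.UTF_8);
    }
}
```

The important point is symmetry: if the consumer decoded these bytes with a different charset, it would get corrupted text without any error, because nothing in the bytes records how they were encoded.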

The Role of Schema Registry

Many Kafka deployments pair the cluster with a schema registry, such as Confluent Schema Registry, which runs as a separate service and is not part of Apache Kafka itself. A schema registry stores a versioned history of the schemas used for message keys and values, organized under subjects (with Confluent's default naming strategy, one subject for each topic's keys and one for its values). It serves several purposes:

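Provides a central serving layer for schema metadata.
Enables schema evolution, so message formats can change in a controlled way.
Enforces compatibility rules (for example, backward or forward compatibility) when a new schema version is registered, helping ensure that producers do not publish data existing consumers cannot read.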
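
The registry itself does not travel with each message; only a reference to a schema does. As a sketch, assuming the producer used Confluent's registry-aware Avro serializer, each value begins with a five-byte header: a magic byte of 0 followed by the schema ID as a 4-byte big-endian integer. A consumer that already knows the writer's schema through some other channel can strip this header and decode the rest without contacting the registry. The ConfluentFrame name is hypothetical, and Confluent's Protobuf serializer adds further bytes after the ID, so this exact layout applies to the Avro case.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch assuming Confluent's registry-aware serializer wrote the message:
// byte 0 = magic byte (0), bytes 1-4 = schema ID (big-endian int), rest = payload.
record ConfluentFrame(int schemaId, byte[] payload) {
    static final byte MAGIC_BYTE = 0;
    static final int HEADER_LENGTH = 5;

    static ConfluentFrame parse(byte[] message) {
        if (message == null || message.length < HEADER_LENGTH || message[0] != MAGIC_BYTE) {
            throw new IllegalArgumentException("not in Confluent wire format");
        }
        // ByteBuffer reads big-endian by default; wrap(array, 1, 4) positions it at byte 1
        int schemaId = ByteBuffer.wrap(message, 1, 4).getInt();
        // Everything after the header is the serialized record (for example, Avro binary)
        byte[] payload = Arrays.copyOfRange(message, HEADER_LENGTH, message.length);
        return new ConfluentFrame(schemaId, payload);
    }
}
```

With the payload isolated, a consumer holding a copy of the writer schema can decode it with an ordinary Avro decoder; the schema ID then serves only as a sanity check against the version it expects.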

Deserializing Without Schema Registry

Deserializing messages without a schema registry requires knowing the message structure in advance. Each message must follow a predefined schema that both the producer and the consumer know, shared out of band, for example through a common library or a documented contract. Here’s how this generally works:

1. Fixed Message Format

Without a schema registry, the most straightforward approach is to agree on a fixed message format and structure. This requires tight coordination between teams and makes change difficult, because any modification to the format has to be rolled out to producers and consumers together.
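
A fixed format can be as simple as a byte layout that both sides hard-code. The sketch below assumes a hypothetical Order event made of a 64-bit order ID followed by a 32-bit quantity, both big-endian; the Order record, the OrderCodec class, and the layout itself are illustrative and not part of any Kafka API.

```java
import java.nio.ByteBuffer;

// Hypothetical event type; both teams compile against the same definition.
record Order(long orderId, int quantity) {}

// Fixed layout agreed out of band (an assumption for illustration):
// bytes 0-7 = orderId (big-endian long), bytes 8-11 = quantity (big-endian int).
final class OrderCodec {
    static final int SIZE = Long.BYTES + Integer.BYTES; // 12 bytes

    static byte[] serialize(Order order) {
        return ByteBuffer.allocate(SIZE)
                .putLong(order.orderId())
                .putInt(order.quantity())
                .array();
    }

    static Order deserialize(byte[] data) {
        // A length mismatch means producer and consumer disagree on the layout
        if (data == null || data.length != SIZE) {
            throw new IllegalArgumentException("expected " + SIZE + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(data);
        // Java evaluates arguments left to right, so fields are read in layout order
        return new Order(buf.getLong(), buf.getInt());
    }
}
```

In a real consumer, this logic would typically sit inside a class implementing Kafka's Deserializer<Order>. The rigidity described above is easy to see: adding a field changes SIZE, and every consumer still running the old code rejects the new messages.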

2. JSON or XML

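Common data interchange formats such as JSON or XML can be used. These formats are self-describing in a limited sense: each message carries its field names alongside the values, so a consumer can parse it without a separate schema, although it still has to know which fields to expect and what they mean. Here’s a simple Kafka consumer in Java that deserializes JSON messages: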

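```java
import com.fasterxml.jackson.databind.JsonNode;
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class JsonConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "test-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // JsonDeserializer ships in the connect-json artifact and yields Jackson JsonNode values
        props.put("value.deserializer", "org.apache.kafka.connect.json.JsonDeserializer");

        // try-with-resources closes the consumer if the loop exits via an exception
        try (KafkaConsumer<String, JsonNode> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("topic_name"));
            while (true) {
                ConsumerRecords<String, JsonNode> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, JsonNode> record : records) {
                    JsonNode jsonNode = record.value();
                    if (jsonNode != null) { // tombstone records (null values) deserialize to null
                        System.out.println(jsonNode.toString());
                    }
                }
            }
        }
    }
}
```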

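In this example, each message fetched from the topic is expected to contain valid JSON. A value that cannot be parsed makes the deserializer fail, and the error surfaces as an exception from poll(), so a production consumer has to decide how to handle such records.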
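
XML can be handled the same way, with the consumer doing the parsing itself. Here is a minimal sketch using only the JDK's built-in DOM parser, assuming a hypothetical payload such as <order id="42" quantity="3"/>; the Order record, the XmlOrderDeserializer class, and the element and attribute names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

// Hypothetical event type shared by producer and consumer.
record Order(long orderId, int quantity) {}

final class XmlOrderDeserializer {
    // Expects a payload such as <order id="42" quantity="3"/> (an illustrative format).
    static Order deserialize(byte[] data) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Message contents are untrusted input: reject DOCTYPEs to rule out XXE attacks
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Element root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(data))
                    .getDocumentElement();
            // The format names its fields, but the consumer still has to know which ones to read
            if (!"order".equals(root.getTagName())) {
                throw new IllegalArgumentException("unexpected root element " + root.getTagName());
            }
            return new Order(
                    Long.parseLong(root.getAttribute("id")),
                    Integer.parseInt(root.getAttribute("quantity")));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("malformed order XML", e);
        }
    }
}
```

Inside a real Kafka consumer, this logic would usually live in a class implementing Deserializer<Order>, and Kafka's convention is to report failures with org.apache.kafka.common.errors.SerializationException rather than IllegalArgumentException.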

When Should You Avoid a Schema Registry?

A schema registry is recommended for most Kafka deployments, but there are scenarios where doing without one can make sense:

1. Small-Scale Systems

In smaller, less complex systems where there are fewer schemas to manage, or where schemas rarely or never change.

2. Internal Communication

When the system is used internally (i.e., not exposed to external clients), and you can guarantee that the producers and consumers are always aligned on the schema.
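
Even internally, alignment is easier to guarantee when a mismatch is detected rather than silently misread. One lightweight convention, sketched here as an assumption rather than a standard, is to prefix every value with a format-version byte that both teams bump together; Kafka record headers could carry the same information instead. The VersionedPayload class and its version numbering are hypothetical.

```java
import java.util.Arrays;

// Hypothetical convention: every value starts with one format-version byte
// that producer and consumer teams bump together.
final class VersionedPayload {
    static final byte CURRENT_VERSION = 1;

    static byte[] wrap(byte[] body) {
        byte[] out = new byte[body.length + 1];
        out[0] = CURRENT_VERSION;
        System.arraycopy(body, 0, out, 1, body.length);
        return out;
    }

    static byte[] unwrap(byte[] message) {
        if (message == null || message.length == 0) {
            throw new IllegalArgumentException("empty message");
        }
        // Fail fast instead of decoding bytes written in a layout this consumer does not know
        if (message[0] != CURRENT_VERSION) {
            throw new IllegalStateException("unsupported format version " + message[0]);
        }
        return Arrays.copyOfRange(message, 1, message.length);
    }
}
```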

3. Resource Constraints

In resource-sensitive environments where you want to avoid the overhead of running a separate service.

Summary of Key Points

| Aspect | With Schema Registry | Without Schema Registry |
|---|---|---|
| Schema management | Centralized and managed | Manual and decentralized |
| Compatibility checks | Automatic, enforced by the registry | Requires careful coordination between teams |
| Flexibility | High, via schema evolution | Low, every change must be handled manually |
| Operational overhead | An additional service to run | No additional service needed |

Conclusion

Kafka itself does not require a schema registry. A registry is a common companion service for decoupling producers and consumers, but there are valid situations and methods for deserializing messages without one. Doing so requires thorough planning and stricter control of message formats, but it can simplify the architecture in specific scenarios.