Is it possible to ReKey a GlobalKTable?

In Apache Kafka Streams, a GlobalKTable is an abstraction that represents a fully replicated, read-only table. Its data is sourced from a Kafka topic, and it is global in the sense that every record from the source topic is available on every instance of the application. A KTable, by contrast, is partitioned: each instance holds only the data from the partitions assigned to it.

Understanding GlobalKTable

Before asking whether a GlobalKTable can be rekeyed, it helps to understand its properties and use cases. A KTable is backed by a changelog stream and is partitioned, so each application instance materializes only the partitions assigned to it. A GlobalKTable instead loads the entire dataset from a topic into the local state of every application instance. This allows a KStream to be joined with a GlobalKTable without the two being co-partitioned.

Can You ReKey a GlobalKTable?

In Kafka Streams, rekeying means changing the key associated with a record as it passes through the streaming topology. Can a GlobalKTable be rekeyed? No: directly rekeying a GlobalKTable is not supported in Kafka Streams. The GlobalKTable interface offers no transformation operators such as selectKey, map, or groupBy.
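To make the term "rekeying" concrete, here is a minimal sketch in plain Java with no Kafka dependency. It models a table as a Map and is only an illustration, not how Kafka Streams implements anything. The class name RekeySketch, the rekey helper, and the "email|name" value format are all hypothetical. The sketch also shows one reason rekeying a table needs care: if two records map to the same new key, the later one overwrites the earlier one.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiFunction;

public class RekeySketch {

    // Builds a new map whose keys are derived from each entry by keySelector.
    // If two entries produce the same new key, the later entry wins.
    public static <K, V, K2> Map<K2, V> rekey(Map<K, V> table,
                                              BiFunction<K, V, K2> keySelector) {
        Map<K2, V> result = new LinkedHashMap<>();
        for (Map.Entry<K, V> entry : table.entrySet()) {
            K2 newKey = keySelector.apply(entry.getKey(), entry.getValue());
            result.put(newKey, entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical data: key = user ID, value = "email|name".
        Map<String, String> byId = new LinkedHashMap<>();
        byId.put("u1", "alice@example.com|Alice");
        byId.put("u2", "bob@example.com|Bob");
        byId.put("u3", "alice@example.com|Alice B."); // same email as u1

        // Rekey by the email part of the value.
        Map<String, String> byEmail =
            rekey(byId, (id, value) -> value.substring(0, value.indexOf('|')));

        // Three input records collapse into two, because u1 and u3 share an email.
        System.out.println(byEmail.size());
        System.out.println(byEmail.get("alice@example.com"));
    }
}
```

In a real topology the same collision question applies to any topic you rekey before loading it into a table, so the new key should be chosen to be unique per logical row.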

Reasons for the Limitation

  • Global Consistency: A GlobalKTable is meant to give every application instance the same view of a dataset, kept up to date from its source topic. Altering keys inside the application might lead to inconsistencies across different application instances.
  • Read-Only Nature: GlobalKTables are inherently read-only. They are not designed to modify the data they have loaded, only to provide a global, consistent view of that data to all stream processors.

Workarounds for Rekeying Needs

While you cannot directly rekey a GlobalKTable, you can change the keys upstream, before the data is loaded into the GlobalKTable, or handle the rekeying in your application logic around the join. Here are a couple of strategies:

  1. Pre-Processing During Ingestion: Modify the key of records as they are produced to the Kafka topic that backs the GlobalKTable. This way, the GlobalKTable is built with the desired key structure from the outset.
  2. Post-Join Transformation: If you need to join a KStream (which can be rekeyed at any point) with a GlobalKTable and then change keys, perform the rekey operation on the KStream either before or after the join, based on your business logic. The rekeying then lives in the processing logic rather than in the GlobalKTable. Note that a KStream-GlobalKTable join takes a key-mapping function that derives the lookup key from each stream record, so the stream often does not need to be rekeyed just to perform the join.
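The first strategy can also be carried out inside Kafka Streams itself: read the original topic as a KStream, change the key with selectKey, write the result to a second topic, and build the GlobalKTable from that topic. The following is a minimal sketch under stated assumptions, not a definitive implementation. The topic names customers-by-id and customers-by-email, the "email,name" value format, and the emailOf helper are hypothetical, and both topics are assumed to exist already.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.Produced;

public class UpstreamRekeyTopology {

    // Hypothetical topic names; both topics are assumed to be created beforehand.
    static final String SOURCE_TOPIC = "customers-by-id";
    static final String REKEYED_TOPIC = "customers-by-email";

    // Assumes values look like "email,name"; the email becomes the new key.
    static String emailOf(String value) {
        return value.split(",", 2)[0];
    }

    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Read the original topic as a KStream, which supports selectKey.
        builder.stream(SOURCE_TOPIC, Consumed.with(Serdes.String(), Serdes.String()))
               // Skip tombstones (null values): no new key can be derived from them,
               // so deletes are NOT propagated in this simplified sketch.
               .filter((id, value) -> value != null)
               .selectKey((id, value) -> emailOf(value))
               .to(REKEYED_TOPIC, Produced.with(Serdes.String(), Serdes.String()));

        // Build the GlobalKTable from the rekeyed topic; it is now keyed by email.
        GlobalKTable<String, String> customersByEmail =
            builder.globalTable(REKEYED_TOPIC, Consumed.with(Serdes.String(), Serdes.String()));

        // Streams would be joined against customersByEmail here.
        return builder.build();
    }
}
```

The cost of this approach is an extra topic and an extra hop of latency before updates reach the GlobalKTable. Because the rekeyed topic backs a table, it would typically be configured with log compaction, though that depends on your retention needs.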

Example of Handling Rekey with KStream

Here's a simple example where we rekey data after joining a KStream and a GlobalKTable:

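```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;

// Assumes String default key and value serdes are set in the Streams configuration.
StreamsBuilder builder = new StreamsBuilder();
GlobalKTable<String, String> globalKTable = builder.globalTable("global-topic");
KStream<String, String> stream = builder.stream("source-topic");

// Join KStream with GlobalKTable
KStream<String, String> joined = stream.join(globalKTable,
    (streamKey, streamValue) -> streamKey,  // Key-mapping function: picks the GlobalKTable key to look up
    (streamValue, globalValue) -> streamValue + ":" + globalValue  // Value-joining function
);

// Rekey the joined KStream by the GlobalKTable value.
// Assumes streamValue itself contains no ":" character.
KStream<String, String> rekeyed = joined.selectKey((key, value) -> value.split(":", 2)[1]);
```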

Summary Table

Here's a summary of key points about GlobalKTable and rekeying:

| Feature | Description |
| --- | --- |
| Global Data | Data in GlobalKTable is available globally across all application instances. |
| Rekeying Directly | Not supported due to the read-only and globally consistent nature of GlobalKTable. |
| Workarounds | Rekeying can be managed either upstream (data production) or downstream (post-join processing). |

Conclusion

Directly rekeying a GlobalKTable is not supported in Kafka Streams, but the limitation is easy to work around. You can change the keys before the data is loaded into the GlobalKTable, or rekey the KStream before or after the join, whichever fits your application.

