Kafka Streams - KTable from topic with retention policy
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Apache Kafka Streams is a client library for building applications and microservices where the input and output data are stored in Kafka clusters. It allows you to build complex stream processing applications that are scalable, elastic, and fully integrated with Kafka. One of the core abstractions of Kafka Streams is the KTable.
What is a KTable?
A KTable represents a changelog stream from Kafka, where each data record represents a change. It models a table of aggregated data. Each record in a KTable is an update to the previous value related to a specific key. This characteristic makes KTable different from a KStream, which represents an unbounded stream of individual records.
Kafka Topic with Retention Policy
When a KTable is backed by a Kafka topic, this topic acts as a persistence layer and also influences how data is retained. Kafka topics have configurable retention policies that can impact the behavior of the KTable:
- Retention Time: Defines how long records are retained in a topic before being deleted.
- Retention Size: Determines the maximum size a log can grow to before old data is discarded.
- Compaction: Kafka's log compaction feature ensures that the log contains at least the last known value for each key. This is particularly useful for
KTableas it pertains to maintaining a state.
Creating a KTable from a Topic
To create a KTable from a Kafka topic, you typically use Kafka Streams’ DSL (Domain-Specific Language). Here’s a simple example in Java:
In this code, input-topic would be the name of the Kafka topic from which the KTable is created. The key and value types of the topic need to be provided if they are not String.
Example Use-Case: User Configuration Data
Imagine a scenario where user configuration is stored in a Kafka topic, and each record in the topic represents a user’s latest configuration. If the topic uses log compaction and a suitable retention policy, a KTable can be an ideal way to access the most current configuration for any user at any time.
Processing Data with a KTable
Kafka Streams allows you to perform various operations on a KTable:
- Filtering: You can create a new
KTableby filtering records in the originalKTable. - Joining:
KTablecan be joined with anotherKTableor aKStream. - Aggregations: Operations like count, sum, min, max, etc., can be performed on a
KTable.
Here is an example where we count the number of times each key appears in a KTable:
Advantages of KTable for Stateful Operations
Because KTable represents each key's latest value, it is particularly useful when you need to maintain state over time. For example, in an e-commerce application, you might use a KTable to keep track of the latest state of a user’s shopping cart.
Summary of Key Concepts
| Concept | Description | Application Example |
KTable | Represents a Kafka-backed changelog stream | Tracking the most recent state of data |
| Retention Policy | Controls how long or how much data to retain in a topic | Clean-up or storage management |
| Compaction | Retains only the latest value for each key in a topic | Useful for stateful applications like user settings |
Conclusion
Understanding KTable and how it operates with Kafka topics and retention policies is crucial for developing robust streaming applications with Apache Kafka. By leveraging the characteristics and operations available through KTable, developers can create efficient and meaningful real-time data processing applications that are horizontally scalable and fault-tolerant.

