Can compacted Kafka topic be used as key-value database?

Kafka Topic

Key-Value Database

Database Management

Data Storage

Compacted Kafka

Can compacted Kafka topic be used as key-value database?

Master System Design with Codemia

Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.

Start Practicing Learn More

Apache Kafka is an open-source stream-processing software platform developed by the Apache Software Foundation, designed to handle real-time data feeds. Kafka functions on a publish-subscribe basis and can process streams of data in real-time. Traditionally known for messaging, it also provides functionalities that allow it to be used as a storage system, and specifically, its compacted topics can enable it to function similar to a key-value database.

What is a Compacted Kafka Topic?

In Kafka, a topic is a category or feed name to which records are published. Topics in Kafka are multi-subscriber; they can maintain multiple consumers that subscribe to the data written to it. By default, Kafka topics are append-only, meaning data is only written to the end of a log. However, Kafka also supports log compaction on topics which provides a way for a topic to be used more like a key-value store.

Log compaction ensures that Kafka will save at least the last known value for each key within the compacted topic. Older records with the same key are removed periodically. This feature is particularly useful for restoring state or maintaining a current state table.

How Does Log Compaction Work?

Compaction works by retaining only the latest update for each key in the topic log. It does this by marking older record entries for deletion only if there is at least one newer entry with the same key. These marked records are eventually cleared away by the compactor thread in Kafka. The log compactor ensures that at least the minimum set of data necessary to reconstruct the full dataset remains.

Usage as a Key-Value Store

A compacted topic can be thought of as a key-value store with Kafka acting as the storage layer. Each record in the topic represents a "key-value" pair, where the record key acts as the key, and the record value represents the storage value.

java

1// Simple Producer example using Kafka's Java API
2Properties props = new Properties();
3props.put("bootstrap.servers", "localhost:9092");
4props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
5props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
6
7Producer<String, String> producer = new KafkaProducer<>(props);
8producer.send(new ProducerRecord<String, String>("compacted-topic", "Key1", "Value1"));
9producer.close();

Benefits of Using Compact Topics as a Key-Value Store

Scalability: Kafka is built to handle large volumes of data and high throughput, which makes it suitable for scenarios where large scale is needed.
Fault Tolerance: As a distributed system, Kafka replicates data and can recover from node failures, providing a higher degree of reliability.
Performance: Kafka can provide high performance read and write access, especially beneficial for scenarios requiring fast updates and retrievals.

Limitations

Eventual Consistency: Kafka guarantees eventual consistency but does not offer the same immediate consistency as some dedicated key-value stores.
API Limitations: Kafka does not offer the rich query capabilities that some databases provide. The APIs are primarily focused on stream processing.
Operational Complexity: Kafka requires more setup and management than typical key-value stores.

Summary Table

Feature	Kafka as Key-Value Store
Data Model	Key-value pairs
Write Performance	High
Read Performance	High (for latest values)
Consistency	Eventual
Query Capabilities	Limited to keys
Scalability	Very High
Fault Tolerance	High
Operational Complexity	Higher than KV stores

Conclusion

While Kafka was not primarily designed as a key-value store, its compacted topics offer a powerful way to use it in such a capacity, especially useful for scenarios involving large amounts of data that need to be quickly written and read. However, it is essential to understand the trade-offs in consistency and operational complexity. For many use cases, this can be a suitable and powerful solution, but for others, more traditional key-value stores might be more appropriate.