Kafka - Stream vs Topic
Apache Kafka is a distributed event streaming platform capable of handling trillions of events a day. It has changed the way data is handled by making real-time data feeds practical at scale. Much of what Kafka does centers on two concepts: topics and streams. Understanding both, and how they relate, is essential to using Kafka effectively.
Kafka Topics
A topic is the core abstraction in Kafka's design. It is a named category or feed to which records are published and in which they are stored. Topics are multi-subscriber: a topic can be read by one consumer or by many. Physically, a topic is divided into one or more partitions, and these partitions can be spread across multiple brokers (servers).
Key Characteristics of Topics
- Partitioned: Each topic can be split into multiple partitions. This allows for topics to be scaled horizontally by distributing these partitions across different brokers in a Kafka cluster.
- Immutable Sequence of Records: Each partition within a topic contains records in an immutable sequence. A record, once written to a partition, cannot be changed. It can only be superseded by newer records.
- Retention: Topics retain records for a configured time period or until a size limit is reached, whether or not the records have been consumed. This means data can be re-read and reprocessed if needed.
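The three characteristics above can be sketched with a small in-memory model. This is an illustration only, not how Kafka is implemented: the `Record`, `Partition`, and `Topic` classes are hypothetical, the CRC32 hash stands in for Kafka's default murmur2-based partitioner, and retention is applied per record here, whereas a real broker removes whole log segments.

```python
import zlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """An immutable record; frozen=True rejects attribute assignment."""
    key: str
    value: str
    timestamp_ms: int


@dataclass
class Partition:
    """An append-only sequence of records addressed by offset."""
    records: list = field(default_factory=list)
    base_offset: int = 0  # offset of the oldest record still retained

    def append(self, record: Record) -> int:
        self.records.append(record)
        return self.base_offset + len(self.records) - 1

    def read(self, offset: int) -> list:
        """Return all retained records at or after the given offset."""
        start = max(offset - self.base_offset, 0)
        return self.records[start:]

    def enforce_retention(self, now_ms: int, retention_ms: int) -> None:
        """Drop the oldest records once they exceed the retention period."""
        while self.records and now_ms - self.records[0].timestamp_ms > retention_ms:
            self.records.pop(0)
            self.base_offset += 1


class Topic:
    def __init__(self, name: str, num_partitions: int):
        self.name = name
        self.partitions = [Partition() for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Illustrative hash only; Kafka's default partitioner uses murmur2.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def publish(self, record: Record) -> tuple:
        """Append the record and return (partition index, offset)."""
        p = self.partition_for(record.key)
        return p, self.partitions[p].append(record)


topic = Topic("transactions", num_partitions=3)
p1, o1 = topic.publish(Record("user-1", "order placed", timestamp_ms=1_000))
p2, o2 = topic.publish(Record("user-1", "order paid", timestamp_ms=2_000))
```

Both records share a key, so they land in the same partition at consecutive offsets. A reader can start from offset 0 as often as it likes until retention removes the older record.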
Examples of Using Topics
Consider a use case where an e-commerce platform needs to process transactions. These transactions can be sent to a Kafka topic named transactions. Various systems such as billing, notifications, and auditing systems can consume these transactions independently and perform respective operations.
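A rough sketch of that independence, assuming a single-partition log and a hypothetical `TransactionsLog` class (neither is a Kafka API): each consuming system is modeled as a group with its own committed offset, so reading by one group never affects what another group sees.

```python
from collections import defaultdict


class TransactionsLog:
    """A single-partition stand-in for the `transactions` topic."""

    def __init__(self):
        self.records = []
        # Each consumer group tracks its own position in the log.
        self.committed = defaultdict(int)

    def publish(self, record: dict) -> None:
        self.records.append(record)

    def poll(self, group: str, max_records: int = 10) -> list:
        """Return the next unread records for a group and advance its offset."""
        start = self.committed[group]
        batch = self.records[start:start + max_records]
        self.committed[group] = start + len(batch)
        return batch


log = TransactionsLog()
log.publish({"order_id": 1, "amount": 25.0})
log.publish({"order_id": 2, "amount": 40.0})

billing_batch = log.poll("billing")       # billing reads both records
audit_batch = log.poll("auditing", 1)     # auditing reads only one for now
log.publish({"order_id": 3, "amount": 15.0})
billing_next = log.poll("billing")        # only the newly published record
audit_next = log.poll("auditing")         # the two records it has not yet seen
```

Reading does not remove anything from the log. The slower auditing group simply resumes from its own offset.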
Kafka Streams
While topics are fundamental to storage and dissemination of records, Kafka Streams is a client library for building applications and microservices where the input and output data are stored in Kafka clusters. Kafka Streams simplifies the development of complex event-driven applications.
Key Features of Kafka Streams
- Time Windows: Kafka Streams supports windowing operations, which allow grouping of records based on time or session activity.
- Stateful Operations: Kafka Streams enables stateful operations like joins and aggregations on stream data. It maintains state in a fault-tolerant manner using internal Kafka topics.
- Exactly-once Processing Semantics: It builds on Kafka's transaction support to provide exactly-once processing semantics, even in the face of failures.
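The idea behind fault-tolerant state can be sketched without the library. In this simplified model, a plain Python list stands in for an internal changelog topic, and the `ChangelogBackedStore` class and `count_by_key` function are hypothetical names chosen for illustration. Every state update is also appended to the changelog, so a replacement instance can rebuild the state by replaying it.

```python
class ChangelogBackedStore:
    """A key-value store that logs every update so it can be rebuilt."""

    def __init__(self, changelog: list):
        self.changelog = changelog  # stands in for an internal changelog topic
        self.state = {}

    def put(self, key: str, value: int) -> None:
        self.state[key] = value
        self.changelog.append((key, value))

    def get(self, key: str, default: int = 0) -> int:
        return self.state.get(key, default)

    @classmethod
    def restore(cls, changelog: list) -> "ChangelogBackedStore":
        """Rebuild state by replaying the changelog; later entries win."""
        store = cls(changelog)
        for key, value in changelog:
            store.state[key] = value
        return store


def count_by_key(keys, store: ChangelogBackedStore) -> None:
    """A stateful aggregation: a running count per key."""
    for key in keys:
        store.put(key, store.get(key) + 1)


changelog = []
store = ChangelogBackedStore(changelog)
count_by_key(["alice", "bob", "alice"], store)

# Simulate losing the instance: only the changelog survives.
recovered = ChangelogBackedStore.restore(changelog)
count_by_key(["alice"], recovered)
```

After recovery, counting continues from the restored values instead of starting from zero.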
Example of Kafka Streams
Consider processing a continuous stream of user activity data from a social media app, where we want to count the number of posts each user makes within a certain time window. Kafka Streams can handle this in real time, managing the data flow, the windowing, and the state storage.
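The windowing logic in that example can be sketched in plain Python. This is not the Kafka Streams API; it only shows the computation that the Streams DSL expresses with `groupByKey()`, `windowedBy(...)`, and `count()`. The one-minute window size, the function names, and the sample events are assumptions made for illustration.

```python
from collections import Counter

WINDOW_MS = 60_000  # hypothetical one-minute tumbling window


def window_start(timestamp_ms: int, size_ms: int = WINDOW_MS) -> int:
    """Align a timestamp to the start of its tumbling window."""
    return timestamp_ms - (timestamp_ms % size_ms)


def count_posts(events, size_ms: int = WINDOW_MS) -> Counter:
    """Count posts per (user, window start) pair."""
    counts = Counter()
    for user, timestamp_ms in events:
        counts[(user, window_start(timestamp_ms, size_ms))] += 1
    return counts


# (user, event timestamp in milliseconds)
events = [
    ("alice", 5_000),
    ("bob", 12_000),
    ("alice", 59_999),
    ("alice", 60_000),
    ("bob", 125_000),
]
counts = count_posts(events)
```

Alice's post at 59,999 ms falls in the first window and her post at 60,000 ms opens the next, because tumbling windows are fixed-size and do not overlap.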
Comparing Kafka Topics and Streams
While the two concepts are interconnected, Kafka topics are about storing and distributing data in event-driven systems, whereas Kafka Streams is about the processing logic applied to that data.
| Feature | Kafka Topic | Kafka Streams |
| --- | --- | --- |
| Core Concept | Data storage and distribution | Data processing library |
| Mode of operation | Store, publish, and subscribe to records | Process and analyze stream data |
| Use Cases | Data capture, durable storage of events | Real-time analytics, microservices applications |
| Scalability | Scales by adding partitions and spreading them across brokers | Scales by running more application instances that share the input partitions |
| Fault Tolerance | Replicates partitions across brokers | Maintains state using internal topics |
Considerations for Using Kafka Streams and Topics Together
To leverage both Kafka topics and streams effectively, it’s essential to design Kafka topics with an understanding of the stream applications that will process them. For instance, partitioning strategy in topics can dramatically influence the performance of stateful applications in Kafka Streams.
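To see why partitioning matters for stateful processing, the sketch below compares two hypothetical assignment strategies. The CRC32 hash is illustrative and is not the hash Kafka's default partitioner uses; the key names and partition count are made up. With key-based assignment, every record for a key lands in one partition, so a single processing task can hold that key's state. With round-robin assignment, the same key is scattered, and the data would have to be regrouped by key before a per-key aggregate is correct.

```python
import zlib
from itertools import cycle


def by_key(key: str, num_partitions: int) -> int:
    # Illustrative hash; not the hash Kafka's default partitioner uses.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partitions_seen_per_key(records, assign) -> dict:
    """Map each key to the set of partitions its records landed in."""
    seen = {}
    for key in records:
        seen.setdefault(key, set()).add(assign(key))
    return seen


NUM_PARTITIONS = 3
records = ["alice", "bob", "alice", "carol", "alice", "bob"]

# Key-based assignment: the partition depends only on the key.
keyed = partitions_seen_per_key(records, lambda k: by_key(k, NUM_PARTITIONS))

# Round-robin assignment: the partition depends only on arrival order.
round_robin = cycle(range(NUM_PARTITIONS))
scattered = partitions_seen_per_key(records, lambda k: next(round_robin))
```

In `keyed`, each key maps to exactly one partition. In `scattered`, the records for `alice` end up in all three.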
Developing solutions with Kafka typically involves publishing data to Kafka topics, consuming topics in real-time with Kafka Streams, processing data, and possibly outputting results to other topics, databases, or systems.
Conclusion
Kafka topics and streams serve different but complementary functions within the Kafka ecosystem. Topics are fundamental to Kafka's ability to store and distribute data, making Kafka a robust data backbone. Kafka Streams adds the ability to process that data in real time, turning Kafka from a messaging system into a complete streaming platform.

