Apache Kafka and message delivery assurance

Apache Kafka is an open-source distributed event streaming platform capable of handling trillions of events a day. Initially conceived as a message queue, Kafka is built on the abstraction of a distributed commit log. Since its initial development at LinkedIn, Kafka has been adopted by thousands of companies, including major technology companies such as Netflix and Uber, and it remains in use at LinkedIn itself.

Technical Overview of Apache Kafka

Kafka provides low-latency data feeds that can drive real-time analytics. Its architecture is built around partitioned topic logs: each topic is a named category of messages that is split into partitions, and each partition can be replicated across multiple brokers (nodes) in a Kafka cluster. This design supports high-throughput, scalable message delivery, which is critical for both big data and traditional enterprise use cases.
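To make the partitioning idea concrete, here is a minimal sketch of how a record key could be mapped to a partition and how replicas of that partition could be spread over brokers. It is a toy model and not Kafka's implementation: Kafka's default partitioner hashes keys with murmur2, whereas this sketch uses CRC32 from the Python standard library, and the function names and the round-robin replica placement are assumptions made for illustration.

```python
import zlib
from typing import List


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index.

    Toy hash (CRC32), standing in for the hash a real partitioner would use.
    The same key always lands in the same partition, which preserves
    per-key ordering.
    """
    return zlib.crc32(key) % num_partitions


def replica_brokers(partition: int, replication_factor: int, num_brokers: int) -> List[int]:
    """Hypothetical placement: put each replica on the next broker in turn."""
    if replication_factor > num_brokers:
        raise ValueError("replication factor cannot exceed the number of brokers")
    return [(partition + i) % num_brokers for i in range(replication_factor)]


if __name__ == "__main__":
    partition = choose_partition(b"user-42", num_partitions=6)
    print("partition:", partition)
    print("replicas on brokers:", replica_brokers(partition, 3, num_brokers=4))
```

Because the partition depends only on the key, all records for one key stay in order within a single partition, while different keys spread across the cluster.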

Message Delivery Semantics in Kafka

Kafka offers three core message delivery semantics that can be configured to balance between performance and data assurance:

  1. At most once - Messages may be lost but will not be redelivered.
  2. At least once - Messages are not lost but may be redelivered after a consumer group rebalance or a failure. This is usually preferred, as it avoids data loss, provided consumers can tolerate duplicates.
  3. Exactly once - Messages are delivered exactly once. This is critical for scenarios where neither data loss nor data duplication is acceptable.

These assurances are made possible by Kafka's robust architecture, including replication, partitioning, and consumer groups together with message offset tracking.

Consumer Groups and Message Offset

Consumers in Kafka are organized into consumer groups, which are responsible for consuming records from one or more topics. Each partition of a subscribed topic is assigned to exactly one consumer in the group, so the group as a whole covers all partitions of each topic it subscribes to. Consumers pull records from the brokers, and Kafka tracks which records have been consumed by storing a committed numeric offset per partition for each consumer group.
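The following sketch models the two ideas in this section: spreading partitions over the members of a group, and keeping a committed offset per group and partition. It is a simplified stand-in, not the Kafka client API. The round-robin strategy shown is only one possible assignment strategy, and `assign_partitions` and `OffsetStore` are hypothetical names introduced for this example.

```python
from typing import Dict, List, Tuple


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Give each partition to exactly one consumer, round-robin."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


class OffsetStore:
    """Toy store of committed offsets keyed by (group, topic, partition)."""

    def __init__(self) -> None:
        self._offsets: Dict[Tuple[str, str, int], int] = {}

    def commit(self, group: str, topic: str, partition: int, next_offset: int) -> None:
        # The committed value is the offset of the next record to read.
        self._offsets[(group, topic, partition)] = next_offset

    def committed(self, group: str, topic: str, partition: int) -> int:
        # A group with no commit yet starts from offset 0 in this sketch.
        return self._offsets.get((group, topic, partition), 0)


if __name__ == "__main__":
    print(assign_partitions([0, 1, 2, 3, 4, 5], ["c1", "c2"]))
    store = OffsetStore()
    store.commit("billing", "Greetings", 0, 5)
    print(store.committed("billing", "Greetings", 0))
    print(store.committed("analytics", "Greetings", 0))
```

Each group has its own offsets, so two groups subscribed to the same topic read it independently of one another.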

Understanding Kafka's Message Delivery with Examples

Consider a Kafka setup with a producer, a broker, and a consumer:

  • Producer: Pushes records to a topic.
  • Broker: Manages storage and distribution of records.
  • Consumer: Subscribes to a topic and consumes records.

Example Scenario:

  1. Producer sends message: A producer sends a message "Hello, world!" to a topic named 'Greetings'.
  2. Broker receives and stores the message: The broker appends the message to a partition of the topic. Depending on the replication factor, copies of that partition are kept on several brokers to ensure resilience and high availability.
  3. Consumer processes the message: A consumer from a subscribing consumer group retrieves and processes the message.

If the consumer fails after processing but before committing its offset, the message could be read again when the consumer restarts, leading to a message being processed twice under 'at least once' delivery semantics.
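The failure case above can be reproduced with a small simulation. This is a sketch under simplifying assumptions (a single partition held in a Python list, one crash, and hypothetical function names), not code that talks to a Kafka cluster. The only thing that changes between the two semantics is whether the offset is committed before or after the record is processed.

```python
from typing import List, Optional


def consume(log: List[str], start: int, processed: List[str],
            commit_before_processing: bool, crash_at: Optional[int] = None) -> int:
    """Read the log from `start`; return the last committed offset.

    If `crash_at` is set, the consumer stops between its two steps
    (commit and process) when it reaches that offset.
    """
    committed = start
    for offset in range(start, len(log)):
        if commit_before_processing:        # at-most-once ordering
            committed = offset + 1
            if offset == crash_at:
                return committed            # crashed before processing
            processed.append(log[offset])
        else:                               # at-least-once ordering
            processed.append(log[offset])
            if offset == crash_at:
                return committed            # crashed before committing
            committed = offset + 1
    return committed


def run_with_one_crash(log: List[str], commit_before_processing: bool, crash_at: int) -> List[str]:
    """Run until the crash, then restart from the committed offset."""
    processed: List[str] = []
    committed = consume(log, 0, processed, commit_before_processing, crash_at)
    consume(log, committed, processed, commit_before_processing)
    return processed


if __name__ == "__main__":
    log = ["a", "b", "c"]
    print("at least once:", run_with_one_crash(log, False, crash_at=1))
    print("at most once: ", run_with_one_crash(log, True, crash_at=1))
```

With processing first, the record at the crash offset is handled twice after the restart; with the commit first, it is skipped entirely.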

Kafka's Exactly Once Processing

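Apache Kafka 0.11 introduced exactly-once semantics (EOS) through two producer features: idempotence and transactions. Enabling idempotence (enable.idempotence=true) lets the broker discard duplicates caused by producer retries, and setting a transactional ID (transactional.id) lets a producer write to multiple partitions and commit consumed offsets atomically. Consumers that set isolation.level=read_committed see only records from committed transactions. Together these ensure that, within a Kafka read-process-write pipeline, messages are neither lost nor processed more than once, even when retries occur. This is achieved through coordination between the producer and a broker-side transaction coordinator, using a protocol similar to two-phase commit.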
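The idea behind idempotent retries can be illustrated with a toy model in which a partition remembers the last sequence number it accepted from each producer. This is a deliberately simplified sketch: the class and method names are hypothetical, and real brokers track more state (for example a producer epoch) than is shown here.

```python
from typing import Dict, List


class ToyPartition:
    """Toy partition log that drops duplicate appends from producer retries."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self._last_seq: Dict[int, int] = {}   # producer id -> last accepted sequence

    def append(self, producer_id: int, seq: int, value: str) -> bool:
        """Append `value` once; return False if this is a duplicate retry."""
        last = self._last_seq.get(producer_id, -1)
        if seq <= last:
            return False                      # already written, ignore the retry
        if seq != last + 1:
            raise ValueError("out-of-order sequence number")
        self.log.append(value)
        self._last_seq[producer_id] = seq
        return True


if __name__ == "__main__":
    partition = ToyPartition()
    partition.append(producer_id=7, seq=0, value="Hello, world!")
    # The acknowledgement is lost, so the producer retries the same send.
    partition.append(producer_id=7, seq=0, value="Hello, world!")
    print(partition.log)
```

The retry is acknowledged without being written a second time, which is how a producer can resend safely after a lost acknowledgement.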

Summary Table of Key Points

| Feature | Description |
| --- | --- |
| Topic Partitioning | Enhances parallelism and scalability by distributing messages across different nodes in the cluster. |
| Replication | Ensures data durability and high availability. |
| Consumer Groups | Allows multiple consumers to read from a single topic simultaneously, enhancing throughput. |
| Message Offset | Kafka tracks each consumer group's read position in every partition to manage message delivery. |
| Delivery Semantics | Provides configurations for different levels of message delivery assurance. |
| Exactly Once Delivery | Guarantees each message is delivered exactly once, crucial for transactional data integrity. |

Conclusion

Apache Kafka's ability to provide different levels of message delivery assurance makes it a versatile tool for building robust, scalable, and fault-tolerant streaming applications. Each level of delivery assurance serves different use cases and requirements, allowing developers and organizations to choose according to their specific needs for data accuracy and processing overhead. Whether it's at most once, at least once, or exactly once, Kafka offers tools and configurations to accommodate a variety of scenarios, making it a staple in many modern data-driven architectures.

