Kafka DefaultPartitioner algorithm
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
Apache Kafka, a distributed streaming platform, utilizes partitioning to enhance scalability and fault-tolerance by distributing messages across different partitions within a topic. The method by which messages are distributed across these partitions is crucial for load balancing and can impact the performance and reliability of a Kafka-based system. One of the most commonly used partitioning strategies in Kafka is governed by the DefaultPartitioner, which plays a central role in how records are assigned to partitions.
Understanding the DefaultPartitioner
Kafka's DefaultPartitioner algorithm determines which partition data will be sent to when the partition is not specified in the producer record. The partition is selected based on the key of the Kafka record. If a record has a key, the partitioner will hash the key using a consistent hashing algorithm, and modulo the number of partitions to decide the target partition. If the record doesn’t have a key (i.e., the key is null), the partitioner will assign partitions in a round-robin fashion to balance the load evenly.
Key Components of the DefaultPartitioner:
- Key-Based Partitioning: If the key is present, a hash function is applied to the key and then modulo the total number of partitions. The
Utils.toPositive()method is used to ensure that the hash code is non-negative. - Round-Robin Partitioning: In the absence of a key, the DefaultPartitioner follows a round-robin approach. It maintains a sticky partition (last partition used) for each topic which changes whenever all partitions are cycled through, or if metadata about partitions changes.
Technical Workflow
The DefaultPartitioner follows these steps:
- Check for Key:
- If a
keyis present, compute the hash using theUtils.murmur2()hash function. - Use
Utils.toPositive()to get a positive hash value. - Take modulo of the positive hash with the number of partitions to determine the partition.
- Round-Robin Handling:
- If
keyis null, utilize the saved sticky partition. - If sticky partition counter exceeds the number of partitions (like after a partition count change), reset and start from the next available partition in a cyclical manner.
Example
Given partitions [0, 1, 2, 3], and assuming the current sticky partition for a topic is 1, behavior would look like:
- With Key (
key1):- Hash(
key1) → say,12345 12345 % 4→ partition1
- Without Key:
- Current sticky partition is
1, next used will be2.
Summary Table
| Aspect | Description |
| Keyed Partitioning | Uses a consistent hash of the key followed by modulo operation with the number of partitions. |
| No-Key Partitioning | Implements round-robin across partitions using a sticky partition mechanism to retain a temporary state. |
| Hash Function | Utilizes MurmurHash2 (by Utils.murmur2() in Kafka) for generating consistent hash values. |
| Load Balancing | Aimed to distribute messages evenly across partitions, either through hashed keys or round-robin for keyless messages. |
Conclusion
The DefaultPartitioner in Kafka is designed to efficiently distribute messages to various partitions either by hashing when a key is provided or rotating sequentially amongst available partitions in its absence. This ensures efficient load distribution and high availability—critical for the performance of Kafka applications.
Understanding and leveraging the behavior of the DefaultPartitioner can significantly enhance the design of a Kafka deployment, facilitating better data locality, load balancing, and system reliability.

