What happens in Kafka when partitions are reassigned (esp. logsizes)?
In Apache Kafka, the process of partition reassignment refers to the redistribution of the partitions and their respective replicas across the available brokers in a Kafka cluster. This is crucial in scenarios like adding new brokers, decommissioning existing ones, or optimizing the balance of the load across the cluster. Here, we’ll delve into what happens during and after the partition reassignment, with a special focus on the impact on log sizes and overall data management.
Understanding Partition Reassignment
When Kafka partitions are reassigned, several operational steps and configurations are involved:
- Triggering Reassignment: An administrator usually starts a reassignment with Kafka's admin tooling (the `kafka-reassign-partitions.sh` script or the Admin API). External balancing tools can also generate and submit plans automatically. The plan is a JSON file that lists, for each topic partition, the brokers that should host its replicas.
- Implementation: The cluster controller coordinates the reassignment. To limit the impact on regular traffic, the replication bandwidth used for the move can be throttled.
- Data Movement: The messages stored in a moved partition must also reach the new brokers. The new replicas fetch the partition's log from the current leader through the normal replication protocol and write it into their own log segments.
- Replica Synchronization: A new replica must catch up fully with the leader and join the in-sync replica set (ISR) before the old replica is removed, which preserves data consistency and availability.
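As a minimal sketch of the JSON plan mentioned above, the following Python builds a document in the layout that `kafka-reassign-partitions.sh` reads (`version`, then a `partitions` list of `topic`, `partition`, `replicas`). The helper name `build_reassignment_plan`, the shape of its input, and the numeric broker IDs are assumptions made for illustration.

```python
import json


def build_reassignment_plan(moves):
    """Build a reassignment plan as a plain dict.

    `moves` maps (topic, partition) -> list of target broker IDs.
    This input shape is a hypothetical convenience, not a Kafka API.
    """
    partitions = [
        {"topic": topic, "partition": partition, "replicas": list(replicas)}
        for (topic, partition), replicas in sorted(moves.items())
    ]
    return {"version": 1, "partitions": partitions}


# Assumed broker IDs: 1 and 2 stand in for two brokers of the cluster.
plan = build_reassignment_plan({("X", 0): [2], ("X", 1): [1]})
plan_json = json.dumps(plan, indent=2)
print(plan_json)
```

Saved to a file, such a plan can be passed to the tool with `--reassignment-json-file` together with `--execute`, and checked afterwards with `--verify`.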
Log Size Management
During a reassignment, one critical aspect to consider is log size, that is, the on-disk size of the log segment files that store the messages. Here’s how log sizes are affected:
- Increased Network and Disk I/O: As new replicas are created on other brokers, the replication traffic causes significant network and disk I/O. The log on the destination broker grows as the new replica catches up, while the log on the source broker keeps its full size until the reassignment completes and the old replica is deleted.
- Temporarily Increased Storage Requirement: Until the new replicas are in sync and the old replicas are deleted, both copies exist, so the cluster temporarily stores the moved data twice.
- Cleanup and Compaction: Kafka's cleanup policies (`delete` or `compact`) keep log sizes bounded. They continue to apply to the new replicas after the reassignment, removing expired segments and superseded records. Because segment boundaries and cleanup timing can differ between brokers, the new replica's size on disk may differ somewhat from the old one's.
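The temporary duplication can be estimated with a small model. The sketch below is a simplification under stated assumptions: every move in the plan overlaps in time, no new messages arrive, and no retention or compaction runs during the move. The function name `storage_timeline` and the partition sizes are hypothetical.

```python
def storage_timeline(sizes, current, target):
    """Return per-broker bytes as (before, peak, after) dicts.

    sizes:   {partition: bytes}        size of one replica of each partition
    current: {partition: [broker...]}  replica placement before the move
    target:  {partition: [broker...]}  replica placement after the move
    """
    brokers = {
        b
        for placement in (current, target)
        for replicas in placement.values()
        for b in replicas
    }
    before = dict.fromkeys(brokers, 0)
    peak = dict.fromkeys(brokers, 0)
    after = dict.fromkeys(brokers, 0)
    for partition, size in sizes.items():
        old, new = set(current[partition]), set(target[partition])
        for b in old:
            before[b] += size
        # While the new replica catches up, old and new copies both exist.
        for b in old | new:
            peak[b] += size
        for b in new:
            after[b] += size
    return before, peak, after


GiB = 1024 ** 3
before, peak, after = storage_timeline(
    sizes={"X-0": 10 * GiB, "X-1": 4 * GiB},  # made-up sizes
    current={"X-0": ["A"], "X-1": ["B"]},
    target={"X-0": ["B"], "X-1": ["A"]},
)
for broker in sorted(peak):
    print(broker, before[broker] // GiB, peak[broker] // GiB, after[broker] // GiB)
```

In this model each broker must hold both partitions at the peak, even though it ends up with only one of them, which is why free disk space should be checked before a large move.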
Example: Partition Reassignment
Let’s illustrate with an example. Assume a Kafka cluster with three brokers (Broker A, B, and C) and a topic “X” with two partitions (0 and 1).
- Initial State:
  - Partition 0 is on Broker A.
  - Partition 1 is on Broker B.
- After Reassignment:
  - Partition 0 moves to Broker B.
  - Partition 1 moves to Broker A.
This change moves the log data for Partition 0 from Broker A to Broker B, and the log data for Partition 1 from Broker B to Broker A. Throughout the move, Kafka keeps each old replica in place until its replacement is in sync, so no data is lost.
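The example can be walked through as a sequence of replica sets. This is a simplified model of the idea that new replicas are added before old ones are removed, not Kafka's controller code; the function name `reassignment_phases` and the phase labels are made up for illustration.

```python
def reassignment_phases(original, target):
    """Return the (phase, replicas) steps for one partition.

    Simplified model: replicas are added first and removed last.
    """
    adding = [b for b in target if b not in original]
    removing = [b for b in original if b not in target]
    phases = [("before", list(original))]
    if adding:
        # Old and new replicas coexist while the new ones catch up.
        phases.append(("catching up", list(target) + removing))
    phases.append(("after", list(target)))
    return phases


# Topic "X" from the example: partition -> (original brokers, target brokers)
topic_x = {0: (["A"], ["B"]), 1: (["B"], ["A"])}
for partition, (original, target) in topic_x.items():
    for phase, replicas in reassignment_phases(original, target):
        print(f"X-{partition} {phase}: {replicas}")
```

The middle phase is where both brokers hold a copy of the partition, which is the source of the temporary extra disk usage.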
Table Summary
| Aspect | Description |
| --- | --- |
| Triggering Reassignment | Initiated via admin tools or scripts using a plan provided in JSON format. |
| Data Movement | New replicas on the destination brokers fetch the partition's log from the leader. |
| Log Size Impact | Logs grow on the destination brokers while the source brokers keep their copies, so total storage rises temporarily until the old replicas are deleted. |
| Cleanup Policies | Continue to apply to the new replicas after reassignment to keep log storage bounded. |
Additional Considerations
- Performance Impact: During reassignment, since there is extra load on brokers (both source and destination), there might be a temporary performance degradation.
- Fault Tolerance: Fault tolerance must not be compromised during the reassignment. Kafka supports this by removing an old replica only after its replacement has joined the in-sync replica set, so the number of up-to-date copies does not drop during the move.
- Monitoring and Management: It's crucial to monitor broker load and log directory sizes during and after the reassignment to ensure the cluster returns to a stable state without excessive load on any broker.
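For monitoring log directory sizes, one option is to sum replica sizes per broker from the JSON that `kafka-log-dirs.sh --describe` reports. The sketch below assumes that JSON has a `brokers` list whose entries contain `logDirs`, each with `partitions` carrying a `size` in bytes. That layout should be checked against the output of the Kafka version in use, and the tool may print status lines around the JSON that need stripping first. The sample document and its numbers are made up.

```python
import json
from collections import defaultdict

# Made-up sample in the assumed layout; not real cluster output.
SAMPLE = """
{"version": 1, "brokers": [
  {"broker": 1, "logDirs": [{"logDir": "/var/kafka-logs", "error": null,
    "partitions": [
      {"partition": "X-0", "size": 1000, "offsetLag": 0, "isFuture": false},
      {"partition": "X-1", "size": 400, "offsetLag": 0, "isFuture": false}]}]},
  {"broker": 2, "logDirs": [{"logDir": "/var/kafka-logs", "error": null,
    "partitions": [
      {"partition": "X-1", "size": 400, "offsetLag": 0, "isFuture": false},
      {"partition": "X-0", "size": 250, "offsetLag": 30, "isFuture": false}]}]}
]}
"""


def bytes_per_broker(describe_json):
    """Sum replica sizes per broker ID from a log-dirs JSON document."""
    doc = json.loads(describe_json)
    totals = defaultdict(int)
    for broker in doc["brokers"]:
        for log_dir in broker["logDirs"]:
            for replica in log_dir["partitions"]:
                totals[broker["broker"]] += replica["size"]
    return dict(totals)


totals = bytes_per_broker(SAMPLE)
print(totals)
```

Sampling these totals before, during, and after a reassignment shows the destination broker growing while the source broker stays flat until its old replica is deleted.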
Partition reassignment in Kafka is a powerful feature for managing cluster scalability and performance but requires careful handling and monitoring, particularly in relation to log sizes and system resources.

