How does ConcurrentHashMap handle rehashing?
Master System Design with Codemia
Enhance your system design skills with over 120 practice problems, detailed solutions, and hands-on exercises.
ConcurrentHashMap is a powerful concurrent utility class in Java, which is part of the java.util.concurrent package. Unlike HashMap which is not thread-safe, ConcurrentHashMap is designed to provide high concurrency level and thread-safety without requiring synchronization through external locking mechanisms. The way ConcurrentHashMap handles rehashing is crucial to its performance and concurrency capabilities. This article explores how ConcurrentHashMap manages rehashing, offering technical insights and examples for a better understanding.
Fundamental Concepts of ConcurrentHashMap
Structure
ConcurrentHashMap segments the map into smaller sections known as "buckets" to allow multiple threads to read and write concurrently. Each bucket is essentially a small hash table, providing thread-safe access to its elements. In earlier versions of Java (up to Java 7), ConcurrentHashMap used a segment-based locking strategy to allow concurrent operations. From Java 8 onwards, it uses a different approach which leverages finer granularity using Node lists and tree-like structures similar to ConcurrentSkipListMap.
Rehashing in HashMap vs. ConcurrentHashMap
Rehashing is the process of resizing the map and redistributing the stored entries into the new bucket layout when the load factor threshold is reached. This is implemented differently in HashMap and ConcurrentHashMap:
- HashMap: During rehashing, a
HashMapcreates a new larger array and transfers all entries from the old array to the new one. This operation is not thread-safe and can lead to data corruption if theHashMapis accessed concurrently by multiple threads. - ConcurrentHashMap: Instead of using a single lock for the entire map or having to pause all map operations to perform a resize,
ConcurrentHashMaputilizes multiple locks for different buckets or nodes, allowing parallel processing of the rehashing operation.
How Rehashing Works in ConcurrentHashMap
To appreciate how ConcurrentHashMap handles rehashing, let's delve into its intricate mechanisms:
Resize Trigger
Rehashing in ConcurrentHashMap is triggered when the size of map entries exceeds a certain threshold, similar to the load factor concept in HashMap. The default load factor is 0.75, but instead of halting operations, ConcurrentHashMap expands the map's capacity dynamically with as little interruption as possible.
Rehash Function
In ConcurrentHashMap, each node in the table is a linked list or tree using Node or TreeNode classes respectively. When rehashing occurs, a concurrent process called transfer is run by the threads accessing the table. This is where the map's increased size is handled:
- The table is split into smaller tasks distributed among available threads.
- Each thread is responsible for transferring entries from the current bucket to the new locations based on recalculated hash values and increased table capacity.
- The use of per-node locking avoids bottlenecks that a single global lock could introduce.
Ensuring Consistency
During rehashing, ConcurrentHashMap guarantees consistency through a volatile snapshot of the table. Each table index maintains a reference to a node chain which helps in transferring easily without mismatches.
Progressive Rehashing
Java 8's ConcurrentHashMap introduces progressive resizing, where the table is expanded in stages using multiple threads in parallel. Each section of transfer happens at a manageable scope, and threads assist each other in transferring nodes to reduce latency. This method supports high throughput while maintaining thread safety.
Example Code
Below is a simplified illustration of how rehashing might be performed with a focus on node transfer:
- High Concurrency: The use of finer-grained locks allows high concurrency level and throughput, even during rehash operations.
- Minimal Synchronization Overhead: Progressive resizing reduces the management overhead of table expansion.
- Scalability: Efficient rehashing extends the scalability of applications utilizing
ConcurrentHashMap. - Complexity: The complexity of managing concurrent locks and synchronization can lead to intricate code that's hard to debug.
- Performance Trade-offs: While mostly performant, there can be some scenarios where the locking overhead might impact performance negatively, such as in scenarios with extremely high contention.

