How many threads are spawned in parallelStream in Java 8?

Java 8 introduced the Stream API, which provides a more functional approach to processing collections of data. Streams can also run in parallel, which can improve performance for large, CPU-bound workloads by spreading the work across multiple threads. Knowing how many threads parallelStream() actually uses is important when tuning a Java application.

Understanding Parallel Streams in Java 8

parallelStream() is a method on Collection that returns a possibly parallel Stream whose elements may be processed concurrently. Internally, parallel streams use the Fork/Join framework introduced in Java 7 and run their tasks on a ForkJoinPool, which distributes the work across multiple threads.
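
To make the Fork/Join mechanics concrete, here is a minimal sketch of a divide-and-conquer task run on a ForkJoinPool. The class name RangeSumTask and the THRESHOLD cutoff are illustrative assumptions. Parallel streams use their own internal task classes, but they split and combine work in a broadly similar way.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example task: sums the integers in the half-open range [from, to).
class RangeSumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // assumed cutoff for splitting
    private final int from;
    private final int to;

    RangeSumTask(int from, int to) {
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: compute directly on the current thread
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += i;
            }
            return sum;
        }
        int mid = from + (to - from) / 2;
        RangeSumTask left = new RangeSumTask(from, mid);
        RangeSumTask right = new RangeSumTask(mid, to);
        left.fork();                          // queue the left half for another worker
        return right.compute() + left.join(); // do the right half here, then combine
    }

    public static void main(String[] args) {
        long total = ForkJoinPool.commonPool().invoke(new RangeSumTask(0, 1_000_000));
        System.out.println(total); // prints 499999500000
    }
}
```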

How Many Threads Are Spawned?

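By default, parallel streams run on the common ForkJoinPool (ForkJoinPool.commonPool()). Its default parallelism is Runtime.getRuntime().availableProcessors() - 1, with a minimum of 1, and the thread that invokes the terminal operation also takes part in the work. As a result, a parallel stream typically runs on up to as many threads as there are available processors: the common pool's workers plus the calling thread. The workers are created lazily and are shared by every user of the common pool in the JVM, so a given stream may run on fewer threads.

For example, on a quad-core machine the common pool has a default parallelism of 3, so a parallelStream() can use up to 4 threads, including the caller: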

java
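import java.util.Arrays;
import java.util.List;

List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

// May run on up to 'Runtime.getRuntime().availableProcessors()' threads
// (common pool workers plus the calling thread); output order is not guaranteed
numbers.parallelStream().forEach(System.out::println);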
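
One way to check these numbers on your own machine is to record which threads actually process the elements. The sketch below is a rough probe rather than a benchmark. The class and method names are made up for illustration, and the set of threads you see can vary from run to run.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical probe class, not part of any library.
class ParallelStreamThreadProbe {

    /** Runs a parallel stream over 'size' integers and returns the names of the threads that handled them. */
    static Set<String> observedThreadNames(int size) {
        List<Integer> numbers = IntStream.range(0, size).boxed().collect(Collectors.toList());
        Set<String> threadNames = ConcurrentHashMap.newKeySet(); // thread-safe set
        numbers.parallelStream()
               .forEach(n -> threadNames.add(Thread.currentThread().getName()));
        return threadNames;
    }

    public static void main(String[] args) {
        System.out.println("availableProcessors     = " + Runtime.getRuntime().availableProcessors());
        System.out.println("common pool parallelism = " + ForkJoinPool.getCommonPoolParallelism());
        Set<String> names = observedThreadNames(100_000);
        System.out.println("threads observed (" + names.size() + "): " + names);
    }
}
```

On a typical run the observed names include the calling thread (for example main) alongside workers named ForkJoinPool.commonPool-worker-N, which shows that the caller participates in the work.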

Customizing Thread Count

While the default behavior relies on the common ForkJoinPool, there are ways to adjust the number of threads:

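  1. Custom ForkJoinPool: You can create your own ForkJoinPool with a chosen parallelism and submit the stream pipeline to it. In current JDK implementations, a parallel stream started from inside a ForkJoinPool task runs in that pool instead of the common pool, although the Stream API documentation does not guarantee this behavior.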
java
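    ForkJoinPool customPool = new ForkJoinPool(2); // custom pool with a parallelism of 2
    // join() waits for the stream to finish; its tasks run on customPool's workers
    customPool.submit(() -> numbers.parallelStream().forEach(System.out::println)).join();
    customPool.shutdown(); // release the pool's worker threads when done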
  2. System Property: You can also override the common pool's default parallelism by setting the java.util.concurrent.ForkJoinPool.common.parallelism system property. The property is read when the common pool is initialized, so set it at JVM startup.
bash
    java -Djava.util.concurrent.ForkJoinPool.common.parallelism=2 YourApp
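
To see the custom-pool approach take effect, you can record the thread names used while the stream runs inside the pool. This is a minimal sketch with assumed class and method names. It relies on the implementation behavior described above, not on a documented guarantee.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

// Hypothetical probe class, not part of any library.
class CustomPoolProbe {

    /** Runs a parallel stream inside a dedicated pool and returns the names of the threads used. */
    static Set<String> threadNamesInCustomPool(int parallelism, int elements) {
        ForkJoinPool customPool = new ForkJoinPool(parallelism);
        try {
            Set<String> threadNames = ConcurrentHashMap.newKeySet(); // thread-safe set
            customPool.submit(() ->
                IntStream.range(0, elements).parallel()
                         .forEach(i -> threadNames.add(Thread.currentThread().getName()))
            ).join(); // wait for the whole pipeline to finish
            return threadNames;
        } finally {
            customPool.shutdown(); // always release the pool's threads
        }
    }

    public static void main(String[] args) {
        // Expect names of the form ForkJoinPool-N-worker-M, not commonPool workers
        System.out.println(threadNamesInCustomPool(2, 100_000));
    }
}
```

With a parallelism of 2 you would normally see at most two worker names, though a ForkJoinPool may occasionally start an extra thread to compensate for a blocked worker.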

Considerations for Using parallelStream

There are several considerations when using parallel streams:

  • Task Granularity: Make sure there is enough work per element, and enough elements overall, to benefit from parallelization. The overhead of splitting work and combining results can outweigh the gains when the per-element work is trivial.
  • Side Effects: Minimize side effects inside parallel operations to prevent inconsistent or incorrect results.
  • Resource Constraints: Be aware of resource constraints common to parallel applications. For example, parallel streams may compete with other threads for CPU resources.
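
The side-effect point can be illustrated with a short sketch. Instead of adding to a shared collection from inside forEach, let the stream build the result with a collector. The class and method names here are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class, not part of any library.
class SideEffectFreeSquares {

    static List<Integer> squares(List<Integer> input) {
        // Unsafe alternative (deliberately not used): calling add() on a shared
        // ArrayList from forEach, which can lose elements or throw because
        // ArrayList is not thread-safe.
        return input.parallelStream()
                    .map(x -> x * x)
                    .collect(Collectors.toList()); // keeps encounter order for a List source
    }

    public static void main(String[] args) {
        System.out.println(squares(Arrays.asList(1, 2, 3, 4, 5))); // prints [1, 4, 9, 16, 25]
    }
}
```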

Example Use Case

Consider processing a large dataset to apply a complex transformation on each element. Without parallel processing:

java
List<Data> bigList = ...; // list with millions of records
bigList.stream().map(this::complexTransformation).forEach(System.out::println);

With parallel streams:

java
bigList.parallelStream().map(this::complexTransformation).forEach(System.out::println);

On a system with sufficient cores, the parallel version could considerably outperform the sequential version in execution time.
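
A rough way to compare the two versions is sketched below. The complexTransformation body is a made-up, CPU-bound stand-in, and the class name is an assumption. Timing with System.nanoTime() ignores JIT warm-up, so treat the numbers as indicative only. A harness such as JMH gives more reliable measurements.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical example class, not part of any library.
class SequentialVsParallel {

    /** Hypothetical stand-in for an expensive, side-effect-free transformation. */
    static long complexTransformation(int value) {
        long result = value;
        for (int i = 0; i < 1_000; i++) {
            result = (result * 31 + i) % 1_000_003;
        }
        return result;
    }

    static long sumSequential(List<Integer> data) {
        return data.stream().mapToLong(SequentialVsParallel::complexTransformation).sum();
    }

    static long sumParallel(List<Integer> data) {
        return data.parallelStream().mapToLong(SequentialVsParallel::complexTransformation).sum();
    }

    public static void main(String[] args) {
        List<Integer> data = IntStream.range(0, 200_000).boxed().collect(Collectors.toList());

        long t0 = System.nanoTime();
        long sequential = sumSequential(data);
        long t1 = System.nanoTime();
        long parallel = sumParallel(data);
        long t2 = System.nanoTime();

        System.out.println("same result: " + (sequential == parallel));
        System.out.println("sequential ms: " + (t1 - t0) / 1_000_000);
        System.out.println("parallel   ms: " + (t2 - t1) / 1_000_000);
    }
}
```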

Summary Table

Parameter/Concept | Description
Default Thread Count | Common pool parallelism of Runtime.getRuntime().availableProcessors() - 1 (minimum 1), plus the calling thread
Custom ForkJoinPool | Can specify a fixed parallelism with new ForkJoinPool(threadCount)
System Property | Change default parallelism level with -Djava.util.concurrent.ForkJoinPool.common.parallelism=n
Suitable Use Cases | Computationally intensive operations with minimal side effects
Considerations | Task granularity, resource management, potential for side effects

Conclusion

Using parallelStream() can improve performance by spreading work across the available CPU cores, especially for CPU-bound tasks. To benefit from it while avoiding the pitfalls above, you need to understand the underlying threading model: which pool runs the work, how its parallelism is chosen, and how to change it.

