Introduction

Java parallel streams, introduced in Java 8 as part of the Stream API, use multiple CPU cores to execute a task. This is a very useful feature because it can significantly reduce the time a task takes to complete, although not in every case. In this article, we will discuss what a Java parallel stream is, how it differs from a sequential stream, and what the performance implications of using it are.

Java parallel stream

Since multi-core CPUs are now extremely common, parallel streams were a welcome addition to Java 8: they let developers use the multiple cores in their processor to execute a task.

Normally, a stream is processed on a single thread, with elements handled one after another. A parallel stream instead splits its source into multiple chunks that are processed simultaneously by separate threads, which can run on separate cores.

The partial results from each thread are then combined to produce the final result. Streams also make it much easier to iterate over collections as sequences of data.
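
To make the split-process-combine idea concrete, here is a minimal sketch (the class and method names are our own, chosen for illustration) that runs the same pipeline sequentially and in parallel. Both versions produce the same result; the only difference is whether the range is divided among worker threads before the partial sums are combined.

```java
import java.util.stream.LongStream;

public class SequentialVsParallel {

    // Sum of squares of 1..n; the pipeline is identical except for parallel()
    static long sumOfSquares(long n, boolean parallel) {
        LongStream numbers = LongStream.rangeClosed(1, n);
        if (parallel) {
            // The range is split into chunks that worker threads process
            // independently; their partial sums are then combined
            numbers = numbers.parallel();
        }
        return numbers.map(x -> x * x).sum();
    }

    public static void main(String[] args) {
        System.out.println("Sequential: " + sumOfSquares(1_000, false));
        System.out.println("Parallel:   " + sumOfSquares(1_000, true));
    }
}
```

Both lines print 333833500, the sum of the squares of 1 to 1000.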

How to create a Java parallel stream

There are two ways to create a parallel stream in Java:

1. Using the parallel() method on a stream

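The parallel() method, declared in the BaseStream interface, is not a void method: it is an intermediate operation that returns an equivalent parallel stream (possibly the same stream object, switched to parallel mode).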

The code below demonstrates the parallel() method. First, a File object is created that points to an existing text file. Files.lines() then returns a sequential Stream that reads the file one line at a time, and calling parallel() on that stream turns it into a parallel one. For comparison, running the same code without the parallel() call produces the sequential output.

The text file has the following text lines:

Apple
Orange
Banana
Grapes

Now, see the code example below:

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.stream.Stream;

public class ParallelStreamExample01 {

    public static void main(String[] args) throws IOException {
        // A File object is created (backslashes must be escaped in Java strings)
        File textFile = new File("C:\\Documents\\Sample Text File.txt");

        // A Stream of String type is created here; try-with-resources closes the file
        try (Stream<String> textLine = Files.lines(textFile.toPath())) {
            // Using parallel() to create a parallel stream and forEach() to output each line.
            // Without parallel(), the lines are printed in file order.
            textLine.parallel().forEach(System.out::println);
        }
    }
}

Running the code with and without the parallel() call gives output like the following:

  • Output from a sequential stream:
Apple
Orange
Banana
Grapes
  • Output from a parallel stream:
Orange
Grapes
Banana
Apple

Notice that the order changes when a parallel stream is used; we will discuss why later in the article.
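
Because the example above needs a file on disk, here is a self-contained sketch that uses an in-memory Stream of the same four fruit names instead (the class and helper names are hypothetical). It shows that parallel() returns a Stream whose mode can be checked with isParallel(), and that a later sequential() call switches the pipeline back, since the most recent call decides the execution mode.

```java
import java.util.stream.Stream;

public class ParallelMethodExample {

    // Hypothetical helper: the same four lines as the text file, kept in memory
    static Stream<String> fruits() {
        return Stream.of("Apple", "Orange", "Banana", "Grapes");
    }

    public static void main(String[] args) {
        System.out.println("Stream.of(...) is parallel: " + fruits().isParallel());

        // parallel() returns an equivalent parallel Stream that can be chained further
        Stream<String> parallel = fruits().parallel();
        System.out.println("After parallel(): " + parallel.isParallel());

        // The most recent parallel()/sequential() call decides the mode of the whole pipeline
        System.out.println("After parallel().sequential(): "
                + fruits().parallel().sequential().isParallel());

        // Terminal operation: with a parallel stream, the print order can vary between runs
        parallel.forEach(System.out::println);
    }
}
```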

2. Using the parallelStream() method on a collection

parallelStream() is a default method of the Collection interface. It returns a possibly parallel Stream with the collection as its source. In the code below, the file is first read into a List with Files.readAllLines(), and because the source is now a collection, parallelStream() is called on it directly.

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.List;

public class ParallelStreamExample02 {

    public static void main(String[] args) throws IOException {
        File textFile = new File("C:\\Documents\\List_Textfile.txt");

        // Read all lines of the file into a List
        List<String> text = Files.readAllLines(textFile.toPath());

        // Using parallelStream() to create a parallel stream from the collection
        text.parallelStream().forEach(System.out::println);
    }
}
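
As a self-contained variant, the sketch below keeps the fruit names in an in-memory List (an assumption made so the code runs without a file). It contrasts stream(), which is sequential, with parallelStream(), and runs a small computation on the parallel stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ParallelStreamFromList {

    // Same four lines as the text file, kept in memory so the sketch is self-contained
    static final List<String> FRUITS =
            new ArrayList<>(Arrays.asList("Apple", "Orange", "Banana", "Grapes"));

    // Total number of characters, computed by a parallel stream over the list
    static int totalLength() {
        return FRUITS.parallelStream().mapToInt(String::length).sum();
    }

    public static void main(String[] args) {
        System.out.println("stream() is parallel: " + FRUITS.stream().isParallel());
        System.out.println("parallelStream() is parallel: " + FRUITS.parallelStream().isParallel());
        System.out.println("Total characters: " + totalLength());
    }
}
```

Summing the lengths is safe in parallel because addition is associative, so the result (23) does not depend on how the list is split.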

Fork-Join framework

A Java parallel stream uses the fork-join framework and its common pool of worker threads to execute in parallel. This framework was added to java.util.concurrent in Java 7 to manage tasks across multiple threads: the work is recursively split into smaller tasks that are distributed among the worker threads, and their results are joined when the tasks complete.
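
One way to observe this is to record which threads process the elements of a parallel stream. The sketch below assumes OpenJDK's naming scheme, in which common-pool workers are called "ForkJoinPool.commonPool-worker-N"; that prefix is an implementation detail rather than a documented API. The calling thread also typically appears in the set, because it takes part in the work.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

public class CommonPoolThreads {

    // Records the name of every thread that processes an element of a parallel stream
    static Set<String> threadNames() {
        Set<String> names = ConcurrentHashMap.newKeySet();
        IntStream.range(0, 10_000).parallel()
                .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    // True if every thread was either the caller or a common-pool worker
    static boolean onlyCallerAndCommonPool() {
        String caller = Thread.currentThread().getName();
        for (String name : threadNames()) {
            if (!name.equals(caller) && !name.startsWith("ForkJoinPool.commonPool-worker")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Typically prints "main" plus several "ForkJoinPool.commonPool-worker-N" names
        System.out.println(threadNames());
    }
}
```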

Types of thread pools

A parallel stream can run in one of two kinds of fork-join thread pool. The threads in these pools execute the small portions of the task assigned to them.

1. Common thread pool

By default, the parallelism of the common pool is one less than the number of available processor cores (with a minimum of 1), since the thread that starts a parallel stream operation typically also takes part in the work. Developers can specify a different value by passing a JVM system property, as shown below:

-Djava.util.concurrent.ForkJoinPool.common.parallelism=8

This is a global setting: it affects every parallel stream and every other fork-join task that uses the common pool. For that reason, it is not recommended to change it unless you have a genuine reason to do so.
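
To check what the common pool is actually using on a given machine, the values can be read at runtime. In this sketch, expectedDefault() is a hypothetical helper that encodes OpenJDK's default (cores minus one, at least 1); it only matches the real value when the system property above is not set.

```java
import java.util.concurrent.ForkJoinPool;

public class CommonPoolParallelism {

    // OpenJDK's default when -Djava.util.concurrent.ForkJoinPool.common.parallelism is not set
    static int expectedDefault() {
        return Math.max(1, Runtime.getRuntime().availableProcessors() - 1);
    }

    public static void main(String[] args) {
        System.out.println("Available processors:    " + Runtime.getRuntime().availableProcessors());
        // Target parallelism level of the common pool, as configured at JVM startup
        System.out.println("Common pool parallelism: " + ForkJoinPool.getCommonPoolParallelism());
        System.out.println("Expected default:        " + expectedDefault());
    }
}
```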

2. Custom thread pool

As the name suggests, a custom thread pool is a ForkJoinPool that you create and configure yourself, for example with a specific number of threads, and then use to run a parallel stream. Even so, it is strongly recommended to stick with the common thread pool and to use custom thread pools only in exceptional cases.
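
A commonly used way to do this is to submit the whole stream pipeline as a task to your own ForkJoinPool. This sketch relies on implementation behavior (tasks forked from a ForkJoinPool worker thread stay in that worker's pool) rather than on a documented guarantee of the Stream API, and the pool size of 4 is an arbitrary choice.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

public class CustomPoolExample {

    // Runs a parallel sum of 1..n inside a dedicated pool with the given number of threads
    static long sumInCustomPool(int threads, long n) {
        ForkJoinPool pool = new ForkJoinPool(threads);
        try {
            // The pipeline starts on one of the pool's workers, so its subtasks
            // are forked into this pool instead of the common pool
            return pool.submit(() -> LongStream.rangeClosed(1, n).parallel().sum()).join();
        } finally {
            // Custom pools are not shut down automatically, unlike the common pool
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumInCustomPool(4, 1_000_000));
    }
}
```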

Things to consider before using Java parallel stream

Parallel processing is beneficial in many cases. However, it also requires additional work, such as dividing the task among the threads that will execute it and merging their results, and that work has a cost.

The following points help determine whether you should use a parallel stream in your code or whether a sequential stream would be a better choice:

1. The overhead

In many scenarios, if you benchmark both the sequential and the parallel version of an operation, the parallel one surprisingly takes significantly longer.

Consider this simple reduction over an integer stream:

IntStream.rangeClosed(1, 200).reduce(0, Integer::sum);

IntStream.rangeClosed(1, 200).parallel().reduce(0, Integer::sum);

In the example above, the parallel version is likely to take longer. With only 200 elements, the overhead of splitting the source, managing the threads, and combining the results outweighs the tiny amount of work done per element.
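
If you want to try this yourself, the sketch below wraps the two reductions in methods and times a single call of each with System.nanoTime(). Treat the numbers it prints as rough indications only: single-shot timings are noisy, and a dedicated harness such as JMH (the Java Microbenchmark Harness) gives far more reliable results.

```java
import java.util.stream.IntStream;

public class OverheadExample {

    static int sequentialSum() {
        return IntStream.rangeClosed(1, 200).reduce(0, Integer::sum);
    }

    static int parallelSum() {
        return IntStream.rangeClosed(1, 200).parallel().reduce(0, Integer::sum);
    }

    // Crude wall-clock measurement; not a substitute for a proper benchmark harness
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Warm up both versions so JIT compilation is less likely to dominate the timing
        for (int i = 0; i < 10_000; i++) {
            sequentialSum();
            parallelSum();
        }
        System.out.println("Sequential: " + timeNanos(OverheadExample::sequentialSum) + " ns");
        System.out.println("Parallel:   " + timeNanos(OverheadExample::parallelSum) + " ns");
    }
}
```

Both methods return 20100; only the time taken differs.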

2. The cost of splitting the source

Splitting the data source into roughly even portions is a required part of parallel execution. It sounds simple, but code works with many different data structures, and not every data source splits in the same way or at the same cost.

For instance, an array can be divided cheaply and evenly by index, whereas a LinkedList is expensive to split because reaching its middle requires walking the nodes one by one. Other common data structures such as TreeMap and HashMap are cheaper to split than a LinkedList, but not as cheap as arrays. A developer therefore has to consider the data structures involved when deciding whether a parallel or a sequential stream is the better option.
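
Parallel streams split their source through its Spliterator, so the difference can be seen by calling trySplit() once. In this sketch, the first split of an ArrayList takes exactly half the elements, while in OpenJDK a LinkedList walks its nodes and copies a fixed-size batch (1024 elements on the first split) into an array. That batch size is an OpenJDK implementation detail, not part of the specification.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SplitCostExample {

    static List<Integer> numbers(int n) {
        return IntStream.range(0, n).boxed().collect(Collectors.toList());
    }

    // Size of the first chunk produced by a single trySplit() on the list's spliterator
    static long firstSplitSize(List<Integer> list) {
        Spliterator<Integer> whole = list.spliterator();
        Spliterator<Integer> prefix = whole.trySplit();
        return prefix == null ? 0 : prefix.estimateSize();
    }

    // ArrayList halves its index range in constant time
    static long firstSplitOfArrayList(int n) {
        return firstSplitSize(new ArrayList<>(numbers(n)));
    }

    // LinkedList traverses nodes and copies a batch of elements into an array
    static long firstSplitOfLinkedList(int n) {
        return firstSplitSize(new LinkedList<>(numbers(n)));
    }

    public static void main(String[] args) {
        System.out.println("ArrayList first split:  " + firstSplitOfArrayList(10_000));
        System.out.println("LinkedList first split: " + firstSplitOfLinkedList(10_000));
    }
}
```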

3. The cost of merging the final results

Once the data source has been split and each portion processed, the partial results must be combined. For some operations, such as reduction by addition, merging takes very little time, but in other cases, such as merging partial Sets or Maps back together, it can be quite expensive.
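
The sketch below contrasts the two kinds of merge. Combining two partial sums is a single addition, whereas Collectors.toSet() has each chunk build its own HashSet, and those partial sets are then merged with addAll, which costs time proportional to their size.

```java
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class MergeCostExample {

    // Cheap merge: combining two partial results is a single addition
    static long parallelSum(int n) {
        return IntStream.rangeClosed(1, n).parallel().asLongStream().sum();
    }

    // More expensive merge: each chunk builds its own partial Set, and the
    // partial sets are then combined element by element (via addAll)
    static Set<Integer> parallelToSet(int n) {
        return IntStream.rangeClosed(1, n).parallel().boxed().collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println("Sum: " + parallelSum(100_000));
        System.out.println("Set size: " + parallelToSet(100_000).size());
    }
}
```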

4. The N*Q model

Along with the introduction of parallel streams, Oracle introduced a simple model that helps determine whether parallelism can offer a performance advantage. In this model, N is the number of source data elements, and Q is the amount of computation performed per data element.

The larger the product N*Q, the bigger the performance boost parallelism is likely to give. For tasks that require very little computation per element (small Q), such as summing numbers, the data size (N) must be very large. As the computation per element increases, the data size needed to benefit from parallelism decreases. If N*Q is small, you should avoid a parallel stream and use a sequential one instead.
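
As a contrast to the summing example earlier, the sketch below uses a deliberately expensive per-element operation (a trial-division primality test, so Q is large). This is the kind of workload where a parallel stream is more likely to pay off; the limit of 2,000,000 in main() is an arbitrary choice, and the actual speed-up depends on the machine.

```java
import java.util.stream.IntStream;

public class NQExample {

    // Deliberately expensive per-element work (large Q): trial-division primality test
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    static long countPrimes(int limit, boolean parallel) {
        IntStream candidates = IntStream.rangeClosed(2, limit);
        if (parallel) {
            candidates = candidates.parallel();
        }
        return candidates.filter(NQExample::isPrime).count();
    }

    public static void main(String[] args) {
        int limit = 2_000_000;
        for (boolean parallel : new boolean[] {false, true}) {
            long start = System.nanoTime();
            long count = countPrimes(limit, parallel);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println((parallel ? "Parallel:   " : "Sequential: ")
                    + count + " primes in " + millis + " ms");
        }
    }
}
```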

5. Order of execution

Another very important aspect to consider is that the developer does not control the order of execution when a parallel stream is used. In ParallelStreamExample01, you may have noticed that the parallel stream printed the fruit names in a different order, because textLine.parallel() processes the lines on multiple threads at once.

If you run that code several times, you may get a different order each time: on a parallel stream, forEach() does not guarantee to respect the encounter order, because doing so would sacrifice performance. In that example the order was not important, but if the text file contained data whose order matters, a parallel stream with forEach() would not be a suitable option.

You can use forEachOrdered() instead of forEach() to preserve the order, but this can greatly reduce performance and may defeat the purpose of parallel execution.
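
The sketch below compares three terminal operations on the same parallel stream of fruit names. forEach() may add the names in any order (a synchronized list is used because several threads add to it), forEachOrdered() processes them one at a time in encounter order, and collect() may process elements in any order yet still returns a list in encounter order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class OrderExample {

    static final List<String> FRUITS = Arrays.asList("Apple", "Orange", "Banana", "Grapes");

    // forEach(): elements are handled in whatever order the threads reach them
    static List<String> withForEach() {
        List<String> out = Collections.synchronizedList(new ArrayList<>());
        FRUITS.parallelStream().forEach(out::add);
        return out;
    }

    // forEachOrdered(): actions run one at a time, in encounter order, with
    // each action happening-before the next, so a plain ArrayList is safe here
    static List<String> withForEachOrdered() {
        List<String> out = new ArrayList<>();
        FRUITS.parallelStream().forEachOrdered(out::add);
        return out;
    }

    // collect(): elements may be processed in any order, but the result keeps encounter order
    static List<String> withCollect() {
        return FRUITS.parallelStream().map(String::toUpperCase).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("forEach:        " + withForEach());
        System.out.println("forEachOrdered: " + withForEachOrdered());
        System.out.println("collect:        " + withCollect());
    }
}
```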

Conclusion

Parallel Java streams can improve the performance of your program, but they are not always the best choice. The points discussed above show that in certain cases a sequential stream is the better option, either because it is actually faster or because the order of processing matters. Consider using a Java parallel stream only when a sequential stream performs poorly or performance is a high priority, and the task suits parallel execution.

Author

Shaharyar Lalani is a developer with a strong interest in business analysis, project management, and UX design. He writes and teaches extensively on themes current in the world of web and app development, especially in Java technology.