Java parallel streams are part of the Stream API introduced in Java 8 and aim to use multiple cores to execute a task. This is a useful feature because it can significantly reduce the time a task takes to complete, although not in every case. In this article, we discuss what a Java parallel stream is, how it differs from a sequential stream, and what the performance implications of using it are.
As multi-core CPUs are now extremely common, parallel streams are considered an excellent addition in Java 8 because they allow developers to use the multiple cores in their processor to execute tasks.
Normally, a stream pipeline is processed sequentially on a single thread, with elements handled one after another. With a parallel stream, the source data is divided into multiple chunks that are processed simultaneously on separate threads, which can run on separate cores.
The partial results from the threads are then combined to produce the final result. The Stream API also makes it significantly easier to iterate over collections as streams of data.
There are two ways to create a parallel stream in Java:
The parallel() method from the BaseStream interface is an intermediate operation that returns an equivalent parallel stream. If the stream is already parallel, it may return the stream itself.
The code below shows how the parallel() method works. First, a File object is created that points to an existing text file. Then a sequential Stream is created that reads the text file one line at a time. Finally, parallel() is called on that stream so that the lines are processed in parallel.
The text file has the following text lines:
Apple
Orange
Banana
Grapes
Now, see the code example below:
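import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.stream.Stream;

public class ParallelExample { // class name is illustrative
    public static void main(String[] args) throws IOException {
        // A File object pointing to the existing text file
        File textFile = new File("C:\\Documents\\Sample Text File.txt");
        // A sequential Stream of String that reads the file line by line
        Stream<String> textLine = Files.lines(textFile.toPath());
        // parallel() switches the stream to parallel mode; forEach() prints each line
        textLine.parallel().forEach(System.out::println);
        // Close the stream to release the underlying file
        textLine.close();
    }
}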
The output would look like the following (the order can vary from run to run):
Orange
Grapes
Banana
Apple
Here, you can see that the order changes when a parallel stream is used. We discuss this later in the article.
parallelStream() is a method from the Collection interface. It returns a possibly parallel stream with the collection as its source. In the code below, a parallel stream is used again, but this time the text file is first read into a List, which is why the parallelStream() method is used.
File textFile = new File("C:\\Documents\\List_Textfile.txt");
// Read all lines of the file into a List
List<String> text = Files.readAllLines(textFile.toPath());
// Using parallelStream() to create a parallel stream from the List
text.parallelStream().forEach(System.out::println);
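The two creation styles described above can be compared side by side without a file on disk. The following is a minimal sketch, assuming an in-memory list of the same fruit names; the class and method names are hypothetical and only serve the illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; the fruit list mirrors the sample text file.
class ParallelCreationDemo {

    // Way 1: start from a sequential stream and switch it with parallel().
    static Stream<String> viaParallel(List<String> source) {
        return source.stream().parallel();
    }

    // Way 2: ask the collection directly for a (possibly) parallel stream.
    static Stream<String> viaParallelStream(List<String> source) {
        return source.parallelStream();
    }

    public static void main(String[] args) {
        List<String> fruits = Arrays.asList("Apple", "Orange", "Banana", "Grapes");

        // isParallel() reports the mode a terminal operation would run in.
        System.out.println(fruits.stream().isParallel());
        System.out.println(viaParallel(fruits).isParallel());
        System.out.println(viaParallelStream(fruits).isParallel());

        // Unlike forEach(), collect() keeps the encounter order of a list source.
        List<String> upper = viaParallel(fruits)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
        System.out.println(upper);
    }
}
```

Both helpers return a stream that has not been consumed yet, so any intermediate or terminal operation can still be chained onto it.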
A Java parallel stream uses the fork-join framework and its common pool of worker threads to perform parallel execution. This framework was added to java.util.concurrent in Java 7 to manage tasks across multiple threads. The fork-join framework splits the source data among the worker threads and combines the results once the subtasks are complete.
A parallel stream can run on two types of fork-join thread pool: the common pool and a custom pool. These thread pools are responsible for executing the small portions of a task assigned to them.
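One way to observe the worker threads is to record the name of the thread that handles each element. This is a small sketch with a hypothetical class name; on typical JDKs the printed names include the calling thread (for example main) and several common-pool workers, but the exact names and count depend on the JVM and the machine.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

// Hypothetical demo class.
class WorkerThreadDemo {

    // Collects the names of all threads that processed at least one element.
    static Set<String> threadsUsed(int elements) {
        // A concurrent set, because several threads add to it at the same time.
        Set<String> names = ConcurrentHashMap.newKeySet();
        IntStream.range(0, elements)
                .parallel()
                .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        threadsUsed(10_000).forEach(System.out::println);
    }
}
```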
By default, the parallelism of the common pool is one less than the number of available processor cores, because the thread that invokes the stream also takes part in the work. Developers can also set the parallelism explicitly by passing a JVM parameter at startup, as shown below:
-Djava.util.concurrent.ForkJoinPool.common.parallelism=8
This is a global setting that affects all parallel streams and all other fork-join tasks that use the common pool. For that reason, changing the number of threads in the common pool is not recommended unless you have a very good reason for doing so.
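To check what the common pool looks like on a particular machine, the values can simply be printed. The sketch below uses a hypothetical class name and prints the number of available processors next to the parallelism of the common pool so the two can be compared; running it with and without the JVM parameter above shows the effect of the setting.

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical helper class for inspecting the common pool.
class CommonPoolInfo {

    // Number of processors the JVM reports as available.
    static int availableCores() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Target parallelism of the shared fork-join common pool.
    static int commonPoolParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        System.out.println("Available processors:    " + availableCores());
        System.out.println("Common pool parallelism: " + commonPoolParallelism());
    }
}
```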
As the name suggests, a custom thread pool is one that developers create and configure themselves to run a parallel stream. Even so, it is highly recommended to stick with the common pool and to use custom thread pools only in exceptional cases.
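A commonly used way to run a parallel stream on a custom pool is to submit the whole pipeline as a task to a dedicated ForkJoinPool. The sketch below assumes that approach; note that it relies on how the fork-join framework schedules subtasks rather than on a guarantee in the Stream API, and the class name, method name, and thread count are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

// Hypothetical demo class.
class CustomPoolDemo {

    // Sums 1..upTo with a parallel stream started from inside a dedicated pool.
    static long sumInCustomPool(int threads, long upTo) {
        ForkJoinPool pool = new ForkJoinPool(threads);
        try {
            // The terminal operation is invoked from a thread of the custom pool.
            Callable<Long> task = () -> LongStream.rangeClosed(1, upTo).parallel().sum();
            // join() waits for the result without checked exceptions.
            return pool.submit(task).join();
        } finally {
            // Always release the threads of a pool you created yourself.
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumInCustomPool(4, 1_000_000)); // prints 500000500000
    }
}
```

Shutting the pool down in a finally block matters here, because unlike the common pool, a custom pool's threads are not managed for you.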
Parallel processing is beneficial in many cases. However, it requires additional work, such as dividing the task among the threads and merging their results, and this overhead has to be taken into account.
The following points are important when deciding whether a parallel stream or a sequential stream is the better choice for your code.
In many scenarios, if you benchmark both the sequential and the parallel version of an operation, the parallel one surprisingly takes significantly more time.
See this simple example of the reduction operation of an integer stream.
IntStream.rangeClosed(1, 200).parallel().reduce(0, Integer::sum);
In the example above, the parallel operation is likely to take more time than its sequential equivalent. With only 200 cheap additions, the overhead of managing the threads, the sources, and the results outweighs the work itself and makes the operation more expensive.
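A rough way to see this for yourself is to time both variants of the reduction. The following is only a sketch with hypothetical names: it uses plain wall-clock timing without warm-up, so the numbers are indicative at best, and a dedicated benchmarking harness would be needed for reliable measurements.

```java
import java.util.stream.IntStream;

// Hypothetical demo class.
class SmallReductionTiming {

    static int sequentialSum(int n) {
        return IntStream.rangeClosed(1, n).reduce(0, Integer::sum);
    }

    static int parallelSum(int n) {
        return IntStream.rangeClosed(1, n).parallel().reduce(0, Integer::sum);
    }

    // Crude timing helper: elapsed nanoseconds for a single run of the task.
    static long nanosFor(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long sequential = nanosFor(() -> sequentialSum(200));
        long parallel = nanosFor(() -> parallelSum(200));
        System.out.println("sequential: " + sequential + " ns");
        System.out.println("parallel:   " + parallel + " ns");
    }
}
```

Both methods return the same value; only the time they need differs, and that difference varies between machines and runs.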
Splitting the data source into even portions is a necessary part of parallel execution. It sounds simple, but code uses many data types and data structures, and not every data source splits in the same way or at the same cost.
For instance, an array can be divided very cheaply and evenly, whereas a LinkedList is expensive to divide because its elements can only be reached by walking the list. Other common data structures, such as TreeMaps and HashMaps, are cheaper to split than a LinkedList but not as cheap as arrays. The developer therefore has to decide, based on the data structures in use, whether a parallel stream or a sequential stream is the better option.
After the data source has been split and the portions processed, the partial results have to be combined. For some operations, such as a reduction that adds numbers, merging takes very little time. In other cases, such as merging partial sets or maps back together, it can be quite expensive.
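The difference in splitting behaviour can be inspected through the Spliterator that each collection provides to the stream framework. The sketch below uses hypothetical names and splits each source once; an ArrayList is halved by index arithmetic, while the sizes reported for a LinkedList depend on the JDK implementation, so they are printed here without any claim about their values.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class.
class SplitDemo {

    // Splits the source once; returns {size of split-off prefix, size of remainder}.
    static long[] splitOnce(List<Integer> source) {
        Spliterator<Integer> rest = source.spliterator();
        // trySplit() may return null if the source cannot be split.
        Spliterator<Integer> prefix = rest.trySplit();
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] { prefixSize, rest.estimateSize() };
    }

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.range(0, 1000).boxed().collect(Collectors.toList());

        long[] arrayBacked = splitOnce(new ArrayList<>(numbers));
        long[] linked = splitOnce(new LinkedList<>(numbers));

        System.out.println("ArrayList:  " + arrayBacked[0] + " / " + arrayBacked[1]);
        System.out.println("LinkedList: " + linked[0] + " / " + linked[1]);
    }
}
```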
Along with the introduction of parallel streams, Oracle presented a simple model that helps determine whether parallelism can offer a performance advantage. In this model, N is the number of source data elements and Q is the amount of computation performed per data element.
The larger the product of N and Q, the bigger the likely boost in performance. For tasks that need very little computation per element (a small Q), such as summing numbers, the data size (N) has to be very large. As the amount of computation per element increases, the data size needed to benefit from parallelism decreases. If the product of N and Q is small, avoid the parallel stream and use a sequential one.
Another important aspect to consider is that the developer does not control the order of execution when a parallel stream is used. In the first example, you may have noticed that the fruit names were printed in a different order from the file, because textLine.parallel() processes the lines on multiple threads at once.
If you run that code several times, you may get a different order each time. With forEach(), a parallel stream favors performance over ordering, so the encounter order is not preserved. In that example the order was not important, but if the text file had contained ordered data, forEach() on a parallel stream would not have been a suitable option.
You can use the forEachOrdered() method instead of forEach() to maintain the order, but it can significantly reduce performance and may defeat the purpose of parallel execution.
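The difference between the two terminal operations can be shown with an in-memory list. This is a minimal sketch with hypothetical names: forEach() may print the fruit names in any order, while forEachOrdered() visits them in the encounter order of the source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class.
class OrderedOutputDemo {

    // Records the elements in the order in which forEachOrdered() visits them.
    static List<String> visitInEncounterOrder(List<String> source) {
        List<String> visited = Collections.synchronizedList(new ArrayList<>());
        source.parallelStream().forEachOrdered(visited::add);
        return visited;
    }

    public static void main(String[] args) {
        List<String> fruits = Arrays.asList("Apple", "Orange", "Banana", "Grapes");

        // Order is not guaranteed here and can change between runs.
        fruits.parallelStream().forEach(System.out::println);

        // Encounter order is preserved: prints [Apple, Orange, Banana, Grapes]
        System.out.println(visitInEncounterOrder(fruits));
    }
}
```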
Parallel streams can improve the performance of your program, but they are not always the best choice. The implications discussed above show that there are cases where a sequential stream is the better option, for example when the data set is small, the source is expensive to split, or the order of processing matters. Consider a parallel stream only when the sequential version is too slow and the workload is large enough to benefit from parallelism.
Shaharyar Lalani is a developer with a strong interest in business analysis, project management, and UX design. He writes and teaches extensively on themes current in the world of web and app development, especially in Java technology.
© Xperti.io All Rights Reserved