I am doing the Oracle's Stream API Java 1.8 course and I stumbled upon this while perusing the lecture notes:
unordered():
– Inherited from BaseStream
– Returns a stream that is unordered (used internally)
– Can improve efficiency of operations like distinct() and groupingBy()
Here is my question. How can the property of being unordered lead to more efficient calculation of distinct()
and groupingBy()
It only has significance in case of parallel streams. In case of ordered parallel streams, the
distinct()
operation has to do extra work in order to keep it's stability guarantee, that is,(see the API Note part in the javadoc for Stream.distinct().
In case of unordered parallel streams, no such guarantee needs to be kept, as the stream is already unordered. This way, removing the ordered characteristic from an ordered parallel stream can greatly improve the performance of the
distinct()
operation.Similarly, for the
groupingBy()
operation, lifting the requirement that the stream order should be preserved can greatly improve the efficiency of the operation in case of parallel streams as the reduction itself can be performed concurrently. Note that this will only happen while collecting from parallel streams with concurrent collectors, with either the collector or the stream itself being unordered. Practically, you'll need to useStream.collect(groupingByConcurrent(..))
instead ofStream.collect(groupingBy(..))
. See the javadoc forStream.collect()
andCollector
for more details.