I need to iterate over a List, run a time-expensive operation for every item, and then collect the results into a map, something like this:
    List<String> strings = ['foo', 'bar', 'baz']
    Map<String, Object> result = strings.collectEntries { key ->
        [key, expensiveOperation(key)]
    }
so that my result is something like:
    [foo: <an object>, bar: <another object>, baz: <another object>]
Since the operations I need to run are pretty long and don't depend on each other, I would like to use GPars to run the loop in parallel.
However, GPars has a collectParallel method that loops through a collection in parallel and collects to a List, but no collectEntriesParallel that collects to a Map. What's the correct way to do this with GPars?
There is no `collectEntriesParallel` because it would have to produce the same result as `collectParallel` followed by `collectEntries`, as Tim mentioned in the comment. It is hard to reduce a list of values to a map (or any other mutable container) in a deterministic way other than by collecting the results to a list in parallel and then, at the end, collecting them to map entries sequentially. Consider the following sequential example:
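A minimal sketch of such a sequential fold, assuming a hypothetical `expensiveOperation()` that sleeps for one second and returns the reversed key (the real operation is whatever your code does):

```groovy
// Hypothetical stand-in for the real work: sleeps 1 second, returns the reversed key.
def expensiveOperation(String key) {
    Thread.sleep(1000)
    key.reverse()
}

List<String> strings = ['foo', 'bar', 'baz']

long start = System.currentTimeMillis()

// Fold left: start with an empty map and add one entry per element, in list order.
Map<String, Object> result = strings.inject([:]) { seed, key ->
    seed + [(key): expensiveOperation(key)]
}

println "Took ${System.currentTimeMillis() - start} ms"
println result
```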
In this example we use `Collection.inject(initialValue, closure)`, which is equivalent to the good old "fold left" operation: it starts with the initial value `[:]`, iterates over all values, and adds each of them as a key with its value to the initial map. Sequential execution in this case takes approximately 3 seconds (each `expensiveOperation()` sleeps for 1 second).
And this is basically what `collectEntries()` does: it is a kind of reduction operation where the initial value is an empty map.

Now let's see what happens if we try to parallelize it: instead of `inject` we will use the `injectParallel` method.
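A rough sketch of that parallel attempt, assuming GPars 1.2.1 is fetched via `@Grab` and using a hypothetical, untyped `expensiveOperation()`. The result is intentionally not what you want, and the exact log lines vary from run to run:

```groovy
@Grab('org.codehaus.gpars:gpars:1.2.1')
import groovyx.gpars.GParsPool

// Hypothetical stand-in for the real work. It is untyped on purpose: in the
// parallel fold the "key" argument may not be one of the original strings.
def expensiveOperation(key) {
    Thread.sleep(1000)
    key.toString().reverse()
}

List<String> strings = ['foo', 'bar', 'baz']
def result = null

GParsPool.withPool {
    result = strings.injectParallel([:]) { seed, key ->
        // Log what each invocation actually receives as its two arguments.
        println "[${Thread.currentThread().name}] seed = ${seed}, key = ${key}"
        seed + [(key): expensiveOperation(key)]
    }
}

println result
```

The log lines show which values each invocation received as `seed` and `key`, which is the quickest way to see that the empty map is not reliably used as the starting point.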
The parallel version of `inject` does not care about order (which is expected); in this run, for example, the first thread received `foo` as the `seed` variable and `bar` as the key. This is what could happen if a reduction to a map (or any mutable object) were performed in parallel and without a specific order.

Solution
There are two ways to parallelize the process:
1. `collectParallel` + `collectEntries` combination
As Tim Yates mentioned in the comment, you can run the expensive operation in parallel and, at the end, collect the results to a map sequentially.
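A minimal sketch of this combination, assuming GPars 1.2.1 is fetched via `@Grab` and a hypothetical `expensiveOperation()` that sleeps for one second and returns the reversed key:

```groovy
@Grab('org.codehaus.gpars:gpars:1.2.1')
import groovyx.gpars.GParsPool

// Hypothetical stand-in for the real work: sleeps 1 second, returns the reversed key.
def expensiveOperation(String key) {
    Thread.sleep(1000)
    key.reverse()
}

List<String> strings = ['foo', 'bar', 'baz']
Map<String, Object> result = [:]

GParsPool.withPool {
    // Parallel part: compute [key, value] pairs on the pool's threads.
    def pairs = strings.collectParallel { key -> [key, expensiveOperation(key)] }
    // Sequential part: turn the list of pairs into a map.
    result = pairs.collectEntries()
}

println result
```

Only the cheap step (building the map from the pairs) runs sequentially, so the mutable map is never touched by more than one thread.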
This approach executes in approximately 1 second and produces the expected map of results.
2. Java's parallel stream
Alternatively, you can use Java's parallel stream with the `Collectors.toMap()` collector. This approach also executes in approximately 1 second.
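A minimal sketch of the stream-based variant, assuming Java 8 or newer and the same hypothetical `expensiveOperation()`; the closures are coerced to `java.util.function.Function` explicitly:

```groovy
import java.util.function.Function
import java.util.stream.Collectors

// Hypothetical stand-in for the real work: sleeps 1 second, returns the reversed key.
def expensiveOperation(String key) {
    Thread.sleep(1000)
    key.reverse()
}

List<String> strings = ['foo', 'bar', 'baz']

// The key mapper keeps the string itself; the value mapper runs the expensive call.
Map<String, Object> result = strings.parallelStream().collect(
    Collectors.toMap(
        { String key -> key } as Function,
        { String key -> expensiveOperation(key) } as Function
    )
)

println result
```

Note that `Collectors.toMap()` throws an `IllegalStateException` on duplicate keys, so this variant assumes the input strings are unique.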
Hope it helps.