I've been reading a nice answer to Difference between reduce and foldLeft/fold in functional programming (particularly Scala and Scala APIs)? provided by samthebest and I am not sure if I understand all the details:
According to the answer (
reduce
vsfoldLeft
):A big big difference (...) is that reduce should be given a commutative monoid, (...)
This distinction is very important for Big Data / MPP / distributed computing, and the entire reason why reduce even exists.
and
Reduce is defined formally as part of the MapReduce paradigm,
I am not sure how this two statements combine. Can anyone put some light on that?
I tested different collections and I haven't seen performance difference between
reduce
andfoldLeft
. It looks likeParSeq
is a special case, is that right?Do we really need order to define
fold
?we cannot define fold because chunks do not have an ordering and fold only requires associativity, not commutativity.
Why it couldn't be generalized to unordered collection?
As mentioned in the comments, the term reduce means different thing when used in the context of MapReduce and when used in the context of functional programming.
In MapReduce, the system groups the results of the
map
function by a given key and then calls thereduce
operation to aggregate values for each group (soreduce
is called once for each group). You can see it as a function(K, [V]) -> R
taking the group keyK
together with all the values belonging to the group[V]
and producing some result.In functional programming,
reduce
is a function that aggregates elements of some collection when you give it an operation that can combine two elements. In other words, you define a function(V, V) -> V
and thereduce
function uses it to aggregate a collection[V]
into a single valueV
.When you want to add numbers
[1,2,3,4]
using+
as the function, thereduce
function can do it in a number of ways:((1+2)+3)+4)
a = 1+2
andb = 3+4
in parallel and then adda+b
!The
foldLeft
operation is, by definition always proceeding from the left and so it always uses the evaluation strategy of (1). In fact, it also takes an initial value, so it evaluates something more like(((0+1)+2)+3)+4)
. This makesfoldLeft
useful for operations where the order matters, but it also means that it cannot be implemented for unordered collections (because you do not know what "left" is).