Difference between fold and reduce revisted

2.1k views Asked by At

I've been reading a nice answer to Difference between reduce and foldLeft/fold in functional programming (particularly Scala and Scala APIs)? provided by samthebest and I am not sure if I understand all the details:

  • According to the answer (reduce vs foldLeft):

    A big big difference (...) is that reduce should be given a commutative monoid, (...)

    This distinction is very important for Big Data / MPP / distributed computing, and the entire reason why reduce even exists.

    and

    Reduce is defined formally as part of the MapReduce paradigm,

    I am not sure how this two statements combine. Can anyone put some light on that?

  • I tested different collections and I haven't seen performance difference between reduce and foldLeft. It looks like ParSeq is a special case, is that right?

  • Do we really need order to define fold?

    we cannot define fold because chunks do not have an ordering and fold only requires associativity, not commutativity.

    Why it couldn't be generalized to unordered collection?

1

There are 1 answers

1
Tomas Petricek On BEST ANSWER

As mentioned in the comments, the term reduce means different thing when used in the context of MapReduce and when used in the context of functional programming.

  • In MapReduce, the system groups the results of the map function by a given key and then calls the reduce operation to aggregate values for each group (so reduce is called once for each group). You can see it as a function (K, [V]) -> R taking the group key K together with all the values belonging to the group [V] and producing some result.

  • In functional programming, reduce is a function that aggregates elements of some collection when you give it an operation that can combine two elements. In other words, you define a function (V, V) -> V and the reduce function uses it to aggregate a collection [V] into a single value V.

When you want to add numbers [1,2,3,4] using + as the function, the reduce function can do it in a number of ways:

  1. It can run from the start and calculate ((1+2)+3)+4)
  2. It can also calculate a = 1+2 and b = 3+4 in parallel and then add a+b!

The foldLeft operation is, by definition always proceeding from the left and so it always uses the evaluation strategy of (1). In fact, it also takes an initial value, so it evaluates something more like (((0+1)+2)+3)+4). This makes foldLeft useful for operations where the order matters, but it also means that it cannot be implemented for unordered collections (because you do not know what "left" is).