Smallest sum of difference between elements in two lists

3.7k views Asked by At

Say we have two lists of the same length, ls1 and ls2. For example, we have

ls1 = [4, 6]
ls2 = [3, 5]

and for each element in ls1 we have to pair it with one element with one element in ls2, in a manner such that the total (absolute) difference between the element in ls1 and the element in ls2 is minimal. One element can only be matched once. In the example above, the optimal way is to match 4 is ls1 with 3 in ls2, and 5 in ls1 with 6 in ls2, which yields a total difference of

(4 - 3) + (6 - 5) = 2 

I need a program to return this minimum total difference between the elements in these two lists. The length of the lists is arbitrary, and so is the values of the elements in the lists (but they are all positive integers).

I currently know that using permutation to brute force the solution is an option, but what I need is a code that has optimal time and space complexity. I heard of the concept of dynamic programming, but I have no idea of how to implement it in my case. Thanks in advance for replies.

Ps. Here's my current brute force code using permutation, which is not efficient in runtime or memory usage:

from itertools import permutations

def minimum_total_difference(ls1, ls2):
    length = len(ls1)
    possibilities = list(permuations(ls1, length))
    out = 10**10
    for possibility in possibilities:
        out_ = 0
        for _ in range(length):
            diff = abs(possibility[_] - ls2[_])
            out += diff
        if out_ < out:
            out = out_
    return out
2

There are 2 answers

0
kraskevich On BEST ANSWER

One can prove that the optimal solution is to sort both lists and match their elements in sorted order.

A sketch of the proof:

  1. Let there be an inversion, that is a is matched to b, c is matched to d and a < c, b > d.

  2. We can "exchange" these elements: a->d, c->b. Now a < c, d < b. One can show that this operation never increases the answer (by considering all possible relative values of a, b, c, d)

  3. Thus, there is always an optimal matching where both lists are sorted.

Here's an efficient one-liner that implements this solution:

sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys)))
0
Jingjie Yang On

As @kraskevich has pointed out, the correct answer is indeed:

sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))

I have come up with my own proof:
Consider two lists xs and ys consisting of elements, in random order, x1, x2, ... xn and y1, y2, ... yn.
Since we're trying to find out the minimal sum of absolute difference, we could use square root instead of absolute values, which has little impact to find the minimum value.
Therefore the sum of the differences is:

(x1 - y1)^2 + (x2 - y2)^2 + ... + (xn - yn)^2 
=x1^2 - 2x1 * y1 + y1^2 + ... + xn^2 - 2xn * yn + yn^2

As we can see, in whatever manner we arrange the two lists, the quadratic terms xn^2 and yn^2 stay the same. So, to obtain a minimal result, we simply have to maximize the negative terms -2xn * yn.

In order to do so, we simply multiply the largest value in one list by the largest value in the other list, and then do the same for the second largest value in the two lists, etc. (see Given 2 arrays, find the minimal sum of multiplication of indexes). Hence, by sorting both lists and multiplying elements of the same index in the sorted lists, we obtain the minimal sum of differences.