Efficient sums of n-length combinations of array

Given the following input:

  • An integer n, e.g., 36.
  • A list/array mylist of length m, e.g., [0.0, 24.0, 48.0, 72.0, 96.0, 120.0]. Although the values in this example are evenly spaced, integer-valued floats, in general they are arbitrary non-negative real numbers that are not necessarily evenly spaced.

I am trying to perform the following computational steps:

  1. Generate a list of n-length combinations (with replacement) of mylist, e.g. for n == 3:
    [0.0, 0.0, 0.0]
    [0.0, 0.0, 24.0]
    ...
    [96.0, 120.0, 120.0]
    [120.0, 120.0, 120.0]
    
  2. Sum the elements of each combination in the above list, e.g. for n == 3:
    0.0
    24.0
    ...
    336.0
    360.0
    
  3. Remove duplicates from the above list of sums, e.g., reducing the list length from 56 to 16 for n == 3, or from 749,398 to 181 for n == 36 (these counts can be sanity-checked as shown after this list).
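
For reference, these counts can be reproduced directly: the number of n-length combinations with replacement from m values is C(m + n - 1, n), and for evenly spaced values the unique sums are exactly the multiples of the spacing up to n times the maximum. A quick sanity check, not part of the pipeline itself:

from math import comb

m = 6  # length of mylist

for n in (3, 36):
    num_combinations = comb(m + n - 1, n)  # combinations with replacement
    num_unique_sums = n * (m - 1) + 1      # holds only for evenly spaced values
    print(n, num_combinations, num_unique_sums)
# 3 56 16
# 36 749398 181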

I have implemented this in Python in two ways: using lists and using pandas.DataFrames. For values of n of 36 and above, the steps take too long for my application (1+ seconds) because the number of combinations, C(m + n - 1, n), grows rapidly with n. Although performing steps 2 and 3 on a DataFrame brings speed improvements, creating the DataFrame from the list of combinations makes the overall process slower again. (A timing sketch follows the list implementation below.)

Implementation using lists:

from itertools import combinations_with_replacement

n = 36
mylist = [0.0, 24.0, 48.0, 72.0, 96.0, 120.0]

combinations = list(combinations_with_replacement(mylist, n))  # Step 1.
combination_sums = list(map(sum, combinations))                # Step 2.
unique_combination_sums = list(set(combination_sums))          # Step 3.
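
For completeness, the end-to-end time can be measured along these lines (a minimal sketch; exact timings depend on hardware, and this is not necessarily how the 1+ second figure was obtained):

import time
from itertools import combinations_with_replacement

n = 36
mylist = [0.0, 24.0, 48.0, 72.0, 96.0, 120.0]

start = time.perf_counter()
combos = list(combinations_with_replacement(mylist, n))  # Step 1.
sums = list(map(sum, combos))                            # Step 2.
unique_sums = list(set(sums))                            # Step 3.
elapsed = time.perf_counter() - start

print(f"{len(combos):,} combinations -> {len(unique_sums)} unique sums in {elapsed:.2f} s")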

Implementation using DataFrames:

from itertools import combinations_with_replacement

import pandas as pd

n = 36
mylist = [0.0, 24.0, 48.0, 72.0, 96.0, 120.0]

combinations_df = pd.DataFrame(list(combinations_with_replacement(mylist, n)))  # Step 1.
combination_sums_df = combinations_df.sum(axis=1)                               # Step 2.
unique_combination_sums_df = combination_sums_df.drop_duplicates()              # Step 3.

Note: Using numpy.ndarrays is computationally faster than using DataFrames but slower than using lists.
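
The numpy variant reads roughly as follows (a sketch, assuming it mirrors the DataFrame version; the exact code benchmarked may have differed):

import numpy as np
from itertools import combinations_with_replacement

n = 36
mylist = [0.0, 24.0, 48.0, 72.0, 96.0, 120.0]

combinations_arr = np.array(list(combinations_with_replacement(mylist, n)))  # Step 1.
combination_sums_arr = combinations_arr.sum(axis=1)                          # Step 2.
unique_combination_sums_arr = np.unique(combination_sums_arr)                # Step 3.

Note that np.unique also sorts its result, which set() does not; here only the deduplication matters.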

Is there a more efficient algorithm, library, or other technique to make the above process faster? Perhaps something that takes advantage of the tree-like nature of combinations?
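
To make the question concrete: because addition is order-independent, the set of sums of n-length combinations equals the set of sums reachable by adding one element at a time, so one candidate direction is to grow the set of sums level by level without ever materializing the combinations. An untested sketch of that idea (whether it is robust, e.g., against floating-point near-duplicates that would need rounding before deduplication, is part of what I am asking):

def unique_combination_sums(values, n):
    # Grow the set of reachable sums one element at a time.
    # Duplicates collapse at every level, so the sets stay small
    # even though the number of combinations explodes.
    sums = {0.0}
    for _ in range(n):
        sums = {s + v for s in sums for v in values}
    return sums

mylist = [0.0, 24.0, 48.0, 72.0, 96.0, 120.0]
print(len(unique_combination_sums(mylist, 36)))  # 181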
