Efficient sums of n-length combinations of array

Given the following input:

  • An integer n, e.g., 36.
  • A list/array mylist of length m, e.g., [0.0, 24.0, 48.0, 72.0, 96.0, 120.0]. Although the values in this example are evenly spaced, integer-valued floats, in general they are arbitrary non-negative real numbers that are not necessarily evenly spaced.

I am trying to perform the following computational steps:

  1. Generate a list of n-length combinations (with replacement) of mylist, e.g. for n == 3:
    [0.0, 0.0, 0.0]
    [0.0, 0.0, 24.0]
    ...
    [96.0, 120.0, 120.0]
    [120.0, 120.0, 120.0]
    
  2. Sum the elements of each combination in the above list, e.g. for n == 3:
    0.0
    24.0
    ...
    336.0
    360.0
    
  3. Remove duplicates from the above list of sums, e.g., reducing the list length from 56 to 16 for n == 3, or from 749,398 to 181 for n == 36 (these counts can be sanity-checked as shown after this list).
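
For reference, these counts can be reproduced directly: the number of n-length combinations with replacement from m values is C(m + n - 1, n), and for evenly spaced values the unique sums are exactly the multiples of the spacing up to n times the maximum. A quick sanity check, not part of the pipeline itself:

from math import comb

m = 6  # length of mylist

for n in (3, 36):
    num_combinations = comb(m + n - 1, n)  # combinations with replacement
    num_unique_sums = n * (m - 1) + 1      # holds only for evenly spaced values
    print(n, num_combinations, num_unique_sums)
# 3 56 16
# 36 749398 181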

I have implemented this in Python in two ways: using lists and using pandas.DataFrames. For values of n of 36 and above, the steps take too long for my application (1+ seconds) because the number of combinations, C(m + n - 1, n), grows rapidly with n. Although performing steps 2 and 3 on a DataFrame brings speed improvements, creating the DataFrame from the list of combinations makes the overall process slower again. (A timing sketch follows the list implementation below.)

Implementation using lists:

from itertools import combinations_with_replacement

n = 36
mylist = [0.0, 24.0, 48.0, 72.0, 96.0, 120.0]

combinations = list(combinations_with_replacement(mylist, n))  # Step 1.
combination_sums = list(map(sum, combinations))                # Step 2.
unique_combination_sums = list(set(combination_sums))          # Step 3.
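
For completeness, the end-to-end time can be measured along these lines (a minimal sketch; exact timings depend on hardware, and this is not necessarily how the 1+ second figure was obtained):

import time
from itertools import combinations_with_replacement

n = 36
mylist = [0.0, 24.0, 48.0, 72.0, 96.0, 120.0]

start = time.perf_counter()
combos = list(combinations_with_replacement(mylist, n))  # Step 1.
sums = list(map(sum, combos))                            # Step 2.
unique_sums = list(set(sums))                            # Step 3.
elapsed = time.perf_counter() - start

print(f"{len(combos):,} combinations -> {len(unique_sums)} unique sums in {elapsed:.2f} s")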

Implementation using DataFrames:

from itertools import combinations_with_replacement

import pandas as pd

n = 36
mylist = [0.0, 24.0, 48.0, 72.0, 96.0, 120.0]

combinations_df = pd.DataFrame(list(combinations_with_replacement(mylist, n)))  # Step 1.
combination_sums_df = combinations_df.sum(axis=1)                               # Step 2.
unique_combination_sums_df = combination_sums_df.drop_duplicates()              # Step 3.

Note: Using numpy.ndarrays is computationally faster than using DataFrames but slower than using lists.
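
The numpy variant reads roughly as follows (a sketch, assuming it mirrors the DataFrame version; the exact code benchmarked may have differed):

import numpy as np
from itertools import combinations_with_replacement

n = 36
mylist = [0.0, 24.0, 48.0, 72.0, 96.0, 120.0]

combinations_arr = np.array(list(combinations_with_replacement(mylist, n)))  # Step 1.
combination_sums_arr = combinations_arr.sum(axis=1)                          # Step 2.
unique_combination_sums_arr = np.unique(combination_sums_arr)                # Step 3.

Note that np.unique also sorts its result, which set() does not; here only the deduplication matters.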

Is there a more efficient algorithm, library, or other technique to make the above process faster? Perhaps something that takes advantage of the tree-like nature of combinations?
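
To make the question concrete: because addition is order-independent, the set of sums of n-length combinations equals the set of sums reachable by adding one element at a time, so one candidate direction is to grow the set of sums level by level without ever materializing the combinations. An untested sketch of that idea (whether it is robust, e.g., against floating-point near-duplicates that would need rounding before deduplication, is part of what I am asking):

def unique_combination_sums(values, n):
    # Grow the set of reachable sums one element at a time.
    # Duplicates collapse at every level, so the sets stay small
    # even though the number of combinations explodes.
    sums = {0.0}
    for _ in range(n):
        sums = {s + v for s in sums for v in values}
    return sums

mylist = [0.0, 24.0, 48.0, 72.0, 96.0, 120.0]
print(len(unique_combination_sums(mylist, 36)))  # 181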
