Given the following input:
- An integer
n
, e.g.,36
. - An list/array
mylist
of lengthm
, e.g.,[0.0, 24.0, 48.0, 72.0, 96.0, 120.0]
. Although the values in this example are evenly spaced and integer-valued floats, they are not necessarily evenly spaced and they are in general non-negative real numbers.
I am trying to do the following computational steps:
- Generate a list of
n
-length combinations (with replacement) ofmylist
, e.g. forn == 3
:[0.0, 0.0, 0.0] [0.0, 0.0, 24.0] ... [120.0, 120.0, 96.0] [120.0, 120.0, 120.0]
- Sum the elements of each combination in the above list, e.g. for
n == 3
:0 24.0 ... 336.0 360.0
- Removing duplicates from the above list of sums, e.g., reducing the list length from 56 to 16 for
n == 3
or from 749,398 to 181 forn == 36
.
I have implemented this in Python in two ways: using list
s and using pandas.DataFrame
s. For values of n
as high as 36+, the above steps take too long for my application (1+ seconds) due to the exponential nature of combinations. Although performing steps 2 and 3 on a DataFrame
brings speed improvements, creating the DataFrame
from the list of combinations makes the overall process slower again.
Implementation using list
s:
from itertools import combinations_with_replacement
n = 36
mylist = [0.0, 24.0, 48.0, 72.0, 96.0, 120.0]
combinations = list(combinations_with_replacement(mylist, n)) # Step 1.
combination_sums = list(map(sum, combinations)) # Step 2.
unique_combination_sums = list(set(combination_sums)) # Step 3.
Implementation using DataFrame
s:
from itertools import combinations_with_replacement
import pandas as pd
n = 36
mylist = [0.0, 24.0, 48.0, 72.0, 96.0, 120.0]
combinations_df = pd.DataFrame(list(combinations_with_replacement(mylist, n))) # Step 1.
combination_sums_df = combinations_df.sum(axis=1) # Step 2.
unique_combination_sums_df = combination_sums_df.drop_duplicates() # Step 3.
Note: Using numpy.ndarray
s is computationally faster than using DataFrame
s but slower than using list
s.
Is there a more efficient algorithm, library, or other technique to make the above process faster? Perhaps something that takes advantage of the tree-like nature of combinations?