# Efficient sums of n-length combinations of array

Given the following input:

• An integer `n`, e.g., `36`.
• A list/array `mylist` of length `m`, e.g., `[0.0, 24.0, 48.0, 72.0, 96.0, 120.0]`. Although the values in this example are evenly spaced, integer-valued floats, in general they are arbitrary non-negative real numbers and are not necessarily evenly spaced.

I am trying to do the following computational steps:

1. Generate a list of `n`-length combinations (with replacement) of `mylist`, e.g. for `n == 3`:
``````
[0.0, 0.0, 0.0]
[0.0, 0.0, 24.0]
...
[120.0, 120.0, 96.0]
[120.0, 120.0, 120.0]
``````
2. Sum the elements of each combination in the above list, e.g. for `n == 3`:
``````
0.0
24.0
...
336.0
360.0
``````
3. Remove duplicates from the above list of sums, e.g., reducing the list length from 56 to 16 for `n == 3`, or from 749,398 to 181 for `n == 36`.

I have implemented this in Python in two ways: using `list`s and using `pandas.DataFrame`s. For `n` around 36 and above, the steps take too long for my application (over 1 second) because the number of combinations grows combinatorially. Although performing steps 2 and 3 on a `DataFrame` is faster, creating the `DataFrame` from the list of combinations makes the overall process slower than the plain-`list` version.

Implementation using `list`s:

``````python
from itertools import combinations_with_replacement

n = 36
mylist = [0.0, 24.0, 48.0, 72.0, 96.0, 120.0]

combinations = list(combinations_with_replacement(mylist, n))  # Step 1.
combination_sums = list(map(sum, combinations))                # Step 2.
unique_combination_sums = list(set(combination_sums))          # Step 3.
``````

Implementation using `DataFrame`s:

``````python
from itertools import combinations_with_replacement

import pandas as pd

n = 36
mylist = [0.0, 24.0, 48.0, 72.0, 96.0, 120.0]

combinations_df = pd.DataFrame(list(combinations_with_replacement(mylist, n)))  # Step 1.
combination_sums_df = combinations_df.sum(axis=1)                               # Step 2.
unique_combination_sums_df = combination_sums_df.drop_duplicates()              # Step 3.
``````

Note: Using `numpy.ndarray`s is computationally faster than using `DataFrame`s but slower than using `list`s.
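For reference, the `numpy.ndarray` variant mentioned in the note might look like the following sketch (this is my reconstruction, not code from the benchmarks above; `n` is kept at `3` here so it runs quickly, and `np.unique` performs step 3):

``````python
from itertools import combinations_with_replacement

import numpy as np

n = 3  # kept small here; the timings in the question use n == 36
mylist = [0.0, 24.0, 48.0, 72.0, 96.0, 120.0]

# Step 1: materialize all combinations as a 2-D array (one row per combination).
combinations_arr = np.array(list(combinations_with_replacement(mylist, n)))
# Step 2: sum each row.
combination_sums_arr = combinations_arr.sum(axis=1)
# Step 3: np.unique returns the sorted unique sums.
unique_combination_sums_arr = np.unique(combination_sums_arr)
``````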

Is there a more efficient algorithm, library, or other technique to make the above process faster? Perhaps something that takes advantage of the tree-like nature of combinations?
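To make the hinted idea concrete, one possible direction (an illustration on my part, not something benchmarked above) is to avoid enumerating combinations entirely and instead grow the set of reachable sums one element at a time, de-duplicating at every step so the working set never exceeds the size of the final answer:

``````python
n = 36
mylist = [0.0, 24.0, 48.0, 72.0, 96.0, 120.0]

# After k iterations, `sums` holds every distinct sum achievable by a
# k-element combination (with replacement) of mylist. De-duplicating at
# each step keeps the set small (at most 181 entries for this input)
# instead of materializing all 749,398 combinations.
sums = {0.0}
for _ in range(n):
    sums = {s + x for s in sums for x in mylist}
``````

This performs roughly `n * len(mylist) * (number of unique sums)` additions rather than touching every combination, so it can be dramatically faster when duplicates are plentiful, as in the example input.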