For a Set of Integers, Get All Sets Whose Sum is Greater Than Some Value

218 views Asked by At

Let me start by saying I have found many questions and answers that answer things which seem somewhat related or close to this question, but are not actually this question. It seems related to the coin change / subset sum problem, but I think this is distinctly different enough that subset sum answers do not cover this question.

The Problem

Let's say I have a set of 4 integers called S : [1, 2, 5, 1]

The goal is to devise a set of sets, G , that is comprised of each subset of S where the sum of the set is greater than some value N.

In this example, if N = 6, then G would be [ [5,2,1,1], [5,2,1], [5,2], [5,1,1] ]

Considerations

I chose the word set here specifically because the order of each set does not matter at all. [5, 2, 1, 1] is identical to [5, 1, 1, 2].

When you create your first set that exceeds N, lets say G1, you can also immediately add all other sets which contain G1 as a subset. Likewise, if you find another new set G2 that has not already added and also exceeds N, you can immediately add all sets which contain G2 as a subset. You don't need to check if these sets sum to greater than N.

If you sum all the members of S and the result is not greater than N, then you can conclude there are no sets which meet the criteria.

Actual Use-case

The real-world goal that is trying to be met here is that I have a group of items which have a numerical weight associated with them, and another value which represents a threshold. I'm trying find out if its even feasible to routinely find all the sub-groups that can be made of the items that exceed the threshold when their weights are summed.

What is the most efficient way to generate all sets that meet this criteria? What is complexity of this?

2

There are 2 answers

0
Dillon Davis On BEST ANSWER

Here's a more efficient implementation. Worthwhile notes:

  • It correctly handles duplicate elements to avoid returning the same subset twice
  • In fact, it avoids recomputing equivalent subsets altogether, by grouping same-valued elements into bins with counts
  • It starts from the largest subset (the set itself) and gradually removes elements while staying above the threshold, and early-exits to avoid exploring smaller subsets entirely
  • It uses list mutation to avoid expensive list creation / coping until absolutely necessary
  • It does use recursion, so there's a small performance penalty there, and as with all recursion: if you throw a large enough input at it, it will raise an error. Its an exercise to the reader to port this to an iterative variant if that becomes an issue
from collections import Counter

def solve(nums, target):
    counts = sorted(Counter(nums).items())
    reserve = sum(nums) - target
    
    if reserve <= 0:
        return []
    return list(_solve(counts, reserve, []))

def _solve(counts, reserve, prefix):
    if not counts:
        yield tuple(prefix)
        return
    
    val, max_count = last = counts.pop()
    
    prefix.extend([val] * max_count)
    yield from _solve(counts, reserve, prefix)
    
    for count in range(1, max_count + 1):
        prefix.pop()
        if reserve - count * val > 0:
            yield from _solve(counts, reserve - count * val, prefix)
    
    counts.append(last)

Timing it:

from timeit import timeit

runs = 10
statement = "solve([randrange(15) for _ in range(25)], 100)"
setup = "from __main__ import solve; from random import randrange"
print(timeit(statement, setup=setup, number=runs)/runs)

Output:

0.22455865999218078

Meaning its able to solve problems of 25 elements where the solution contains ~100k distinct subsets in about 225ms.

I did test this against a naive solution to ensure correctness as well.

Regarding time complexity, its hard to bound this, because its run time really depends on the value of the numbers (or rather, the output size). You could bound it as O(n log n + s * n) where n is the size of the input list, and s is the size of the output list.

2
Dan Constantinescu On

You can use itertools to generate sub-lists and then filter those lists based on a condition:

import itertools

input_list = [1, 2, 5, 1]
threshold = 6

result = []

for length in range(len(input_list) + 1):

    search_space = list(itertools.combinations(input_list, length))

    for item in search_space:
        if sum(item) > threshold:
            result.append(item)

print(result)

Time complexity is quite high as you first need to generate sub-lists of every length (outer for loop), then search right values based on condition (inner for loop), so it will be at least O(n^2).