Subset sum variation: Get as many subset sums as possible

920 views Asked by At

For a defragmentation algorithm I need to solve the following problem: given a collection of positive integers, extract as many subsets as possible that sum to a given value. Each item of the collection must only appear in one subset.

A greedy algorithm (iterative normal subset sum) does not work, counter example:

collection: 3 5 8 4 5 1 1 1 1 1, targeted sum: 10
1. subset: 5 1 1 1 1 1
there is no other subset

but:
1. subset: 8 1 1
2. subset: 5 4 1
3. subset: 3 5 1 1

As you can see, if I pick previous subsets poorly, the solution is not optimal. How do I solve this? I have implemented "normal" subset sum already.

Edit: Could the solution possibly be to use small subsets first? Also, this question is no duplicate to the question on how to find the number of possible subsets. I want to find as many disjoint subsets as possible.

Thanks, Phil

4

There are 4 answers

5
David Eisenstat On

This problem is probably strongly NP-hard*, so let me sketch an O(2^n * n)-time algorithm.

Let the input be a multiset U and let the target sum be s. Imagine a graph on 2^n vertices corresponding to subsets of U. There is an arc from a subset X to a subset Y if and only if (1) Y = X - {x} for some x in X, and (2) (sum(U - X) mod s) + x ≤ s, i.e., we don't overflow the current partition. Traverse this graph starting from U and report the path to the subset with the smallest sum.

*My reduction showing hardness is from a problem called 3-dimensional matching. Given that the matching instance has n vertices, find n numbers such that all triples have distinct sums. Applying the probabilistic method, if we choose all n uniformly at random from 1..n^6, that succeeds with probability greater than 11/12 (one minus (n choose 3) choose 2 triples times < 1/n^6 failure probability for each), and verification requires time O(n^3 log n). Technically we have to derandomize; see my question on math.SE.

To prepare the input to this problem, for each vertex associated with a number x, output n^12 + x. For each triple x, y, z that can be matched, output n^9 - x - y - z as a number. The target is s = 3 n^12 + n^9. Some tedious math should show that the only subsets with the target sum correspond to 3-dimensional matched triples.

0
Saeed Amiri On

For the complexity of the problem, it is NPC in strong sense. This is a follow up from the 3-Partition problem:

Given a multiset S of n = 3 m positive integers, can S be partitioned into m triplets S1, S2, …, Sm such that the sum of the numbers in each subset is equal? The subsets S1, S2, …, Sm must form a partition of S in the sense that they are disjoint and they cover S.

We need one additional information from three partition problem (this is a straightforward result from its hardness proof):

Let B denote the (desired) sum of each subset Si, or equivalently, let the total sum of the numbers in S be m B. The 3-partition problem remains NP-complete when every integer in S is strictly between B/4 and B/2.

For the case of your problem, we can suppose input is consisting of 3m elements, the given value is B and for each element a∈S we have B/4 < a < B/2. Then we can find m sets such that each of them has total sum B if and only if we can solve the three partition. (Note that the range constrain automatically forces having 3 element in each set).

0
Scott Hunter On

If you allow for backtracking (that is, the ability to go back and undo the previous decision), then you can search the whole space of possibilities and be sure to find your solution. Not the most efficient, but it works.

Or you could build on @Druid's comment: find all of the possible subsets, and search through those to find your partition, allowing for subsets to be duplicated. You could add the heuristic that you want to try smaller subsets before larger ones (leaving more flexibility for future choices).

0
Clinton Ray Mulligan On

I created a recursive procedure that finds all subset sums. It proactively decides whether a number in a set is Too Big or Too Small to be a parent node. I wrote out a program in Nodejs that works well. Just revisiting it for optimization and refactoring the code.

For a set that is all positive numbers there are many heuristic and optimizations that could be made to my code.

You can read about the math I do by hand here: https://medium.com/@clint.mulligan/innovative-solution-to-the-subset-sum-problem-10a19f87056b

And a link to the code here: https://github.com/ClintMulligan/subset-sum-finder.

Eseentially The set is sorted. The first number is considered for inclusion as a parent node in a general tree. The relevant data bieing The set of integers the number belongs to, ie positive or negatives.

Too Big to be a parent node is |n| > |target Sum - (opposite group)| and being too small is (same group) < target Sum

If either of those is true the next number is considered. If they are both false the Number is moved as a node, and a new sub problem of the target Sum minus the Number and the remaining set is created. This is done till every sub problem ends in a solution or a dead end.

By using some optimizations and further investigation I hope to eliminate the most dead ends. Which are not that bad right.