Subset-Sum in Linear Time

4.2k views Asked by At

This was a question on our Algorithms final exam. It's verbatim because the prof let us take a copy of the exam home.

  1. (20 points) Let I = {r1,r2,...,rn} be a set of n arbitrary positive integers and the values in I are distinct. I is not given in any sorted order. Suppose we want to find a subset I' of I such that the total sum of all elements in I' is exactly 100*ceil(n^.5) (each element of I can appear at most once in I'). Present an O(n) time algorithm for solving this problem.

As far as I can tell, it's basically a special case of the knapsack problem, otherwise known as the subset-sum problem ... both of which are in NP and in theory impossible to solve in linear time?

So ... was this a trick question?


This SO post basically explains that a pseudo-polynomial (linear) time approximation can be done if the weights are bounded, but in the exam problem the weights aren't bounded and either way given the overall difficulty of the exam I'd be shocked if the prof expected us to know/come up with an obscure dynamic optimization algorithm.

3

There are 3 answers

0
Craig Gidney On BEST ANSWER

There are two things that make this problem possible:

  1. The input can be truncated to size O(sqrt(n)). There are no negative inputs, so you can discard any numbers greater than 100*sqrt(n), and all inputs are distinct so we know there are at most 100*sqrt(n) inputs that matter.
  2. The playing field has size O(sqrt(n)). Although there are O(2^sqrt(n)) ways to combine the O(sqrt(n)) inputs that matter, you don't have to care about combinations that either leave the 100*sqrt(n) range or redundantly hit a target you can already reach.

Basically, this problem screams dynamic programming with each input being checked against each part of the 'reached number' space somehow.

The solution ends up being a matter of ensuring numbers don't reach off of themselves (by scanning in the right direction), of only looking at each number once, and of giving ourselves enough information to reconstruct the solution afterwards.

Here's some C# code that should solve the problem in the given time:

int[] FindSubsetToImpliedTarget(int[] inputs) {
    var target = 100*(int)Math.Ceiling(Math.Sqrt(inputs.Count));

    // build up how-X-was-reached table
    var reached = new int?[target+1];
    reached[0] = 0; // the empty set reaches 0
    foreach (var e in inputs) {
        // we go backwards to avoid reaching off of ourselves
        for (var i = target; i >= e; i--) {
            if (reached[i-e].HasValue) {
                reached[i] = e;
            }
        }
    }

    // was target even reached?
    if (!reached[target].HasValue) return null;

    // build result by back-tracking via the logged reached values
    var result = new List<int>();
    for (var i = target; reached[i] != 0; i -= reached[i].Value) {
        result.Add(reached[i].Value);
    }
    return result.ToArray();
}

I haven't actually tested the above code, so beware typos and off-by-ones.

0
Abhishek Bansal On

Ok following is a simple solution in O(n) time.

Since the required sum S is of the order of O(n^0.5), if we formulate an algorithm of complexity S^2, then we are good since our algorithm shall be of effective complexity O(n).

  1. Iterate once over all the elements and check if the value is less than S or not. If it is then push it in a new array. This array shall contain a maximum of S elements (O(n^.5))

  2. Sort this array in descending order in O(sqrt(n)*logn) time ( < O(n)). This is so because logn <= sqrt(n) for all natural numbers. https://math.stackexchange.com/questions/65793/how-to-prove-log-n-leq-sqrt-n-over-natural-numbers

Now this problem is a 1D knapsack problem with W = S and number of elements = S (upper bound).

Maximize the total weight of items and see if it equals S.

It can be solved using dynamic programming in linear time (linear wrt W ~ S).

0
notbad On

With the typical DP algorithm for subset-sum problem will obtain O(N) time consuming algorithm. We use dp[i][k] (boolean) to indicate whether the first i items have a subset with sum k,the transition equation is:

dp[i][k] = (dp[i-1][k-v[i] || dp[i-1][k]),

it is O(NM) where N is the size of the set and M is the targeted sum. Since the elements are distinct and the sum must equal to 100*ceil(n^.5), we just need consider at most the first 100*ceil(n^.5) items, then we get N<=100*ceil(n^.5) and M = 100*ceil(n^.5).

The DP algorithm is O(N*M) = O(100*ceil(n^.5) * 100*ceil(n^.5)) = O(n).