O(n) list subtraction

191 views Asked by At

When working on an AoC puzzle, I found I wanted to subtract lists (preserving ordering):

def bag_sub(list_big, sublist):
    result = list_big[:]
    for n in sublist:
        result.remove(n)
    return result

I didn't like the way the list.remove call (which is itself O(n)) is contained within the loop, that seems needlessly inefficient. So I tried to rewrite it to avoid that:

def bag_sub(list_big, sublist):
    c = Counter(sublist)
    result = []
    for k in list_big:
        if k in c:
            c -= Counter({k: 1})
        else:
            result.append(k)
    return result
  1. Is this now O(n), or does the Counter.__isub__ usage still screw things up?

  2. This approach requires that elements must be hashable, a restriction which the original didn't have. Is there an O(n) solution which avoids creating this additional restriction? Does Python have any better "bag" datatype than collections.Counter?

You can assume sublist is half the length of list_big.

3

There are 3 answers

0
user2357112 On BEST ANSWER

Is this now O(n), or does the Counter.__isub__ usage still screw things up?

This would be expected-case O(n), except that when Counter.__isub__ discards nonpositive values, it goes through every key to do so. You're better off just subtracting 1 from the key value the "usual" way and checking c[k] instead of k in c. (c[k] is 0 for k not in c, so you don't need an in check.)

if c[k]:
    c[k] -= 1
else:
    result.append(k)

Is there an O(n) solution which avoids creating this additional restriction?

Only if the inputs are sorted, in which case a standard variant of a mergesort merge can do it.

Does Python have any better "bag" datatype than collections.Counter?

collections.Counter is Python's bag.

1
cadolphs On
  1. Removing an item from a list of length N is O(N) if the list is unordered, because you have to find it.
  2. Removing k items from a list of length N, therefore, is O(kN) if we focus on "reasonable" cases where k << N.

So I don't see how you could get it down to O(N).

A concise way to write this:

new_list = [x for x in list_big if x not in sublist]

But that's still O(kN).

9
mgilson On

I'd use a Counter, but I'd probably do it slightly differently, and I'd probably do this iteratively...

def bag_sub(big_list, sublist):
    sublist_counts = Counter(sublist)
    result = []
    for item in big_list:
        if sublist_counts[item] > 0:
            sublist_counts[item] -= 1
        else:
            result.append(item)
    return result

This is very similar to your solution, but it's probably not efficient to create an entire new counter every time you want to decrement the count on something.1

Also, if you don't need to return a list, then consider a generator function...

This works as long as all of the elements in list_big and sublist can be hashed. This solution is O(N + M) where N and M are the lengths of list_big and sublist respectively.

If the elements cannot be hashed, you are out of luck unless you have other constraints (e.g. the inputs are sorted using the same criterion). If your inputs are sorted, you could do something similar to the merge stage of merge-sort to determine which elements from bag_sub are in sublist.

1Note that Counters also behave a lot like a defaultdict(int) so it's perfectly fine to look for an item in a counter that isn't there already.