When working on an AoC puzzle, I found I wanted to subtract lists (preserving ordering):

```python
def bag_sub(list_big, sublist):
    result = list_big[:]
    for n in sublist:
        result.remove(n)
    return result
```
I didn't like the way the `list.remove` call (which is itself O(n)) is contained within the loop; that seems needlessly inefficient. So I tried to rewrite it to avoid that:
```python
from collections import Counter

def bag_sub(list_big, sublist):
    c = Counter(sublist)
    result = []
    for k in list_big:
        if k in c:
            c -= Counter({k: 1})
        else:
            result.append(k)
    return result
```
Is this now O(n), or does the `Counter.__isub__` usage still screw things up?

This approach requires that elements be hashable, a restriction the original didn't have. Is there an O(n) solution which avoids creating this additional restriction? Does Python have any better "bag" datatype than `collections.Counter`?

You can assume `sublist` is half the length of `list_big`.
This would be expected-case O(n), except that when `Counter.__isub__` discards nonpositive values, it goes through every key to do so. You're better off just subtracting 1 from the key's value the "usual" way and checking `c[k]` instead of `k in c`. (`c[k]` is 0 for `k not in c`, so you don't need an `in` check.)

Only if the inputs are sorted, in which case a standard variant of a mergesort merge can do it.
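As a sketch, the suggestion above (decrement counts directly rather than going through `Counter.__isub__`) might look like this:

```python
from collections import Counter

def bag_sub(list_big, sublist):
    c = Counter(sublist)
    result = []
    for k in list_big:
        if c[k]:          # Counter returns 0 for missing keys, so no `in` check
            c[k] -= 1     # consume one occurrence; O(1) expected, no key sweep
        else:
            result.append(k)
    return result
```

Each loop iteration is now an expected-O(1) dict operation, so the whole function is expected-case O(n).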
collections.Counter
is Python's bag.
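For the sorted-input case mentioned above, a merge-style subtraction needs only comparisons, not hashing. This is a sketch assuming both inputs are sorted and `sublist` is a sub-multiset of `list_big` (the same precondition the original `list.remove` version relies on); the survivors come out in sorted order:

```python
def bag_sub_sorted(list_big, sublist):
    """Subtract sublist from list_big in O(len(list_big) + len(sublist)).

    Assumes both inputs are sorted and every element of sublist
    occurs in list_big (as a multiset). No hashability required.
    """
    result = []
    i = 0  # index of the next element of sublist still to be consumed
    for x in list_big:
        if i < len(sublist) and sublist[i] == x:
            i += 1            # consume one matching occurrence
        else:
            result.append(x)  # x survives the subtraction
    return result
```

The single pass with two cursors is the standard mergesort-merge pattern; sorting unsorted inputs first would make the overall cost O(n log n).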