To check if sublist exists in another list

coll = [[3, 3], [2, 2, 2], [2, 4], [2, 3], [2, 2]]
main = [4, 3, 3, 2, 2, 2, 2, 2, 2, 2]

I have two lists. 'coll' is a list of lists, where each sublist contains integers that may include duplicates (e.g. [2, 2, 2]). 'main' is a list of integers. I want to check whether the elements of each sublist of 'coll' are present in 'main'. In this case the answer is true, since [2, 2, 2], [3, 3] and the other sublists are all present. The order of elements in the sublist and in 'main' doesn't matter: the elements of a sublist may appear in 'main' at any position.

I cannot use sets because of the presence of duplicates. And also I cannot use strings because:

coll = ['222']
main = ['423262']

I have used a single sample sublist to show the problem with using strings. My algorithm requires that 'true' is returned in this case too, because '2' occurs at 3 positions in 'main' (indices 1, 3 and 5). But:

if coll in main:
    return True
else:
    return False

this returns false if I use strings for checking.

Please suggest any method.

Best answer, by jsbueno

I think the most readable way to do that is to create a Counter instance for each of your sublists, and then check with the list "count" method whether "main" contains each element of the sublist at least as many times as the sublist does:

from collections import Counter

def checksub(main, sublist):
    c = Counter(sublist)
    for num, count in c.items():
        if main.count(num) < count:
            return False
    return True

all(checksub(main, sublist) for sublist in coll)

This is not fast, since every call to "count" scans the whole "main" list. If you are iterating over a large volume of data, you'd better use an approach that maps the "main" list into a data structure where the counts can be checked faster than with "count". Or, if there are few distinct numbers, even something as simple as caching the results of "count" for each distinct number would help. Otherwise, for small sizes of "main", this might suffice.
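
As a rough sketch of that faster variant, one option is to count "main" once with a Counter and compare each sublist against those counts. The names make_checker and contains are only illustrative, and this assumes the elements are hashable (true for integers):

```python
from collections import Counter

def make_checker(main):
    """Return a function that tests sublists against a precounted 'main'."""
    main_counts = Counter(main)  # built once, one pass over main

    def contains(sublist):
        # Counter subtraction keeps only positive counts, so the result is
        # empty exactly when main has enough of every element in sublist.
        return not (Counter(sublist) - main_counts)

    return contains

coll = [[3, 3], [2, 2, 2], [2, 4], [2, 3], [2, 2]]
main = [4, 3, 3, 2, 2, 2, 2, 2, 2, 2]

contains = make_checker(main)
all_present = all(contains(sublist) for sublist in coll)
any_present = any(contains(sublist) for sublist in coll)
```

Each check then costs time proportional to the length of the sublist rather than to the length of "main".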

On a second reading of your question, it seems like you may only require that one of the sublists be present in main. If that is the case, just replace the call to all with any above.
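
The same counting idea should also cover the string example from the question, because Counter accepts any iterable, including a string of digits. A small sketch, where contains_chars is a hypothetical helper name and the inputs are assumed to be plain strings rather than one-element lists:

```python
from collections import Counter

def contains_chars(haystack, needle):
    """True if every character of needle occurs in haystack often enough."""
    available = Counter(haystack)
    # A Counter returns 0 for missing keys, so absent characters fail the test.
    return all(available[ch] >= n for ch, n in Counter(needle).items())

by_count = contains_chars('423262', '222')   # counts characters, ignores position
by_substring = '222' in '423262'             # requires a contiguous run
```

Here the count-based check succeeds while the plain substring test does not, which is the behaviour the question asks for.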