Python Regex either or case


I have a small module that gets the lemma of a word and its plural form. It then searches through sentences for one that contains both words (each in either its singular or plural form), in either order. I have it working, but I was wondering if there is a more elegant way to build this expression. Thanks! Note: Python 2

import re

words = (("cell",), ("wolf", "wolves"))
string1 = "(?:" + "|".join(words[0]) + ")"
string2 = "(?:" + "|".join(words[1]) + ")"
pat = ".+".join((string1, string2)) + "|" + ".+".join((string2, string1))
# pat is now: "(?:cell).+(?:wolf|wolves)|(?:wolf|wolves).+(?:cell)"

Then the search:

pat = re.compile(pat)
for sentence in sentences:
    if len(pat.findall(sentence)) != 0:
        print sentence+'\n'

There are 2 answers

behzad.nouri On

Something like this, using raw strings so that \b means a word boundary rather than a backspace character:

[x for x in sentences
 if re.search(r'\bcell\b', x)
 and (re.search(r'\bwolf\b', x) or re.search(r'\bwolves\b', x))]
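The same idea can be generalized to the `words` tuple from the question. The following is only a sketch: the names `group_patterns` and `mentions_all_groups` and the sample sentences are made up for illustration, and matching is case-sensitive, as in the list comprehension above.

```python
import re

words = (("cell",), ("wolf", "wolves"))

# One compiled pattern per word group, e.g. r"\b(?:wolf|wolves)\b".
# re.escape guards against regex metacharacters inside a word.
group_patterns = [
    re.compile(r"\b(?:%s)\b" % "|".join(re.escape(w) for w in group))
    for group in words
]

def mentions_all_groups(sentence):
    """Return True if every word group has at least one whole-word match."""
    return all(p.search(sentence) for p in group_patterns)

# Hypothetical sample data.
sentences = [
    "The wolf has a cell.",
    "Wolves hunt in packs.",
    "Each cell of the wolves was sampled.",
    "The cellar held one wolf.",
]
matches = [s for s in sentences if mentions_all_groups(s)]
```

Because each group is checked independently, word order never has to be spelled out in the pattern, and the word boundaries keep "cellar" from counting as "cell".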
roippi On

The problem is that once you start combining multiple compound look-around expressions, the algorithmic complexity gets out of control. That is a fundamental limitation of using regex for this task.
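For context, a look-around version of the pattern might look like the sketch below. It is an illustration rather than code from the question: `build_lookahead_pattern` is a hypothetical helper, and it assumes each sentence is a single line, since `.` does not match newlines by default. Each lookahead rescans the sentence from the start, so the work grows with the number of word groups.

```python
import re

words = (("cell",), ("wolf", "wolves"))

def build_lookahead_pattern(groups):
    # One zero-width lookahead per group; all of them must succeed
    # from the start of the sentence, in any order.
    parts = [
        r"(?=.*\b(?:%s)\b)" % "|".join(re.escape(w) for w in group)
        for group in groups
    ]
    return re.compile("".join(parts))

pat = build_lookahead_pattern(words)
# pat.pattern is r"(?=.*\b(?:cell)\b)(?=.*\b(?:wolf|wolves)\b)"

# Use match() so the lookaheads are anchored at position 0.
found = bool(pat.match("a cell for wolves"))
```

This avoids listing every ordering explicitly, which the `A.+B|B.+A` construction would need as more groups are added.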

An alternative approach is to make one O(n) pass per sentence to build a Counter, and then query against that:

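from collections import Counter
from string import punctuation

# Helper: total occurrences of any form of a lemma (e.g. "wolf", "wolves").
def count_lemma(counter, *args):
    return sum(counter[word] for word in args)

for sentence in sentences:
    # Split on whitespace, trim surrounding punctuation, and lowercase.
    c = Counter(x.strip(punctuation).lower() for x in sentence.split())
    # Keep the sentence only if every word group occurs at least once.
    if all(count_lemma(c, *word) for word in words):
        print sentence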
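To try the Counter approach on concrete input, here is a small self-contained sketch that runs on both Python 2.7 and Python 3. The `contains_all` wrapper and the sample sentences are assumptions added for illustration, and tokens are trimmed of punctuation on both sides.

```python
from collections import Counter
from string import punctuation

words = (("cell",), ("wolf", "wolves"))

def count_lemma(counter, *forms):
    # A Counter returns 0 for missing keys, so absent forms add nothing.
    return sum(counter[form] for form in forms)

def contains_all(sentence, groups):
    # Single pass over the sentence: normalize each token and count it.
    counts = Counter(tok.strip(punctuation).lower() for tok in sentence.split())
    return all(count_lemma(counts, *group) for group in groups)

# Hypothetical sample data.
sentences = [
    "Wolves, it seems, avoid the cell.",
    "The wolf slept.",
    "A cell, a wolf.",
]
hits = [s for s in sentences if contains_all(s, words)]
```

Unlike the regex versions, this lowercases tokens, so "Wolves" at the start of a sentence still counts as a match.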