How to get all possible interpretations of a regex match?


I want to match queries like "Who acted as tony montana in Scarface" with the template "Who acted as (?P<role>.*) in (?P<movie>.*)".

If the role name or the movie name itself contains the word "in", the regex match goes wrong.

E.g. "Who acted as k in men in black" gives "k in men" as the role.

Maybe a non-greedy approach would work for this query, but it would fail whenever the role contains the word "in". How do I get all possible interpretations here?

Answer by georg (best answer)

Given a phrase like 'a in b in c in d', this generates all possible partitions around the word "in":

phrase = 'a in b in c in d'
words = phrase.split()

# Try every occurrence of 'in' as the split point.
for n, w in enumerate(words):
    if w == 'in':
        print('(%s) in (%s)' % (
            ' '.join(words[:n]),
            ' '.join(words[n+1:])))
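
Applied to the template from the question, the same idea might look like the sketch below. The function name `interpretations` and the simplified prefix pattern are assumptions made for illustration, not part of the original answer; the pattern only strips the fixed "Who acted as" prefix and leaves the splitting to plain word handling.

```python
import re

# Assumed simplification: match only the fixed prefix, capture the rest.
TEMPLATE = re.compile(r"Who acted as (?P<rest>.+)", re.IGNORECASE)

def interpretations(query):
    """Return every (role, movie) split of the query at a standalone 'in'."""
    match = TEMPLATE.fullmatch(query.strip())
    if match is None:
        return []
    words = match.group("rest").split()
    results = []
    for n, word in enumerate(words):
        # Keep only splits that leave a non-empty role and a non-empty movie.
        if word.lower() == "in" and 0 < n < len(words) - 1:
            results.append((" ".join(words[:n]), " ".join(words[n + 1:])))
    return results

for role, movie in interpretations("Who acted as k in men in black"):
    print("role=%r, movie=%r" % (role, movie))
```

Every candidate is returned in left-to-right order, so the caller still has to decide which one is right.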

For your specific problem, if there are three occurrences of "in" in the phrase, the "middle" interpretation ((a in b) in (c in d)) is most probably correct. With two occurrences, though, text manipulation alone cannot solve this, because the "left" and "right" partitions are equally probable. Consider:

Who acted as jeebs in men in black
Who acted as woman in red in matrix

You'll have to use NLP or database-driven methods to parse this correctly.
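
As a rough illustration of the database-driven route, one could filter the candidate splits against a list of known movie titles. Everything named here is hypothetical: `KNOWN_MOVIES` stands in for a real movie database, and `split_on_in` and `plausible` are helper names invented for this sketch.

```python
# Hypothetical stand-in for a real movie database lookup.
KNOWN_MOVIES = {"men in black", "matrix", "scarface"}

def split_on_in(phrase):
    """Return every (left, right) split of the phrase at a standalone 'in'."""
    words = phrase.split()
    return [
        (" ".join(words[:n]), " ".join(words[n + 1:]))
        for n, w in enumerate(words)
        if w == "in" and 0 < n < len(words) - 1
    ]

def plausible(phrase, known_movies=KNOWN_MOVIES):
    """Keep only the splits whose right-hand side is a known movie title."""
    return [
        (role, movie)
        for role, movie in split_on_in(phrase)
        if movie.lower() in known_movies
    ]

print(plausible("jeebs in men in black"))
print(plausible("woman in red in matrix"))
```

With this toy title list, each of the two ambiguous phrases above keeps exactly one candidate. A real lookup would also need to cope with misspellings and with titles that are missing from the database.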