If "Who acted as (?P<role>.*) in (?P<movie>.*)" is the template, I want to match queries like "Who acted as tony montana in Scarface".
If the role name or the movie name contains the word "in", the regex match goes wrong.
E.g. "Who acted as k in men in black" gives "k in men" as the role.
A non-greedy approach may work for this query, but it will fail if the role contains the word "in". How do I get all possible interpretations here?
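To make the problem concrete, here is a minimal sketch comparing the greedy template with a non-greedy variant on the example query. The variable names are illustrative only.

```python
import re

query = "Who acted as k in men in black"

# Greedy role group: grabs as much as it can, so it splits at the LAST " in ".
greedy = re.match(r"Who acted as (?P<role>.*) in (?P<movie>.*)", query)

# Non-greedy role group: stops as early as it can, so it splits at the FIRST " in ".
lazy = re.match(r"Who acted as (?P<role>.*?) in (?P<movie>.*)", query)

print(greedy.groupdict())  # {'role': 'k in men', 'movie': 'black'}
print(lazy.groupdict())    # {'role': 'k', 'movie': 'men in black'}
```

Either pattern commits to exactly one split point, so neither can return every interpretation on its own.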
Given a phrase like 'a in b in c in d', you can generate all possible partitions by the word "in".
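A minimal sketch of one way to do this, assuming the words are separated by whitespace and that "in" only counts as a separator when it has at least one word on each side. The function name `partitions` is an illustrative choice, not an established API.

```python
def partitions(phrase, word="in"):
    """Yield every (left, right) split of phrase at an occurrence of word."""
    tokens = phrase.split()
    for i, tok in enumerate(tokens):
        # Skip a leading or trailing separator: both sides must be non-empty.
        if tok == word and 0 < i < len(tokens) - 1:
            yield " ".join(tokens[:i]), " ".join(tokens[i + 1:])


for left, right in partitions("a in b in c in d"):
    print(repr(left), "|", repr(right))
# 'a' | 'b in c in d'
# 'a in b' | 'c in d'
# 'a in b in c' | 'd'
```

Splitting on whole tokens rather than on the substring "in" keeps words such as "india" or "tin" intact.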
For your specific problem, if there are three "in"s in the phrase, the "middle" interpretation, (a in b) in (c in d), is most probably correct. With two "in"s, however, there is no way to solve this by text manipulation alone, because the "left" and "right" partitions are equally probable: in "k in men in black" the phrase must be split at the first "in", while a role name that itself contains "in" would require splitting at the second.
You'll have to use NLP or database-driven methods to parse this correctly.
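As a rough illustration of the database-driven idea, the sketch below lists every candidate (role, movie) split and keeps only those whose movie part is a known title. `KNOWN_MOVIES` is a hypothetical stand-in for a real movie database lookup, and the function names are assumptions made for this example.

```python
import re

# Hypothetical stand-in for a real movie database.
KNOWN_MOVIES = {"scarface", "men in black"}


def interpretations(query):
    """Return every (role, movie) split of the query at a standalone 'in'."""
    m = re.match(r"Who acted as (?P<rest>.+)", query)
    if not m:
        return []
    tokens = m.group("rest").split()
    return [
        (" ".join(tokens[:i]), " ".join(tokens[i + 1:]))
        for i, tok in enumerate(tokens)
        if tok == "in" and 0 < i < len(tokens) - 1
    ]


def plausible(query):
    """Keep only the splits whose movie part is a known title."""
    return [(r, mv) for r, mv in interpretations(query) if mv.lower() in KNOWN_MOVIES]


print(interpretations("Who acted as k in men in black"))
# [('k', 'men in black'), ('k in men', 'black')]
print(plausible("Who acted as k in men in black"))
# [('k', 'men in black')]
```

If more than one candidate survives the lookup, the ambiguity is real and needs a further signal, such as checking the role against the cast list of each candidate movie.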