How exactly does the possessive quantifier work?

182 views Asked by At

At the end of the page there is at attempted explanation of how do greedy, reluctant and possessive quantifiers work: http://docs.oracle.com/javase/tutorial/essential/regex/quant.html

However I tried myself an example and I don't seem to understand it fully.

I will paste my results directly:

Enter your regex: .*+foo
Enter input string to search: xfooxxxxxxfoo
No match found.

Enter your regex: (.*)+foo
Enter input string to search: xfooxxxxxxfoo
I found the text "xfooxxxxxxfoo" starting at index 0 and ending at index 13.

Why does the first reg.exp. find no match and the second does? What is the exact difference between those 2 reg.exp.?

2

There are 2 answers

4
Tim Pietzcker On BEST ANSWER

The + after another quantifier means "don't allow the regex engine to backtrack into whatever the previous token has matched". (See a tutorial on possessive quantifiers here).

So when you apply .*foo to "xfooxxxxxxfoo", the .* first matches the entire string. Then, since foo can't be matched, the regex engine backtracks until that's possible, achieving a match when .* has matched "xfooxxxxxx" and foo has matched "foo".

Now the additional + prevents that backtracking from happening, so the match fails.

When you write (.*)+foo. the + takes on an entirely different meaning; now it means "one or more of the preceding token". You've created nested quantifiers, which is not a good idea, by the way. If you apply that regex to a string like "xfoxxxxxxxxxfox", you'll run into catastrophic backtracking.

0
TwoThe On

The possessive quantifier takes the entire string and checks if it matches, if not it fails. In your case xfooxxxxxxfoo matches the .*+ but then you ask for another foo, which isn't present, so the matcher fails.

The greedy quantifier first does the same, but instead of failing it "backs off" and tries again:

xfooxxxxxxfoo fail
xfooxxxxxxfo fail
xfooxxxxxxf fail
xfooxxxxxx match

In your second regex you ask for something else by confusing the grouping mechanism. You ask for "one or more matches of (.*)" as the + now relates to the () and there is one match.