At the end of this page there is an attempted explanation of how greedy, reluctant and possessive quantifiers work: http://docs.oracle.com/javase/tutorial/essential/regex/quant.html
However, I tried an example myself and I don't seem to understand it fully.
I will paste my results directly:
Enter your regex: .*+foo
Enter input string to search: xfooxxxxxxfoo
No match found.
Enter your regex: (.*)+foo
Enter input string to search: xfooxxxxxxfoo
I found the text "xfooxxxxxxfoo" starting at index 0 and ending at index 13.
Why does the first regex find no match while the second one does? What exactly is the difference between these two regexes?
The `+` after another quantifier means "don't allow the regex engine to backtrack into whatever the previous token has matched". (This is called a possessive quantifier.)

So when you apply `.*foo` to `"xfooxxxxxxfoo"`, the `.*` first matches the entire string. Then, since `foo` can't be matched, the regex engine backtracks until that's possible, achieving a match when `.*` has matched `"xfooxxxxxx"` and `foo` has matched `"foo"`.

Now the additional `+` prevents that backtracking from happening, so the match fails.

When you write `(.*)+foo`, the `+` takes on an entirely different meaning; now it means "one or more of the preceding token". You've created nested quantifiers, which is not a good idea, by the way. If you apply that regex to a string like `"xfoxxxxxxxxxfox"`, you'll run into catastrophic backtracking.
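
To see the three behaviours side by side, here is a minimal sketch using `java.util.regex`. The class name `QuantifierDemo` and the helper `firstMatch` are made up for illustration; only `Pattern` and `Matcher` come from the JDK. It runs the greedy, possessive and grouped patterns against the input from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: returns the first match as "text@start-end",
    // or "no match" if the pattern is not found anywhere in the input.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() + "@" + m.start() + "-" + m.end() : "no match";
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";

        // Greedy: .* grabs everything, then backtracks to give "foo" back.
        System.out.println(firstMatch(".*foo", input));    // xfooxxxxxxfoo@0-13

        // Possessive: .*+ grabs everything and never gives anything back.
        System.out.println(firstMatch(".*+foo", input));   // no match

        // Group followed by +: an ordinary "one or more" around a greedy .*
        System.out.println(firstMatch("(.*)+foo", input)); // xfooxxxxxxfoo@0-13
    }
}
```

The parentheses are what change the meaning: in `.*+` the `+` attaches to the quantifier `*` and makes it possessive, whereas in `(.*)+` it attaches to the group and simply repeats it.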