Java regexp - adding parentheses makes it greedy?

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDemo {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\d{1,2}?"); // reluctant {1,2}: prefers one digit
        String input = "09";
        Matcher m = p.matcher(input);
        while (m.find()) {
            System.out.println("match = " + m.group());
        }
    }
}

The output of the above method is:

match = 0
match = 9

Now I just add parentheses to the regexp:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDemo {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("(\\d{1,2})?"); // optional group around {1,2}
        String input = "09";
        Matcher m = p.matcher(input);
        while (m.find()) {
            System.out.println("match = " + m.group());
        }
    }
}

And the output becomes:

match = 09
match = 

  1. Why do parentheses make the matching greedy here?
  2. [Edit, added later] Why is the empty string not matched in the first case?

There are 2 answers

Answer by M A (best answer)

In order for a quantifier (?, *, + or {n,m}) to be reluctant (non-greedy or lazy), it must be followed by a ?. Therefore, the pattern \\d{1,2}? is reluctant.
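To see this rule applied to each of the four quantifiers, here is a small sketch that runs the reluctant form of ?, *, + and {1,2} against the same input 09 and collects what find() reports. The class name ReluctantQuantifiers and the helper findAll are made up for illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantQuantifiers {
    // Collects every match reported by repeated find() calls, in order.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Each quantifier becomes reluctant when a ? immediately follows it.
        String[] regexes = {"\\d??", "\\d*?", "\\d+?", "\\d{1,2}?"};
        for (String regex : regexes) {
            System.out.println(regex + " -> " + findAll(regex, "09"));
        }
    }
}
```

It should print `\d?? -> [, , ]`, `\d*? -> [, , ]`, `\d+? -> [0, 9]` and `\d{1,2}? -> [0, 9]`. The reluctant forms with a minimum of 0 prefer zero digits and so report an empty match at every position, while \d+? and \d{1,2}? need at least one digit and never report an empty string.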

On the other hand, (\\d{1,2})? is composed of two levels of greedy quantifiers:

  1. a group (\\d{1,2}) containing a pattern with a greedy quantifier {1,2},
  2. followed by the ? greedy quantifier.

Therefore, (\\d{1,2})? is greedy: the ? does not immediately follow the {1,2} quantifier (the closing parenthesis of the group sits in between), so it does not act as the reluctant modifier. Instead it is a greedy quantifier of its own that makes the whole group optional.

See this page about quantifiers as a reference.
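A quick way to check what the ? binds to is to run the variants side by side. The sketch below (class and helper names are hypothetical) also includes (\\d{1,2}?)?, which keeps the group optional but makes the inner {1,2} reluctant by placing a ? directly after it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuestionMarkPlacement {
    // Collects every match reported by repeated find() calls, in order.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] regexes = {
            "\\d{1,2}",     // greedy {1,2}
            "\\d{1,2}?",    // ? right after {1,2}: reluctant {1,2}
            "(\\d{1,2})?",  // ? after ')': greedy "optional" applied to the group
            "(\\d{1,2}?)?"  // reluctant {1,2} inside an optional group
        };
        for (String regex : regexes) {
            System.out.println(regex + " -> " + findAll(regex, "09"));
        }
    }
}
```

This should print `[09]`, `[0, 9]`, `[09, ]` and `[0, 9, ]` for the four patterns respectively: the inner ? changes how many digits each match takes, while the outer ? only decides whether the group may match nothing.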

Answer by anubhava

Let's start with a simple quantifier regex:

\d{1,2}

This is greedy by default and will match as many digits as possible, between the minimum (1) and the maximum (2).

So for our input 09, it will just match 09.

Now let's make it lazy by using:

\d{1,2}?

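This makes it match as few times as possible. So for the same input it matches 0 the first time and 9 the second time (since you call find() in a while loop). The minimum is still 1, though, so it can never match an empty string; that is why no empty match appears in the first case.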
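To make the two find() calls visible, the sketch below records where each match starts and ends. The class LazyQuantifierPositions and its describeMatches helper are hypothetical names; Matcher.start(), Matcher.end() and Pattern.matches() are standard JDK methods. It also shows that laziness is only a preference: when the whole input must match, the reluctant {1,2} still expands to two digits, but it never accepts zero digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyQuantifierPositions {
    // Records each find() result as "text@start-end".
    static String describeMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(m.group()).append('@').append(m.start()).append('-').append(m.end());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Each find() resumes where the previous match ended.
        System.out.println(describeMatches("\\d{1,2}?", "09")); // 0@0-1 9@1-2
        // matches() must consume the whole input, so the lazy {1,2} takes two digits.
        System.out.println(Pattern.matches("\\d{1,2}?", "09")); // true
        // The minimum of 1 still applies, so the empty string is rejected.
        System.out.println(Pattern.matches("\\d{1,2}?", ""));   // false
    }
}
```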

Now the third case:

(\d{1,2})?

It matches 0 or 1 occurrence of the greedy \d{1,2}, which means it either matches \d{1,2} or matches nothing.

So for the same input, the while loop finds:

  1. 09, and then
  2. an empty string at the end of the input

This is because placing ? outside (...) makes the whole group optional: once 09 has been consumed, the group can still match zero times, which produces the empty match.
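One way to confirm that the second match is the optional group matching zero times is to print the match span together with capturing group 1. In Java, group(1) returns null when that group did not take part in the match. The class OptionalGroupEmptyMatch and its describe helper are illustrative names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalGroupEmptyMatch {
    // Describes each find() result: the whole match, its span, and capturing group 1.
    static List<String> describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            // group(1) is null when the optional group did not participate in this match
            out.add("[" + m.group() + "] " + m.start() + "-" + m.end() + " group(1)=" + m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        describe("(\\d{1,2})?", "09").forEach(System.out::println);
    }
}
```

This should print `[09] 0-2 group(1)=09` followed by `[] 2-2 group(1)=null`: the second match is empty, sits at the end of the input, and the group itself did not match. If the empty match is unwanted, checking `m.group(1) != null` inside the loop, or dropping the outer ? altogether, are two possible options.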