Why is the star quantifier greedier than the plus quantifier in Java regular expressions?

215 views Asked by duber At 09 December 2013 at 17:27

I'm trying to extract LogicalID and SupplyChain from the following text:

 <LogicalID>SupplyChain</Logical>

At first I used the following regex:

.*([A-Za-z]+)>([A-Za-z]+)<.*

This matched as follows:

["D", "SupplyChain"]

In a fit of desperation, I tried using the asterisk instead of the plus:

.*([A-Za-z]*)>([A-Za-z]+)<.*

This matched perfectly.

The documentation says * matches zero or more times and + matches one or more times. Why is * greedier than +?

EDIT: It's been pointed out to me that this isn't the case below. The order of operations explains why the first match group is actually null.

There are 3 answers

anubhava On 09 December 2013 at 17:32

You should really be using this regex:

<([A-Za-z]+)>([A-Za-z]+)<

OR

<([A-Za-z]*)>([A-Za-z]+)<

Both will match LogicalID and SupplyChain respectively.

PS: Your regex .*([A-Za-z]*)>([A-Za-z]+)< matches an empty string for the first group.

Working Demo: http://ideone.com/VMsb6n
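The anchored pattern above can be checked with a short Java sketch. The class name AnchoredTagDemo and the helper extract are made up for illustration, and the use of find() rather than matches() is an assumption, since the question does not say how the regex was applied.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchoredTagDemo {
    // Hypothetical helper: returns {tagName, tagContent}, or null if nothing is found.
    static String[] extract(String input) {
        // The literal '<' pins group 1 to the start of the tag name,
        // so there is no leading .* to compete with the group.
        Matcher m = Pattern.compile("<([A-Za-z]+)>([A-Za-z]+)<").matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] result = extract("<LogicalID>SupplyChain</Logical>");
        System.out.println(result[0]); // LogicalID
        System.out.println(result[1]); // SupplyChain
    }
}
```

With no leading .* in the pattern, nothing can take characters away from the first group, which is why + and * give the same result here.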

Rakesh KR On 09 December 2013 at 17:49

Why is * greedier than +?

It does not show greediness.

The first regex, .*([A-Za-z]+)>([A-Za-z]+)<.*, can be represented as:

[image: diagram of the first regex]

Here Group 1 must match one or more letters.

And the second, .*([A-Za-z]*)>([A-Za-z]+)<.*, as:

[image: diagram of the second regex]

Here Group 1 may match zero or more letters.

Aioros On 09 December 2013 at 17:37 (Accepted Answer)

It's not a difference in greediness. In your first regex:

.*([A-Za-z]+)>([A-Za-z]+)<.*

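You are asking for any number of characters (.*), then at least one letter, then a >. The .* is greedy: it first takes the whole input and then backtracks one character at a time until the rest of the pattern can match. The first position where that works leaves only the D in front of the >, so group 1 has to be D; .* has consumed everything before it.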

In the second one, instead:

.*([A-Za-z]*)>([A-Za-z]+)<.*

You want any number of characters, followed by zero or more letters, then the >. Since the group is allowed to match nothing, the greedy .* consumes everything up to the >, and the first capture group matches an empty string. I don't think that it "matches perfectly" at all.
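Both behaviours can be reproduced with a minimal Java sketch. The class name GreedyGroupsDemo and the helper groups are invented for this example, and it assumes the pattern is applied with Matcher.matches() to the sample line from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyGroupsDemo {
    // Hypothetical helper: returns {group 1, group 2}, or null if the input does not match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String input = "<LogicalID>SupplyChain</Logical>";

        // With +, the leading .* must give back one letter so group 1 is not empty.
        String[] plus = groups(".*([A-Za-z]+)>([A-Za-z]+)<.*", input);
        System.out.println("plus: [" + plus[0] + "] [" + plus[1] + "]"); // plus: [D] [SupplyChain]

        // With *, group 1 may be empty, so the leading .* keeps every letter.
        String[] star = groups(".*([A-Za-z]*)>([A-Za-z]+)<.*", input);
        System.out.println("star: [" + star[0] + "] [" + star[1] + "]"); // star: [] [SupplyChain]
    }
}
```

In both runs the leading .* is the greedy part; the quantifier inside group 1 only decides how much .* is forced to give back.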
