I'm trying to extract LogicalID and SupplyChain from the following text:
<LogicalID>SupplyChain</Logical>
At first I used the following regex:
.*([A-Za-z]+)>([A-Za-z]+)<.*
This matched as follows:
["D", "SupplyChain"]
In a fit of desperation, I tried using the asterisk instead of the plus:
.*([A-Za-z]*)>([A-Za-z]+)<.*
This matched perfectly.
The documentation says * matches zero or more times and + matches one or more times. Why is * greedier than +?
EDIT: It's been pointed out to me below that this isn't the case. The order of operations explains why the first match group is actually null.
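It's not a difference in greediness. In your first regex:
.*([A-Za-z]+)>([A-Za-z]+)<.*
you are asking for any number of characters (.*), then at least one letter, then a >. The leading .* is greedy, so it takes as much as it can while still leaving one letter for the group. That is why the first group can only be D: the .* has already consumed everything before it.
In the second one, instead: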
.*([A-Za-z]*)>([A-Za-z]+)<.*
you want any number of characters, followed by any number of letters (including none), then the >. So the first .* consumes everything up to the >, and the first capture group matches an empty string. I don't think that it "matches perfectly" at all.
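To see this concretely, here is a small sketch in Python (an assumption on my part, since the question doesn't say which language is in use; other engines may report the empty group as null rather than an empty string). It runs both patterns from the question against the sample text, then tries one possible alternative that drops the leading .* and anchors on the < instead. That last pattern is only a suggestion, not something from the question.

```python
import re

text = "<LogicalID>SupplyChain</Logical>"

# Pattern from the question using +: the greedy .* leaves only one letter for group 1.
plus = re.match(r".*([A-Za-z]+)>([A-Za-z]+)<.*", text)
print(plus.groups())

# Pattern from the question using *: the greedy .* leaves nothing for group 1.
star = re.match(r".*([A-Za-z]*)>([A-Za-z]+)<.*", text)
print(star.groups())

# Hypothetical alternative: no leading .*, so group 1 starts right after the <.
anchored = re.search(r"<([A-Za-z]+)>([A-Za-z]+)<", text)
print(anchored.groups())
```

In both of the original patterns the leading .* is what eats the tag name; the quantifier on the group only decides whether one letter or none is left over.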