Non-Greedy regex acts greedy based on the position of atoms in regex


I ran into a situation where I wanted to use the non-greedy atom .*? in a Tcl regex pattern.

set input "Device ID: HOST1
Interface: GigabitEthernet0/1,  Port ID (outgoing port): GigabitEthernet2/43
Device ID: HOST2
Entry address(es):
Interface: GigabitEthernet0/2,  Port ID (outgoing port): GigabitEthernet2/43
"

puts "======== Non-Greedy regex starting with some other patterns ========"
puts [ regexp -inline {Device\s+ID:.*?outgoing\s+port\):\s+} $input]
puts "======== Non-Greedy regex at first ========"
puts [ regexp -inline {.*?outgoing\s+port\):\s+} $input]

Output:

======== Non-Greedy regex starting with some other patterns ========
{Device ID: HOST1
Interface: GigabitEthernet0/1,  Port ID (outgoing port): GigabitEthernet2/43
Device ID: HOST2
Entry address(es):
Interface: GigabitEthernet0/2,  Port ID (outgoing port): }
======== Non-Greedy regex at first ========
{Device ID: HOST1
Interface: GigabitEthernet0/1,  Port ID (outgoing port): }

The pattern .*?outgoing\s+port\):\s+ stops at the first occurrence, but Device\s+ID:.*?outgoing\s+port\):\s+ does not: it runs on to the last occurrence.

Why is the behavior of the non-greedy match affected by the placement of the atoms?

Answer by glenn jackman (best answer)

It's not that well documented (IMO) but the re_syntax man page says this about greedy/non-greedy preference:

A branch has the same preference as the first quantified atom in it which has a preference.

(emphasis mine)

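So if you have .* as the first quantifier, the whole RE will be greedy,
and if you have .*? as the first quantifier, the whole RE will be non-greedy.
In your first pattern, the first quantified atom is the greedy \s+ in Device\s+ID:, so the whole branch is greedy and the later .*? is treated as greedy too.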
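
One way to apply that rule, as a minimal sketch: make the first quantifier in the pattern non-greedy (\s+? instead of \s+), so that the whole branch takes the non-greedy preference. The variable names greedy and lazy are only illustrative, and the input is the sample from the question. Note that this also makes the trailing \s+ match as little whitespace as it can, which may or may not be what you want.

```tcl
set input "Device ID: HOST1
Interface: GigabitEthernet0/1,  Port ID (outgoing port): GigabitEthernet2/43
Device ID: HOST2
Entry address(es):
Interface: GigabitEthernet0/2,  Port ID (outgoing port): GigabitEthernet2/43
"

# The first quantified atom is the greedy \s+, so the whole branch is greedy
# and the match runs to the last "outgoing port): ".
set greedy [lindex [regexp -inline {Device\s+ID:.*?outgoing\s+port\):\s+} $input] 0]

# The first quantified atom is now the non-greedy \s+?, so the whole branch
# is non-greedy and the match stops at the first "outgoing port): ".
set lazy [lindex [regexp -inline {Device\s+?ID:.*?outgoing\s+port\):\s+} $input] 0]

puts "greedy branch matched [string length $greedy] characters"
puts "non-greedy branch matched [string length $lazy] characters"
puts $lazy
```

The \s+? between Device and ID: still has to consume all the whitespace there, because ID: must follow it, so changing it only switches the preference of the branch.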