+ is supposed to be greedy, so why am I getting a lazy result?


Why does the following regex return 101 instead of 1001?

console.log(new RegExp(/1(0+)1/).exec('101001')[0]);

I thought that + was greedy, so the longer of the two matches should be returned.

IMO this is different from "Using javascript regexp to find the first AND longest match" because I don't care about the first match, just the longest. Can someone correct my definition of greedy? For example, what is the difference between the above snippet and the classic "oops, too greedy" example of new RegExp(/<(.+)>/).exec('<b>a</b>')[1] giving b>a</b?

(Note: This seems to be language-agnostic (it also happens in Perl), but just for ease of running it in-browser I've used JavaScript here.)


There are 2 answers

Answer by Wiktor Stribiżew (best answer)

Greedy means that the quantifier extends up to the rightmost occurrence that still lets the rest of the pattern match; it never means the longest match in the input string.
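To make that distinction concrete, here is a small sketch (the variable names are only for illustration). Greedy versus lazy changes how far a quantifier stretches from a given starting position; it does not change which starting position the engine reports first.

```javascript
// Greedy vs. lazy: same start position, different extent.
const greedy = /<(.+)>/.exec('<b>a</b>');
const lazy = /<(.+?)>/.exec('<b>a</b>');
console.log(greedy[1]); // b>a</b
console.log(lazy[1]);   // b

// The engine still reports the leftmost match, even though a longer
// match ("1001") exists further to the right.
const first = /1(0+)1/.exec('101001');
console.log(first[0], first.index); // 101 0
```

In the second case the greedy 0+ has only one 0 available before the next 1, so greediness has nothing more to grab at index 0.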

Regex by itself is not the right tool for extracting the longest match. You can collect all the substrings that match your pattern and then pick the longest one using language-specific means.

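Since the string is scanned from left to right, 101 is matched in 101001 first, and the remainder (001) does not match: the 101 and 1001 matches overlap, and a character that has already been consumed cannot be part of a second match. To find overlapping matches, you can use /(?=(10+1))./g, which captures inside a lookahead and consumes only one character per match, and then compare the lengths of the Group 1 values to get the longest one.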

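var regex = /(?=(10+1))./g; // the lookahead captures without consuming, so matches may overlap
var str = "101001";
var m, res = [];

// Each match consumes a single character, so every start position gets tried.
while ((m = regex.exec(str)) !== null) {
  res.push(m[1]);
}
console.log(res); // => ["101", "1001"]

if (res.length > 0) {
  // Sort a copy by descending length and take the first entry.
  var longest = res.slice().sort(function (a, b) { return b.length - a.length; })[0];
  console.log("The longest match:", longest); // => The longest match: 1001
}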

Answer by Fallenhero

A regex engine always scans from left to right and returns the first match it finds; it will not look for something longer. If there are multiple matches, you have to re-execute the regex to get each of them, and compare their lengths yourself.
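One possible way to do that re-execution is sketched below. The helper name longestMatch is hypothetical, and the sketch assumes that overlapping matches should count, so it restarts the search one character after the start of each match instead of after its end. It only sees the single match the engine reports at each start position.

```javascript
// Hypothetical helper: re-runs the regex from successive positions and
// keeps the longest match seen so far.
function longestMatch(pattern, input) {
  // exec() only honours lastIndex when the g (or y) flag is set.
  const flags = pattern.flags.includes('g') ? pattern.flags : pattern.flags + 'g';
  const re = new RegExp(pattern.source, flags);
  let longest = null;
  let m;
  while ((m = re.exec(input)) !== null) {
    if (longest === null || m[0].length > longest.length) {
      longest = m[0];
    }
    // Resume just after the START of this match so overlaps are found.
    re.lastIndex = m.index + 1;
  }
  return longest;
}

console.log(longestMatch(/10+1/, '101001')); // 1001
```

A plain loop with the g flag and no lastIndex adjustment would find only 101 in this input, because 001 is all that remains after the first match.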