The book "JavaScript: The Good Parts" explains the method string.match(regexp)
as follows:
The match method matches a string and a regular expression. How it does this depends on the g flag. If there is no g flag, then the result of calling string.match(regexp) is the same as calling regexp.exec(string). However, if the regexp has the g flag, then it produces an array of all the matches but excludes the capturing groups:
The book then provides this code example:
var text = '<html><body bgcolor=linen><p>This is <b>bold<\/b>!<\/p><\/body><\/html>';
var tags = /[^<>]+|<(\/?)([A-Za-z]+)([^<>]*)>/g;
var a, i;
a = text.match(tags);
for (i = 0; i < a.length; i += 1) {
    document.writeln(('// [' + i + '] ' + a[i]).entityify());
}
// The result is
// [0] <html>
// [1] <body bgcolor=linen>
// [2] <p>
// [3] This is
// [4] <b>
// [5] bold
// [6] </b>
// [7] !
// [8] </p>
// [9] </body>
// [10] </html>
What I can't understand is "but excludes the capturing groups".
In the code example above, the html in </html> is in a capturing group. Why is it still included in the result array?
The / in </html> is also in a capturing group. Why is it included in the result array?
Could you explain "but excludes the capturing groups" with the code example above?
Thank you very much!
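The html is included because it's part of the full match. When the book says "but excludes the capturing groups", it doesn't mean they are stripped from the full match result, just that the contents of the capture groups aren't repeated as separate entries in the array. If the capturing groups were included, you'd see them as additional entries following each full match.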
The / is included for the same reason: it's part of the overall match, and that's what's in the result; the contents of the individual capture groups are not listed separately.
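To make the difference concrete, here is a rough sketch that walks the book's pattern with regexp.exec in a loop rather than string.match, so each full match comes back together with its capture groups. The helper name describeMatches is made up for illustration and is not from the book.

```javascript
// Hypothetical helper (not from the book): calls exec repeatedly and
// records each full match together with its capture groups.
function describeMatches(text, regexp) {
    var rows = [];
    var m;
    regexp.lastIndex = 0; // make sure the search starts at the beginning
    while ((m = regexp.exec(text)) !== null) {
        // m[0] is the full match; m[1], m[2], m[3] are the capture groups
        rows.push(Array.prototype.slice.call(m));
    }
    return rows;
}

var text = '<html><body bgcolor=linen><p>This is <b>bold<\/b>!<\/p><\/body><\/html>';
var tags = /[^<>]+|<(\/?)([A-Za-z]+)([^<>]*)>/g;
var rows = describeMatches(text, tags);

// rows[0]  is ['<html>', '', 'html', '']
// rows[10] is ['</html>', '/', 'html', '']
```

Each row starts with the same full match that text.match(tags) returns; the extra entries after it are the capture group contents that match leaves out when the g flag is set.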
This is easier to understand with a simpler example. Consider this code:
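A minimal stand-in for such an example, assuming a made-up sample string and a pattern with two capture groups (neither comes from the book):

```javascript
// Hypothetical sample data, chosen only for illustration
var str = 'a1 b2 c3';
var rex = /([a-z])(\d)/g; // group 1: a letter, group 2: a digit

var result = str.match(rex);
// result is ['a1', 'b2', 'c3']
```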
Because the regular expression has the g flag, only the full matches are included in the array. In each case, the entry in the array is the full match, which includes the characters that matched within the capture groups making up the overall expression.
If we removed the g flag but didn't change anything else, we'd get the first full match followed by the contents of the two capture groups. There, the first entry is the full match; the second and third are the contents of the capture groups. Note that the contents of the capture groups