regex in preg_match_all() doesn't work as expected

I have the following string:

"<h2>Define Vim is the greatest</h2> word processor, good <h3>Vi</h3>!".

I want to extract the h2 and h3 elements with a regex, so that each one ends up as its own array entry.

Expected output would be:

array(
    0   =>  <h2>Define Vim is the greatest</h2>
    1   =>  <h3>Vi</h3>
)

So I wrote my regular expression as follows:

preg_match_all("/(?:<h2>|<h3>).*vi.*(?:<\/h2>|<\/h3>)/i", $input, $matches)

But instead of the desired result above, it produces the following.

Current output:

array(
    0 => <h2>Define Vim is the greatest</h2> word processor, good <h3>Vi</h3>
)

How can I change my code/regex so that I get the tags as in the expected output above?

There is 1 answer

Rizier123 On BEST ANSWER

You have two problems. First, your regex is missing its delimiters. Second, vi is matched case-sensitively, so you have to add the i flag to make the match case-insensitive.

So your code could look something like this (I removed the vi from the regex, so it now grabs everything between h1 to h6 tags):

<?php

    $input = '"<h2>Define Vim is the greatest</h2> word processor, good <h3>Vi</h3>!".';

    // Heading levels run from h1 to h6; .*? is lazy, so each match stops at the first closing tag
    preg_match_all("/(?:<h[1-6]>).*?(?:<\/h[1-6]>)/", $input, $matches);
    print_r($matches);

?>

output:

Array
(
    [0] => Array
        (
            [0] => <h2>Define Vim is the greatest</h2>
            [1] => <h3>Vi</h3>
        )

)
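
One possible refinement, offered as a sketch and not as part of the original answer: the pattern above accepts any closing heading tag, so a mismatched pair such as <h2>...</h3> would also match. Capturing the tag name and reusing it through the backreference \1 ties the closing tag to the opening one. The sample string below, including the deliberately broken <h4>...</h5> pair, is an assumption made for illustration.

```php
<?php

// Hypothetical sample input: the mismatched <h4>...</h5> pair should not match.
$input = '<h2>Define Vim is the greatest</h2> word processor, good <h3>Vi</h3>! <h4>broken</h5>';

// (h[1-6]) captures the tag name; \1 requires the same name in the closing tag.
preg_match_all('/<(h[1-6])>.*?<\/\1>/i', $input, $matches);

// $matches[0] holds the full matches, $matches[1] the captured tag names.
print_r($matches[0]);
```

Here $matches[0] should contain only the h2 and h3 elements, because no </h4> exists to close the <h4>.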

EDIT:

With your updated regex, the problem is now that .* is greedy, meaning it matches as much as it can. To make it non-greedy (lazy), add a ? after the quantifier. So just change each .* to .*?.
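
To make the difference concrete, here is a minimal side-by-side sketch that runs the asker's pattern in its greedy and lazy forms on the string from the question. The variable names $greedy and $lazy are chosen just for this illustration.

```php
<?php

$input = '<h2>Define Vim is the greatest</h2> word processor, good <h3>Vi</h3>!';

// Greedy: each .* grabs as much as possible, so one match spans both headings.
preg_match_all('/(?:<h2>|<h3>).*vi.*(?:<\/h2>|<\/h3>)/i', $input, $greedy);

// Lazy: each .*? grabs as little as possible, so each heading matches on its own.
preg_match_all('/(?:<h2>|<h3>).*?vi.*?(?:<\/h2>|<\/h3>)/i', $input, $lazy);

print_r($greedy[0]); // a single element running from <h2> through </h3>
print_r($lazy[0]);   // two elements: the h2 element and the h3 element
```

One caveat worth keeping in mind: the lazy version only works here because both headings contain "vi". If the first heading did not, .*? would keep expanding past its closing tag until it found a "vi", so the match could still span several tags.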