Find and extract the content of an element of a certain class using DOMXPath


I am trying to extract the content of a certain section of a remote page and save it into a PHP string (or array). That particular section looks like this:

<section class="intro">
        <div class="container">
            <h1>Student Club</h1>
            <h2>Subtitle</h2>
            <p>Lore ipsum paragraph.</p>
        </div>
</section>

I can't narrow the search down by the class "container", because several other sections on the same page contain elements with that class. There is only one section of class "intro", though, so I use the following code to find the right element:

$doc = new DOMDocument;
$doc->preserveWhiteSpace = FALSE;
@$doc->loadHTMLFile("https://www.remotesite.tld/remotepage.html");
$finder = new DomXPath($doc);
$intro = $finder->query("//*[contains(@class, 'intro')]");

At this point I'm hitting a problem: I can't extract the content of $intro as a PHP string.

Trying the following code:

foreach ($intro as $item) {
    $string = $item->nodeValue;
    echo $string;
}

gives only the text value. All the tags are stripped, and I really need the div, h1, h2 and p tags preserved for further manipulation.

Trying:

foreach ($intro->attributes as $attr) {
    $name = $attr->nodeName;
    $value = $attr->nodeValue;
    echo $name;
    echo $value;
}

gives the error:

Notice: Undefined property: DOMNodeList::$attributes in 

So how could I extract the full HTML code of the found DOM elements?


There is 1 answer

Nick (best answer)

I knew I was so close... I just needed to do:

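foreach ($intro as $item) {
    // Each call returns a DOMNodeList of the matching descendants of $item,
    // not a string; iterate over it (or use ->item(0)) to reach the elements.
    $h1 = $item->getElementsByTagName('h1');
    $h2 = $item->getElementsByTagName('h2');
    $p  = $item->getElementsByTagName('p');
}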
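
For anyone who needs the markup itself as a string, as the question asks, one option is to pass each matched node to DOMDocument::saveHTML(), which serializes that node together with its tags. The snippet below is only a minimal sketch: the inline $html string is a hypothetical stand-in for the remote page, the names $outer and $inner are illustrative, and the stricter XPath class test is an assumption that a match on "intro" should not also catch classes such as "introduction".

```php
<?php
// Hypothetical inline markup standing in for the remote page.
$html = <<<'HTML'
<html><body>
<section class="intro">
    <div class="container">
        <h1>Student Club</h1>
        <h2>Subtitle</h2>
        <p>Lore ipsum paragraph.</p>
    </div>
</section>
<section class="other"><div class="container"><p>Unrelated.</p></div></section>
</body></html>
HTML;

$doc = new DOMDocument();
// Collect parser warnings (e.g. about HTML5 tags) instead of silencing them with @.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$finder = new DOMXPath($doc);
// Match "intro" as a whole class token rather than as a substring.
$intro = $finder->query(
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' intro ')]"
);

$outer = []; // markup including the <section> tag itself
$inner = []; // markup of the children only
foreach ($intro as $item) {
    // saveHTML() with a node argument serializes just that node's subtree.
    $outer[] = $doc->saveHTML($item);

    $markup = '';
    foreach ($item->childNodes as $child) {
        $markup .= $doc->saveHTML($child);
    }
    $inner[] = trim($markup);
}

echo $outer[0], PHP_EOL;
```

With the real page, the loadHTML($html) call would be replaced by the loadHTMLFile() call from the question. The strings in $outer and $inner keep the div, h1, h2 and p tags, so they can be stored or processed further as ordinary PHP strings.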