Find and extract content of division of certain class using DomXPath

Question

Asked by Nick on 20 December 2016 at 23:42

I am trying to extract the content of a certain section of a remote page and save it into a PHP string (or array). That section looks like this:

<section class="intro">
        <div class="container">
            <h1>Student Club</h1>
            <h2>Subtitle</h2>
            <p>Lore ipsum paragraph.</p>
        </div>
</section>

I can't narrow the search down using the class "container", because there are several other sections of class "container" on the same page. Since there is only one section of class "intro", I use the following code to find the right division:

$doc = new DOMDocument;
$doc->preserveWhiteSpace = FALSE;
@$doc->loadHTMLFile("https://www.remotesite.tld/remotepage.html");
$finder = new DomXPath($doc);
$intro = $finder->query("//*[contains(@class, 'intro')]");

At this point I'm hitting a problem: I can't extract the content of $intro as a PHP string.

Next I tried the following code:

foreach ($intro as $item) {
    $string = $item->nodeValue;
    echo $string;
}

This gives only the text value. All the tags are stripped, and I really need the div, h1, h2 and p tags preserved for further manipulation.

I also tried:

foreach ($intro->attributes as $attr) {
    $name = $attr->nodeName;
    $value = $attr->nodeValue;
    echo $name;
    echo $value;
}

which gives the error:

Notice: Undefined property: DOMNodeList::$attributes in

So how can I extract the full HTML code of the found DOM elements?

There is 1 answer

**Nick** · Accepted Answer · 2016-12-21T00:02:11+00:00

I knew I was so close... I just needed to do:

foreach ($intro as $item) {
    // Each call returns a DOMNodeList of the matching descendants of $item
    $h1 = $item->getElementsByTagName('h1');
    $h2 = $item->getElementsByTagName('h2');
    $p = $item->getElementsByTagName('p');
}
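
For anyone who needs the full HTML of the matched section, as the question originally asked, here is a minimal sketch rather than a definitive solution. It assumes the markup from the question is parsed from an inline string with loadHTML() instead of being fetched from the remote page. The variable names $outerHtml and $innerHtml are made up for illustration, and the stricter class test in the XPath expression is my own substitution for the original contains() call. The key call is DOMDocument::saveHTML() with a node argument, which serializes that node together with its tags.

```php
<?php
// Sample markup copied from the question; a real script would load the remote page instead.
$html = <<<'HTML'
<section class="intro">
    <div class="container">
        <h1>Student Club</h1>
        <h2>Subtitle</h2>
        <p>Lore ipsum paragraph.</p>
    </div>
</section>
HTML;

$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$finder = new DOMXPath($doc);
// Match "intro" as a whole class token, so a class such as "introduction" would not match.
$intro = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' intro ')]");

$outerHtml = [];   // hypothetical name: HTML of each match including its own tag
$innerHtml = [];   // hypothetical name: HTML of each match's children only
foreach ($intro as $item) {
    // saveHTML($node) serializes the node itself plus all of its descendants.
    $outerHtml[] = $doc->saveHTML($item);

    // Inner HTML: concatenate the serialization of each child node.
    $inner = '';
    foreach ($item->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    $innerHtml[] = $inner;
}

echo $outerHtml[0];
```

The "Undefined property: DOMNodeList::$attributes" notice from the question comes from the same distinction: $intro is a list of nodes, so properties such as attributes have to be read from each $item inside the loop, not from $intro itself.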
