I am trying to extract and save into PHP string (or array) the content of a certain section of a remote page. That particular section looks like:
<section class="intro">
<div class="container">
<h1>Student Club</h1>
<h2>Subtitle</h2>
<p>Lore ipsum paragraph.</p>
</div>
</section>
And since I can't narrow down using class container because there are several other sections of class "container" on the same page and because there is the only section of class "intro", I use the following code to find the right division:
$doc = new DOMDocument;
$doc->preserveWhiteSpace = FALSE;
@$doc->loadHTMLFile("https://www.remotesite.tld/remotepage.html");
$finder = new DomXPath($doc);
$intro = $finder->query("//*[contains(@class, 'intro')]");
And at this point, I'm hitting a problem - can't extract the content of $intro as PHP string.
Trying further the following code
foreach ($intro as $item) {
$string = $item->nodeValue;
echo $string;
}
gives only the text value, all the tags are stripped and I really need all those divs, h1 and h2 and p tags preserved for further manipulation needs.
Trying:
foreach ($intro->attributes as $attr) {
$name = $attr->nodeName;
$value = $attr->nodeValue;
echo $name;
echo $value;
}
is giving the error:
Notice: Undefined property: DOMNodeList::$attributes in
So how could I extract the full HTML code of the found DOM elements?
I knew I was so close... I just needed to do: