Extracting parts of an html code

Asked by Justin on 18 March 2014 at 15:48

Let's say I have the following HTML code:

<p>Test text</p>
<p><img src="test.jpg" /></p>
<div id="test"><p>test</p></div>
<div class="block">
    <img src="test2.jpg">
</div>
<p>test</p>

Parameters:

There will exist a div block with class "block"
There can be any amount of HTML code above or below the div block with class "block"
There could even be two div blocks with class "block"

I am using PHP's DOM extension with XPath to inspect this HTML. I want to be able to return two things:

The div block with class "block"
All the rest of the code without the div element with class "block" in it

Something like:

Block Code:

<div class="block">
    <img src="test2.jpg">
</div>

Original without block code:

<p>Test text</p>
<p><img src="test.jpg" /></p>
<div id="test"><p>test</p></div>
<p>test</p>

**davidkonrad** · Answer 1 · 2014-03-18T16:15:32+00:00

Using DOMDocument, you can do it like this:

$content = '<p>Test text</p>'.
        '<p><img src="test.jpg" /></p>'.
        '<div id="test"><p>test</p></div>'.
        '<div class="block">'.
        '<img src="test2.jpg">'.
        '</div>'.
        '<p>test</p>';

$blocks = array();
$doc = new DOMDocument();
$doc->loadHTML($content);

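// getElementsByTagName() returns a live list, so copy it into an array first;
// removing nodes while iterating the live list can skip elements
$elements = iterator_to_array($doc->getElementsByTagName('*'));
foreach ($elements as $element) {
    // getAttribute() returns an empty string when the attribute is missing
    if ($element->getAttribute('class') == 'block') {
        // add the block's HTML to the blocks array
        $blocks[] = $doc->saveHTML($element);
        // remove the block element from the document
        $element->parentNode->removeChild($element);
    }
}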

echo '<pre>';
echo $blocks[0]; //iterate or print_r if multiple blocks
echo $doc->saveHTML();
echo '</pre>';

This outputs the "block code":

<div class="block"><img src="test2.jpg"></div>

and the "original without block code" (shown here without the doctype, html and body wrapper that saveHTML() adds around it):

<p>Test text</p><p><img src="test.jpg"></p><div id="test"><p>test</p></div><p>test</p>
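Since the question mentions XPath and allows more than one block, the same split can also be sketched with DOMXPath. This is only an illustration under a few assumptions: PHP 7 or later, the input is an HTML fragment rather than a full document, and `splitBlocks()` is a hypothetical helper name, not part of PHP's DOM API. The XPath expression matches "block" as a whole class token, so `class="block wide"` matches but `class="blockquote"` does not.

```php
<?php
// Hypothetical helper: returns the "block" divs and the remaining markup.
function splitBlocks(string $html): array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match "block" as a complete token of the class attribute.
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " block ")]';

    $blocks = [];
    // Copy the matches into an array before modifying the tree.
    foreach (iterator_to_array($xpath->query($query)) as $div) {
        $blocks[] = $doc->saveHTML($div);
        $div->parentNode->removeChild($div);
    }

    // Serialize the children of <body> to skip the doctype/html/body wrapper.
    $rest = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $rest .= $doc->saveHTML($child);
        }
    }

    return ['blocks' => $blocks, 'rest' => $rest];
}

$result = splitBlocks('<p>Test text</p><div class="block"><img src="test2.jpg"></div><p>test</p>');
print_r($result['blocks']); // one entry per matching div
echo $result['rest'];       // everything else, without the block divs
```

Depending on the libxml version, the serialized strings may contain extra line breaks between block-level elements, so trim or normalize whitespace before comparing them.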

If you simply can't accept that DOMDocument "enriches" the HTML with a doctype, html and body, which can be very annoying when you want the whole remaining markup rather than a single extracted node, you can use a helper function such as DOMinnerHTML and extract the innerHTML of the body with:

echo DOMinnerHTML($doc->getElementsByTagName('body')->item(0));
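The answer does not include the body of `DOMinnerHTML()`. One possible definition, offered as an assumption rather than the exact function the answer refers to, concatenates the serialized children of the given node:

```php
<?php
// Assumed implementation of the DOMinnerHTML() helper named in the answer.
function DOMinnerHTML(DOMNode $element): string
{
    $innerHTML = '';
    foreach ($element->childNodes as $child) {
        // Serialize each child with the document that owns it.
        $innerHTML .= $element->ownerDocument->saveHTML($child);
    }
    return $innerHTML;
}

$doc = new DOMDocument();
$doc->loadHTML('<p>Test text</p><div id="test"><p>test</p></div>');
// Prints the fragment without the doctype, html and body wrapper.
echo DOMinnerHTML($doc->getElementsByTagName('body')->item(0));
```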
