From a DOM Document Node, how do you retrieve the original HTML source?


Given a DOMDocument object passed as a parameter, as in the class below:

class Comparison {

    public function __construct(DOMDocument $domDocument){
        // getElementsByTagName() always returns a DOMNodeList (possibly empty)
        $anchors = $domDocument->getElementsByTagName('a');
        if(0 < $anchors->length){
            foreach($anchors as $anchor){
                $original = ''; // Not sure how to get this
                $ordered = $this->rearrangeAttributes($anchor);
                $difference = $this->diff($original,$ordered);
                echo 'Original Source: '.$original."\n";
                echo 'Ordered Source: '.$ordered."\n";
                echo 'Difference: '.$difference."\n\n";
            }
        }
    }

}

How do you get the original HTML string indicated by $original?

My current approach comes from the DOMNode documentation: http://php.net/manual/en/class.domnode.php

The idea is to get the parent of the node in question and read its innerHTML. However, the parser alters the original source to some degree during conversion, so this doesn't look like a robust way to do it. Is there a more reliable way? I can pass in the raw HTML as well, but I want to avoid that rabbit hole if there's a known solution.

UPDATE: If you want the parent's source (cleaned) and the original doesn't matter, then the question Marc B linked is very useful: How to return outer html of DOMDocument?
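
For the cleaned route, here is a minimal sketch, assuming PHP 5.3.6 or later, where DOMDocument::saveHTML() accepts a node argument and serializes just that node. The $raw sample is made up for illustration. The result is the parser's normalized markup, not the original string, so comparing it with $raw shows how much the conversion changed.

```php
<?php
// Hypothetical sample markup with an upper-case tag and unquoted attributes.
$raw = '<p><A HREF=/about class=nav>About</A></p>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc->loadHTML($raw);
libxml_clear_errors();

$outer = null;
foreach ($doc->getElementsByTagName('a') as $anchor) {
    // Passing a node makes saveHTML() return that node's outer HTML
    // as re-serialized by libxml (the "cleaned" source).
    $outer = $doc->saveHTML($anchor);
    echo 'Cleaned Source: ' . $outer . "\n";
}
```

Because the output is re-serialized, it would serve as the cleaned counterpart in the Comparison class above, but not as $original.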

But if you want the original source, you can try getting the line number with http://php.net/manual/en/domnode.getlineno.php, although it's not clear whether that number refers to the cleaned source code or the original raw source code. Insight welcome!
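
A rough sketch of the line-number idea, under the assumption (not confirmed here) that DOMNode::getLineNo() reports 1-based lines of the raw input that was parsed. It needs the raw HTML passed in alongside the document. The $raw sample and the variable names are hypothetical.

```php
<?php
// Hypothetical raw source, one element per line.
$raw = "<html>\n<body>\n<p>intro</p>\n<A HREF=/about class=nav>About</A>\n</body>\n</html>";

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($raw);
libxml_clear_errors();

// Split the raw source on any newline style so lines can be looked up by number.
$rawLines = preg_split('/\R/', $raw);

$found = [];
foreach ($doc->getElementsByTagName('a') as $anchor) {
    // ASSUMPTION: the number refers to the raw input, counted from 1.
    $lineNo = $anchor->getLineNo();
    // null when the reported line falls outside the raw source.
    $sourceLine = $rawLines[$lineNo - 1] ?? null;
    $found[] = ['line' => $lineNo, 'source' => $sourceLine];
    echo 'Line ' . $lineNo . ': ' . var_export($sourceLine, true) . "\n";
}
```

This gives the whole raw line rather than just the element, so an anchor that spans several lines or shares a line with other markup would still need to be cut out of it by hand.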
