SaxonJS: not getting a real HTML DOM tree


I have a complicated stylesheet whose result should replace not just the body of the current HTML document but the entire html element with all its children, head and body.

function applyStylesheetUsingSaxon() {
    SaxonJS.setLogLevel(10);
    // Declared with const so they do not leak as implicit globals
    const options = {
      sourceText: getSourceXML(),
      stylesheetLocation: "spl.sef.json",
      destination: "document"
    };
    const result = SaxonJS.transform(options);
    // Empty the current document, then append the transformed root element
    document.replaceChildren();
    document.appendChild(result.principalResult.firstElementChild);
}

When I do this, the FairAdBlocker extension tries to access document.body, gets null, and then nothing works any more. The produced HTML also looks suspicious.

I thought maybe I could just get the HTML as a string and assign it to innerHTML or similar, with:

  destination: "serialized"
};
result = SaxonJS.transform(options);

the error I get is:

Serializer does not support the requested HTML version: 1.0',

So how can I get a decent HTML DOM produced? Note, I can't use replaceBody because I need to replace the headers as well.

MORE DETAILS: I'm responding to some clarifying question in the comments:

Is the DOM up to that approach of ripping all children out of a document and then adding a new root element?

This has worked just fine with the result from the built-in XSLTProcessor in Chrome.

What does that talk about the extension mean, can you disable that and check whether your approach works without the extension interfering?

That extension issue is a red herring. The fact is that document.body returns null after the replacement operation, and the nodes look different.

Also, what is a minimal but complete sample of XML input, XSLT, wanted HTML result and the "suspicious" HTML you say you get?

Here I show you the difference just by looking at the output of the transform. This is debugging console output using the built-in XSLTProcessor:

[Screenshot: console output of the built-in XSLTProcessor result]

Here is what comes from Saxon-JS:

[Screenshot: console output of the Saxon-JS result]

I guess the issue is that the result is a #document-fragment. Here is a view of the details, looked at as JavaScript objects rather than as HTML. First, what comes from the built-in XSLTProcessor:

[Screenshot: object view of the built-in XSLTProcessor result]

And here is what comes from Saxon-JS:

[Screenshot: object view of the Saxon-JS result]

As for the serialization attempt, well, do you use the HTML output method with e.g. html-version="5.0" or no explicitly set version or html-version? The error sounds as if you set method="html" version="1.0".

Indeed we have

<xsl:output method="html" version="1.0" encoding="UTF-8" indent="no" doctype-public="-"/>

which I think was there to fiddle with quirks mode or something similar, because of a former need to be compatible with IE.
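
If the output declaration is changed so that the serializer accepts it (for example by dropping version="1.0" or using html-version="5.0", as the comment suggests), the string route might look like the sketch below. It assumes that principalResult is a string when destination is "serialized"; the helper name replaceDocumentWithHtml is made up for illustration. The string is parsed into a separate HTML document first and the root element is then swapped in one step, so the live document is never left without a root.

```javascript
// Hypothetical helper: replaces the root element of `doc` with the root
// element parsed from an HTML string. In a browser, `doc` and `parser`
// default to the global document and a new DOMParser.
function replaceDocumentWithHtml(htmlText, doc = document, parser = new DOMParser()) {
  // Parsing as text/html produces a document with html, head and body elements.
  const parsed = parser.parseFromString(htmlText, "text/html");
  // importNode copies the parsed tree into the target document.
  const newRoot = doc.importNode(parsed.documentElement, true);
  doc.replaceChild(newRoot, doc.documentElement);
  return newRoot;
}

// Assumed usage in the browser (not executed here):
// const result = SaxonJS.transform({
//   sourceText: getSourceXML(),
//   stylesheetLocation: "spl.sef.json",
//   destination: "serialized"
// });
// replaceDocumentWithHtml(result.principalResult);
```

Because the tree comes from the HTML parser, its elements are ordinary HTML elements, which is the property document.body depends on.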

IN SUMMARY: I think the key finding is the difference between what we get from the built-in XSLTProcessor (a document) and from Saxon-JS (a document fragment). If Saxon-JS could produce a document and had a destination option to replace the entire document content, that would be great. Without that, I should still be able to build a workaround.

What I don't understand is why, when I take the root node (<html>) from the built-in XSLTProcessor result, #document.firstElementChild, and append it to the current (and empty) document, document.body returns the new body. But when I do the same with the Saxon-JS result, #document-fragment.firstElementChild, document.body returns null, even though the two .firstElementChild (<html>) root nodes look like pretty much the same kind of thing in both cases. (It is hard to tell the difference: neither has a "body" property, and both have two children, <head> and <body>.)
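
One thing worth checking, as a guess rather than a confirmed diagnosis: document.body only finds a body child when the document element is an html element in the XHTML namespace (http://www.w3.org/1999/xhtml). Elements created in no namespace can look identical in the console and still not count as HTML elements. The sketch below, with the hypothetical helper name describeRoot, reports the properties that matter for a given root node.

```javascript
const XHTML_NS = "http://www.w3.org/1999/xhtml";

// Hypothetical helper: summarizes the properties that decide whether
// document.body can find a body once `root` is the document element.
function describeRoot(root) {
  const children = Array.from(root.children || []);
  const isHtml = (el, name) =>
    el.namespaceURI === XHTML_NS && el.localName === name;
  return {
    // true only for an html element in the XHTML namespace
    rootIsHtmlElement: isHtml(root, "html"),
    // true only if some child is a body element in the XHTML namespace
    hasHtmlBody: children.some((el) => isHtml(el, "body")),
    // namespace of the root followed by the namespace of each child
    namespaces: [root, ...children].map((el) => el.namespaceURI)
  };
}
```

Calling describeRoot(result.principalResult.firstElementChild) on the Saxon-JS result and on the XSLTProcessor result and comparing the two reports should show whether the namespaces differ.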


There are 3 answers

2
Martin Honnen On

At https://martin-honnen.github.io/xslt/2022/replaceChildrenTest4.html there is an example that uses Saxon-JS 2.3 to run an HTML DOM to HTML DOM transformation and then follows your first approach: replaceChildren() removes the existing document's children, and appendChild then adds the result of the Saxon-JS transformation.

<html lang="en">
  <head>
    <meta charset="UTF-8">
    <title>Test</title>
    <style>
    .sample {
      color: red;
    }
    </style>
    <script src="../../Saxon-JS-2.3/SaxonJS2.js"></script>
    <script>
    function runXslt() {
      const xslt = `<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0" xpath-default-namespace="http://www.w3.org/1999/xhtml">
        <xsl:output method="html"/>
        <xsl:mode on-no-match="shallow-copy"/>
        <xsl:template match="meta[@charset]"/>
        <xsl:template match="style">
          <xsl:copy>
            <xsl:value-of select="if (contains(., 'red')) then replace(., 'red', 'green') else replace(., 'green', 'red')"/>
          </xsl:copy>
       </xsl:template>
       </xsl:stylesheet>`;
       
       var resultFragment = SaxonJS.XPath.evaluate(`transform(map {
         'source-node' : .,
         'stylesheet-text' : $xslt
         })?output`,
         document,
         { params : { xslt : xslt } }
      );
         
      console.log(resultFragment);
      
      document.replaceChildren();
      document.appendChild(resultFragment);
    }
    </script>
  </head>
  <body>
    <h1>Test</h1>
    <p class="sample">This is a test.</p>
    <input type="button" value="test" onclick="runXslt();">
  </body>
</html>

For debugging ease I have avoided the use of SEF and directly run the XSLT source code through fn:transform but I would think the same approach would work with a precompiled SEF and the SaxonJS.transform API. A test doing that is at https://martin-honnen.github.io/xslt/2022/replaceChildrenTest5.html and works the same.

0
Norm On

I suspect that the problem is in the FairAdBlocker extension. The technique you outline works fine for me. My guess is that FairAdBlocker detects the DOM change, probably when you call replaceChildren(), and throws an NPE, which causes the JavaScript execution to stop (or something similar).

Here's a (somewhat crudely coded) solution that replaces the contents of the head and body elements without ever removing them. Perhaps the extension will let this pass...

function applyStylesheetUsingSaxon() {
    SaxonJS.setLogLevel(10);
    const options = {
      sourceText: "<doc><title>Spoon!</title><para>Hello.</para></doc>",
      stylesheetLocation: "replace.sef.json",
      destination: "document"
    };
    let result = SaxonJS.transform(options);
    let newhtml = result.principalResult.firstChild;
    // Hack: we just assume that the current and generated pages
    // are rooted at html and contain a single head and a single body
    let oldhtml = document.querySelector("html");
    replaceChildren(oldhtml, newhtml, "head");
    replaceChildren(oldhtml, newhtml, "body");
}

function replaceChildren(oldelem, newelem, name) {
   let src = null;
   for (let pos = 0; pos < newelem.childNodes.length; pos++) {
     if (newelem.childNodes[pos].localName == name) {
       src = newelem.childNodes[pos];
       break;
     }
   }

   let tgt = null;
   for (let pos = 0; pos < oldelem.childNodes.length; pos++) {
     if (oldelem.childNodes[pos].localName == name) {
       tgt = oldelem.childNodes[pos];
       break;
     }
   }
   
   if (src == null || tgt == null) {
     // This should never happen…
     console.log("Failed to find " + name);
     return;
   }

   while (tgt.childNodes.length > 0) {
     tgt.removeChild(tgt.childNodes[0]);
   }
   while (src.childNodes.length > 0) {
     // This append removes the node from newelem
     tgt.appendChild(src.childNodes[0]);
   }
}  
4
Norm On

Replacing the entire document isn't our recommended approach: https://www.saxonica.com/saxon-js/documentation2/index.html#!browser/result-documents

Can you say a little more about why you want to do it this way?