Saxon-JS: overriding document() with a function that does not fail on the same-origin policy


I am trying to use Saxon-JS to transform a local XML file, and I run into the silly same-origin restriction for

  • loading of the stylesheet
  • loading of anything using the document function

I call the same-origin policy "silly" because it is easily circumvented by loading scripts instead, which makes the whole same-origin restriction look rather pathetic. Working with that idea, here is how I go about it. The browser loads a file like this test.xml:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <script xmlns="http://www.w3.org/1999/xhtml" type="text/javascript" src="boot.js" defer="true"/>
  <foo/>
  <bar/>
</root>

We can have the boot.js do this:

// Insert a script element after the last XHTML script element and
// optionally run fn once it has loaded.
function appendScript(src, fn) {
  const scripts = document.getElementsByTagNameNS("http://www.w3.org/1999/xhtml", "script");
  const lastScript = scripts[scripts.length - 1];
  const script = document.createElementNS("http://www.w3.org/1999/xhtml", "script");
  script.type = "text/javascript";
  script.charset = "utf-8";
  script.src = src;
  script.defer = true;
  script.async = true;
  if (fn)
    script.onload = fn;
  lastScript.parentElement.insertBefore(script, lastScript.nextSibling);
  return script;
}

appendScript("SaxonJS2.rt.js");
appendScript("test.sef.js"); // defines the global variable style

tools = {};

function transform() {
  SaxonJS.transform({
      sourceNode: document,
      stylesheetText: style,
      destination: "document"}, "async")
  .then(result => {
      document.replaceChildren();
      document.appendChild(result.principalResult); });
}

// Crude: wait 500 ms and hope both scripts have loaded by then.
setTimeout(transform, 500);

Something like that. Notice that I already cannot use the stylesheetLocation option, because that would attempt an XHR, which would fail. But I can simply turn the test.sef.json file into a JavaScript file, like this:

(echo -n 'style='; jq '.|tostring'  test.sef.json ; echo ';') > test.sef.js

then the inclusion with a script will load this JSON string as the variable style.
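
For readers without jq, the same wrapping can be sketched in JavaScript. This is only a minimal sketch, not the method used above: the function name wrapSefAsScript and its varName parameter are hypothetical, and it assumes the input is the JSON text of the SEF file. It re-serializes the JSON compactly and then encodes that text as a string literal assigned to style.

```javascript
// Hypothetical helper (not part of Saxon-JS or jq): turn the JSON text of a
// SEF file into a one-line script that assigns it, as a string, to a global.
function wrapSefAsScript(sefJsonText, varName = "style") {
  // Parse and re-serialize to get compact JSON (this also validates the input).
  const compact = JSON.stringify(JSON.parse(sefJsonText));
  // A second JSON.stringify encodes that text as a quoted string literal.
  return varName + "=" + JSON.stringify(compact) + ";";
}
```

Under Node, this could be applied with fs.readFileSync("test.sef.json", "utf8") as the input and fs.writeFileSync("test.sef.js", ...) for the output.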

In the same way, I can replace the normal document function in XSLT with

<xsl:function name="f:document">
    <xsl:param name="url"/>
    <xsl:choose>
        <xsl:when test="true() or function-available('js:tools.document')">
            <xsl:sequence select="js:tools.document($url)"/>
        </xsl:when>
        <xsl:otherwise>
            <xsl:if test="doc-available($url)">
                <xsl:sequence select="document($url)"/>
            </xsl:if>
        </xsl:otherwise>
    </xsl:choose>
</xsl:function>

Note that function-available somehow doesn't work here. That may be a minor issue; for my testing it doesn't matter, so I put true() or ... in the test.

Now this is how this tools.document function is defined:

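tools.document = function(url) {
  if(!tools.docs) tools.docs = {};
  let doc = tools.docs[url];
  if(!doc) {
    // appendScript is the global helper from boot.js, not a method of tools.
    // The callback runs only after the script has loaded, that is, after
    // this function has already returned.
    const script = appendScript(url+".js", function() {
                     const parser = new DOMParser();
                     // data is the global variable set by the loaded script
                     doc = parser.parseFromString(data, "text/xml");
                     if(doc)
                       tools.docs[url] = doc;
                     delete window.data;
                     script.remove(); });
  }
  // undefined on the first call for a given url
  return doc;
}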

As you can see, these XML documents are also loaded disguised as scripts, such as include.xml.js:

data=`<doc><foo/><bar/></doc>`

So, here is the problem. The transform is running in synchronous mode, and on the call of this js:tools.document() function, we need to let go of the current execution thread in order for the script element to be sourced and executed. Only then can we return to it and parse this data and stick the result into our document cache.

So, if I call f:document("include.xml"), the document is not yet available when the synchronous call returns. But if I call it manually once from the JavaScript console

tools.document("include.xml")

before executing the transform() activity, then the transform will succeed with the document properly sourced.
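
The manual workaround above amounts to filling the cache before the transform starts. A minimal sketch of automating that follows, under several assumptions that are not part of my setup as shown: the helper names loadAll and makeDocumentCache are hypothetical, the list of documents the stylesheet needs is known in advance, and each data script registers itself under its own URL instead of assigning to a shared data variable.

```javascript
// Hypothetical helper: call append(src, onload) for every source and run
// done() once every onload callback has fired.
function loadAll(sources, append, done) {
  let pending = sources.length;
  if (pending === 0) { done(); return; }
  for (const src of sources) {
    append(src, () => {
      pending -= 1;
      if (pending === 0) done();
    });
  }
}

// Hypothetical helper: a cache filled before the transform, so that the
// lookup made from XSLT can stay synchronous. parse turns XML text into a
// document, e.g. text => new DOMParser().parseFromString(text, "text/xml").
function makeDocumentCache(parse) {
  const docs = new Map();
  return {
    // Called from a data script; include.xml.js would then contain
    //   tools.cache.register("include.xml", `<doc><foo/><bar/></doc>`);
    register(url, xmlText) { docs.set(url, parse(xmlText)); },
    // Synchronous lookup; returns undefined for a document never registered.
    lookup(url) { return docs.get(url); },
  };
}
```

In boot.js this might be wired up by setting tools.cache = makeDocumentCache(...), letting tools.document return tools.cache.lookup(url), and calling loadAll(["SaxonJS2.rt.js", "test.sef.js", "include.xml.js"], appendScript, transform) in place of the setTimeout. This does not give control back to the event loop during the transform; it only makes sure nothing is left to load by the time the transform runs.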

I wonder what I could do minimally to give back control to the main javascript loop while still executing the transform? I wonder if there is some sort of trick by which one might capture the current state in some kind of continuation closure, and get back to it by adding that to the setTimeout queue?
