XProc: multiple XSLT transformation with intermediate files


I need to run several XSLT transformations with intermediate XML files. (I need the files themselves: the real case is a bit trickier, because a later step loads the intermediate files.)

first.xml ------------>   intermediate.xml ------------> final.xml
          first.xsl                         final.xsl

I'd like to create an XProc pipeline. I tried writing the following code, but it gives me an error:

SCHWERWIEGEND: runxslt.xpl:26:44:err:XD0011:Could not read: intermediate.xml 
17.05.2012 15:15:35 com.xmlcalabash.drivers.Main error
SCHWERWIEGEND: It is a dynamic error if the resource referenced by a p:document element does not exist, cannot be accessed, or is not a well-formed XML document.
17.05.2012 15:15:35 com.xmlcalabash.drivers.Main error
SCHWERWIEGEND: Underlying exception: net.sf.saxon.s9api.SaxonApiException: I/O error reported by XML parser processing file:/<somepath>/intermediate.xml:
/<somepath>/intermediate.xml (No such file or directory)

(SCHWERWIEGEND is German for SEVERE, the highest Java logging level.) So the file intermediate.xml had evidently not been written yet when the pipeline tried to read it.

This is the xpl-document that I have used:

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
  xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">

  <p:input port="source">
    <p:document href="first.xml"/>
  </p:input>

  <p:output port="result" sequence="true">
    <p:empty/>
  </p:output>

  <p:xslt name="first-to-intermediate">
    <p:input port="stylesheet">
      <p:document href="first.xsl"/>
    </p:input>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xslt>

  <p:store href="intermediate.xml" />

  <p:xslt>
    <p:input port="source">
      <p:document href="intermediate.xml"/>
    </p:input>
    <p:input port="stylesheet">
      <p:document href="final.xsl"/>
    </p:input>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xslt>

  <p:store href="final.xml"/>

</p:declare-step>

For completeness, here are the input file and the two stylesheets:

first.xml:

<root>
  <element name="A" />
</root>

first.xsl:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">
  <xsl:output indent="yes"/>

  <xsl:template match="root">
    <root>
      <xsl:apply-templates/>
    </root>
  </xsl:template>
  <xsl:template match="element">
    <intermediate name="A" />
  </xsl:template>

</xsl:stylesheet>

final.xsl:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">
  <xsl:output indent="yes"/>

  <xsl:template match="root">
    <root>
      <xsl:apply-templates/>
    </root>
  </xsl:template>
  <xsl:template match="intermediate">
    <final name="A" />
  </xsl:template>

</xsl:stylesheet>

Here is a note on the real application (the above is a simplification, of course).

  1. First step: convert the source into something more suitable for my processing. Output: companies.xml
  2. Take the output from step 1 and create an index file (index.xml) from that. The index file must be editable manually.
  3. The third step is to merge the files created by steps 1 and 2 and create a final XML file (final.xml)

The index file must be written to disk, and I must be able to run the last step on its own (that's a separate problem; I'd write a different pipeline for it).

Writing companies.xml (the output of step 1) to disk is optional; it could stay in memory (but it might get large).
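For the separate last-step pipeline, a minimal sketch might look like the following. It assumes a hypothetical merge.xsl that expects both documents under a single wrapper element (the wrapper name merge-input is also made up): p:wrap-sequence combines companies.xml and index.xml into one document, which p:xslt then transforms.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
  xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">

  <!-- Default inputs: the outputs of steps 1 and 2, read from disk. -->
  <p:input port="source" sequence="true">
    <p:document href="companies.xml"/>
    <p:document href="index.xml"/>
  </p:input>
  <p:input port="parameters" kind="parameter"/>
  <p:output port="result"/>

  <!-- Put both documents under one (hypothetical) root element,
       so the stylesheet sees them as a single input tree. -->
  <p:wrap-sequence wrapper="merge-input"/>

  <!-- merge.xsl is a hypothetical stylesheet that builds final.xml. -->
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="merge.xsl"/>
    </p:input>
  </p:xslt>

</p:declare-step>
```

Because the inputs are only defaults on the source port, an edited index.xml is picked up simply by rerunning this pipeline.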

Accepted answer by grtjn:

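I'm not really sure why XMLCalabash fails here. I thought the logic should work in principle, but apparently XMLCalabash doesn't write the file to disk before the second p:xslt tries to read it. The most likely reason is that XProc 1.0 orders steps only by their port connections, not by their position in the document: nothing connects the p:store to the second p:xslt (which loads intermediate.xml through p:document), so the processor is free to read the file before it has been stored.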
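If you do need to keep the hard-coded files, a possible workaround (an untested sketch, not part of the original answer) is to give the second transformation a real data dependency on p:store. p:store reports the URI it wrote as a c:result document on its non-primary result port; using that URI as the href of p:load means the file can only be loaded after it has been stored. The step names store-intermediate and load-intermediate are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
  xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">

  <p:input port="source">
    <p:document href="first.xml"/>
  </p:input>

  <p:output port="result" sequence="true">
    <p:empty/>
  </p:output>

  <p:xslt name="first-to-intermediate">
    <p:input port="stylesheet">
      <p:document href="first.xsl"/>
    </p:input>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xslt>

  <!-- Writes the file and emits <c:result>URI</c:result> on its "result" port. -->
  <p:store name="store-intermediate" href="intermediate.xml"/>

  <!-- The href comes from p:store's report, which creates a connection
       between the two steps and forces load-after-store. -->
  <p:load name="load-intermediate">
    <p:with-option name="href" select="string(/c:result)">
      <p:pipe step="store-intermediate" port="result"/>
    </p:with-option>
  </p:load>

  <!-- Source defaults to the primary output of p:load. -->
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="final.xsl"/>
    </p:input>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xslt>

  <p:store href="final.xml"/>

</p:declare-step>
```

Since p:load now sits between p:store and the second p:xslt, the second p:xslt no longer needs its own source binding.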

But there is an elegant solution, because you don't need to store intermediate results before continuing processing. In fact, it is best not to hard-code loads and stores at all. Instead, use something like the following:

<?xml version="1.0" encoding="UTF-8"?> 
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" 
  xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0"> 

  <p:input port="source" sequence="true"/> 
  <p:input port="parameters" kind="parameter"/>
  <p:output port="result" sequence="true"/> 

  <p:xslt name="first-to-intermediate"> 
    <p:input port="stylesheet"> 
      <p:document href="first.xsl"/> 
    </p:input> 
  </p:xslt> 

  <p:xslt> 
    <p:input port="stylesheet"> 
      <p:document href="final.xsl"/> 
    </p:input> 
  </p:xslt> 

</p:declare-step> 

It requires a slightly different call to XMLCalabash. Call it like this:

java -jar Calabash.jar -i source=first.xml -o result=final.xml runxslt.xpl

With -i you bind an input port to a file from outside the pipeline, so nothing has to be hard-coded. Similarly, -o redirects an output port to a target file.

I also added a 'parameters' input to your code, which is automatically connected to the parameters port of each p:xslt. That way you don't need to bind those ports to p:empty. It also lets you pass parameter values from the command line into the XSLT stylesheets.
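As an illustration, here is final.xsl with the hard-coded name replaced by a stylesheet parameter. The parameter name final-name is hypothetical; its select attribute supplies a default of 'A', so the stylesheet behaves exactly as before when no value is passed in.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">
  <xsl:output indent="yes"/>

  <!-- Hypothetical parameter; falls back to 'A' when none is supplied. -->
  <xsl:param name="final-name" select="'A'"/>

  <xsl:template match="root">
    <root>
      <xsl:apply-templates/>
    </root>
  </xsl:template>

  <xsl:template match="intermediate">
    <!-- Attribute value template inserts the parameter value. -->
    <final name="{$final-name}"/>
  </xsl:template>

</xsl:stylesheet>
```

p:xslt hands the parameters it receives on its parameters port to the stylesheet, so a final-name value supplied to the pipeline's parameters input ends up in this xsl:param.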

And because I removed the p:store, the second p:xslt no longer needs an explicit 'source' input either: by default, the result of the first p:xslt goes directly into the (primary) source input of the following step.
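To make those defaults visible, here is the same two-step pipeline with every implicit connection written out explicitly. It should behave like the version above; the names main and intermediate-to-final are hypothetical additions, needed only so the p:pipe elements have something to refer to.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
  xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0" name="main">

  <p:input port="source" sequence="true"/>
  <p:input port="parameters" kind="parameter"/>
  <p:output port="result" sequence="true">
    <!-- Default: the primary output of the last step. -->
    <p:pipe step="intermediate-to-final" port="result"/>
  </p:output>

  <p:xslt name="first-to-intermediate">
    <!-- Default: the pipeline's primary input port. -->
    <p:input port="source">
      <p:pipe step="main" port="source"/>
    </p:input>
    <p:input port="stylesheet">
      <p:document href="first.xsl"/>
    </p:input>
    <!-- Default: the pipeline's primary parameter input port. -->
    <p:input port="parameters">
      <p:pipe step="main" port="parameters"/>
    </p:input>
  </p:xslt>

  <p:xslt name="intermediate-to-final">
    <!-- Default: the primary output of the preceding step. -->
    <p:input port="source">
      <p:pipe step="first-to-intermediate" port="result"/>
    </p:input>
    <p:input port="stylesheet">
      <p:document href="final.xsl"/>
    </p:input>
    <p:input port="parameters">
      <p:pipe step="main" port="parameters"/>
    </p:input>
  </p:xslt>

</p:declare-step>
```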

-- edit --

To elaborate on my comments: you can do a p:store and still use the output of the first p:xslt a second time, without loading the intermediate document back from disk. You can do it like this:

<?xml version="1.0" encoding="UTF-8"?> 
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" 
  xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0"> 

  <p:input port="source" sequence="false"/> 
  <p:input port="parameters" kind="parameter"/>
  <p:output port="result" sequence="false"/> 

  <p:xslt name="first-to-intermediate"> 
    <p:input port="stylesheet"> 
      <p:document href="first.xsl"/> 
    </p:input> 
  </p:xslt> 

  <p:store href="intermediate.xml"/>

  <p:xslt> 
    <p:input port="source"> 
      <p:pipe step="first-to-intermediate" port="result"/> 
    </p:input> 
    <p:input port="stylesheet"> 
      <p:document href="final.xsl"/> 
    </p:input> 
  </p:xslt> 

</p:declare-step> 

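Note that I changed sequence=true to false on both the input and the output of the declare-step. p:store accepts exactly one document on its source port, so storing a sequence of intermediate results requires extra care. With single-document ports, an unexpected sequence should be reported as an error instead of slipping through.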
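If you do need to handle a sequence, one option is to transform and store each document separately inside p:for-each. A sketch under that assumption (the intermediate-N.xml naming scheme is hypothetical; p:iteration-position() returns the 1-based position of the current document in the iteration):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
  xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">

  <!-- Accept any number of source documents. -->
  <p:input port="source" sequence="true"/>
  <p:input port="parameters" kind="parameter"/>
  <p:output port="result" sequence="true"/>

  <!-- Iterates over the default readable port, i.e. the source sequence. -->
  <p:for-each>
    <p:output port="result"/>

    <p:xslt name="first-to-intermediate">
      <p:input port="stylesheet">
        <p:document href="first.xsl"/>
      </p:input>
    </p:xslt>

    <!-- One document per p:store, so each iteration gets its own file name. -->
    <p:store>
      <p:with-option name="href"
        select="concat('intermediate-', p:iteration-position(), '.xml')"/>
    </p:store>

    <p:xslt>
      <p:input port="source">
        <p:pipe step="first-to-intermediate" port="result"/>
      </p:input>
      <p:input port="stylesheet">
        <p:document href="final.xsl"/>
      </p:input>
    </p:xslt>
  </p:for-each>

</p:declare-step>
```

Each iteration writes intermediate-1.xml, intermediate-2.xml, and so on, while the final results are collected into the pipeline's result sequence.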

HTH!