How can I embed DocBook XSLT transformation in a Java web app?

I'm at the proof-of-concept phase of building some DocBook → PDF transformation into a web application. The basic requirements are:

  • It has to run "out of the JAR"—setting up the stylesheet as files on the appserver's filesystem is not what I'm after.
  • It's not based on Spring, so I'm after a more generic Java solution.
  • We're currently using the DocBook 1.79.2 stylesheets, though could probably use the xslt20 stylesheets if more appropriate.
  • We're currently using Saxon-HE 12.3 in the proof-of-concept, but could definitely upgrade that to a commercial version.

The TLDR is: How do I encapsulate the DocBook XSLT stylesheets in a JAR (that doesn't require exploding the JAR into files on the filesystem)?

As recently discussed on the docbook-apps mailing list, I can get quite a bit of the way by starting with the stylesheets in src/main/resources/xsl (with some customisations at that level, and then the DocBook stylesheets in src/main/resources/xsl/docbook-xsl-1.79.2), a catalog that starts like this:

<?xml version="1.0" encoding="utf-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
   <uri name="file:/xsl/juno-driver.xsl"
         uri="classpath:/xsl/juno-driver.xsl" />
   <uri name="file:/xsl/header-footer.xsl"
         uri="classpath:/xsl/header-footer.xsl" />
   <uri name="file:/xsl/table.xsl"
         uri="classpath:/xsl/table.xsl" />
   <uri name="file:/xsl/titlepage.xsl"
         uri="classpath:/xsl/titlepage.xsl" />
   <uri name="file:/xsl/docbook-xsl-1.79.2/fo/docbook.xsl"
         uri="classpath:/xsl/docbook-xsl-1.79.2/fo/docbook.xsl" />
   <uri name="file:/xsl/docbook-xsl-1.79.2/VERSION.xsl"
         uri="classpath:/xsl/docbook-xsl-1.79.2/VERSION.xsl" />
   <uri name="file:/xsl/docbook-xsl-1.79.2/fo/param.xsl"
         uri="classpath:/xsl/docbook-xsl-1.79.2/fo/param.xsl" />

(and goes on to map every .xsl, .xml, .ent, and .dtd file to its classpath: URI equivalent), and some code like this:

DOMResult result = new DOMResult();
TransformerFactory factory = TransformerFactory.newInstance();
InputStream is = XmlTest.class.getResourceAsStream("/xsl/juno-driver.xsl");
Source source = new StreamSource(is, "file:/xsl/juno-driver.xsl");
Transformer transformer = factory.newTransformer(source);
transformer.transform(new DOMSource(document), result);
return (Document) result.getNode();

This almost gets us there, but fails:

Error at char 9 in expression in xsl:param/@select on line 18 column 57 of l10n.xsl:
  FODC0002  I/O error reported by XML parser processing
  file:///xsl/docbook-xsl-1.79.2/common/l10n.xsl. Caused by java.io.FileNotFoundException:
  /xsl/docbook-xsl-1.79.2/common/l10n.xsl (No such file or directory)
at parameter local.l10n.xml on line 18 column 57 of l10n.xsl:
     invoked by global parameter local.l10n.xml at file:///xsl/docbook-xsl-1.79.2/common/l10n.xsl#18

Where that line involves a call to document(''):

<xsl:param name="local.l10n.xml" select="document('')"/>

Looks like it's insisting on loading itself from a file, and then (obviously) can't find it at that URI. How do we tell whoever is resolving calls to the document() function to use the classpath?

I have pushed a minimal example of the problem to GitHub: you can clone the repo and run mvn clean test to reproduce.

I'd also settle for advice on any other approach to getting this done that meets the list of constraints at the top of the post!

1 answer

Jukka Matilainen (best answer)

I think there are multiple ways to do this. One is to add support for accessing classpath resources by URL. You could then point to the stylesheets on your classpath with a URL, with no catalog needed.

You could do it for example by registering the class below as a URLStreamHandlerProvider implementation. The implementation is adapted from this answer, but changed to support the optional leading slash in the URL path and also changed to use the cp: scheme name instead of the more conventional classpath:.

  • The leading slash is useful in the URLs so that they get treated as hierarchical URLs, so that relative references can be resolved.
  • The scheme (protocol) name is changed to cp: because Saxon-HE (at least version 12.3) appears to have a workaround specific to classpath: URLs, which drops the leading slash from the path when it resolves relative classpath: URLs.
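To illustrate the first point, here is a minimal sketch using java.net.URI, purely to show the difference between a hierarchical and an opaque URI. The class name HierarchicalUriDemo is made up for this example, and an XSLT processor may well resolve references through some other code path.

```java
import java.net.URI;

// Hypothetical demo class (not part of the answer's code): shows why the
// leading slash matters when relative references are resolved.
final class HierarchicalUriDemo {

    /** Resolves a relative reference against a base using java.net.URI. */
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        // With a leading slash the URI is hierarchical, so a relative
        // reference is resolved against the base path.
        System.out.println(resolve("cp:/xsl/fo/docbook.xsl", "param.xsl"));

        // Without the slash the URI is opaque; URI.resolve() then simply
        // returns the relative reference unchanged.
        System.out.println(URI.create("cp:xsl/fo/docbook.xsl").isOpaque());
        System.out.println(resolve("cp:xsl/fo/docbook.xsl", "param.xsl"));
    }
}
```

An xsl:import of a relative href needs the first behaviour, which is why the handler accepts cp:/... URLs.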

In Java 9 and above you can register the provider by putting the fully qualified name of the class in the configuration file META-INF/services/java.net.spi.URLStreamHandlerProvider.

With this in place, you should be able to point to your stylesheets with a URL like cp:/xsl/docbook-xsl-1.79.2/html/docbook.xsl and have it work without a catalog, including relative imports, as long as your XSLT processor uses (or at least falls back to) this method of dereferencing URLs. Based on a quick test, this approach seems to work with at least the Xalan-Java and Saxon-HE XSLT processors. (I think the default XSLT processor included with Java might have some issues when using the docbook-xsl stylesheets.)
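On the calling side, a rough sketch of the question's transformation code adapted to this approach might look like the following. It assumes the cp: handler is registered and that the driver stylesheet from the question sits at /xsl/juno-driver.xsl on the classpath. The class and method names are invented for illustration.

```java
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

// Hypothetical helper class; only the JAXP calls are standard API.
final class CpStylesheetTransform {

    // Assumed location of the driver stylesheet, taken from the question.
    static final String DRIVER_XSL = "cp:/xsl/juno-driver.xsl";

    /** Builds a Source that carries only a system ID; the processor opens the URL itself. */
    static Source stylesheetSource(String cpUrl) {
        return new StreamSource(cpUrl);
    }

    /** Runs the driver stylesheet over a DOM document and returns the result tree. */
    static Document transform(Document input) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(stylesheetSource(DRIVER_XSL));
        DOMResult result = new DOMResult();
        transformer.transform(new DOMSource(input), result);
        return (Document) result.getNode();
    }
}
```

Because the stylesheet's base URI is now a cp: URL rather than a file: URL, relative imports and document('') should be resolved against the classpath, provided the processor dereferences URLs through java.net.URL as described above.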

package com.stackoverflow.q76848364;

import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.spi.URLStreamHandlerProvider;

/**
 * URL stream handler for "cp:/" URLs for accessing resources in the classpath.
 * Supports a leading slash in the path so that the scheme is treated as a
 * hierarchical scheme for resolving relative URL references.
 * 
 * <p>
 * Register this provider by putting the fully qualified name of this class in
 * the configuration file
 * META-INF/services/java.net.spi.URLStreamHandlerProvider.
 */
public class ClasspathURLStreamHandlerProvider extends URLStreamHandlerProvider {

    private static final String PROTOCOL = "cp";

    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
        if (PROTOCOL.equals(protocol)) {
            return new URLStreamHandler() {
                @Override
                protected URLConnection openConnection(URL url) throws IOException {
                    String urlPath = url.getPath();
                    String resourcePath = urlPath.startsWith("/") ? urlPath.substring(1) : urlPath;
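                    // getResource() returns null for a missing resource; report that as an
                    // I/O error rather than failing with a NullPointerException.
                    URL resource = ClassLoader.getSystemClassLoader().getResource(resourcePath);
                    if (resource == null) {
                        throw new java.io.FileNotFoundException("No classpath resource: " + resourcePath);
                    }
                    return resource.openConnection();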
                }
            };
        }
        return null;
    }

}

Edited to add: Caution about resolving relative URI references in Java

When working with relative URI references in Java, please note that there is a bug in the java.net.URI.resolve() method that affects resolution when the relative reference is empty (bug JDK-8218962 in the Java bug database). The docbook-xsl stylesheets rely on this case working correctly, since they call document(''), so anything that relies on the java.net.URI class for this functionality will run into problems. Since both Xalan-Java and Saxon-HE seem to work OK, they presumably use something else.
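If you want to check this on your own JDK, a small sketch like the one below prints what java.net.URI does with an empty reference, next to a hand-written workaround. Per RFC 3986 section 5.2.2, an empty reference should resolve to the base URI minus its fragment. The class and method names are invented for this example, and the workaround is only one possible way to handle the case.

```java
import java.net.URI;

// Hypothetical demo class for the empty-reference case used by document('').
final class EmptyReferenceDemo {

    /** Plain java.net.URI resolution; on affected JDKs this mishandles "". */
    static String viaJavaNetUri(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    /** Workaround sketch: treat the empty reference as "the base without its fragment". */
    static String resolveReference(String base, String relative) {
        if (relative.isEmpty()) {
            int hash = base.indexOf('#');
            return hash < 0 ? base : base.substring(0, hash);
        }
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        String base = "cp:/xsl/docbook-xsl-1.79.2/common/l10n.xsl";
        // Compare the two results; they should be identical on a JDK without the bug.
        System.out.println("java.net.URI: " + viaJavaNetUri(base, ""));
        System.out.println("workaround:   " + resolveReference(base, ""));
    }
}
```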

Edited to add (2): Demonstration

I created a pull request demonstrating this solution against the provided minimal example. (The original example was set to target Java 8. Since the method of registering URLStreamHandler implementations is different between Java 8 and Java 9+, I changed the compile target to Java 9 instead to demonstrate the newer approach.)
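For anyone who has to stay on Java 8, one possible equivalent is to install a URLStreamHandlerFactory programmatically. The sketch below is not what the pull request does, and the class name CpHandlerFactory is made up. Note that URL.setURLStreamHandlerFactory() may be called at most once per JVM, so this fails if something else (for example the servlet container) has already installed a factory.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;

// Hypothetical Java 8 style registration of the same "cp:" handler.
final class CpHandlerFactory implements URLStreamHandlerFactory {

    private static boolean installed;

    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
        if (!"cp".equals(protocol)) {
            return null; // fall back to the JDK's built-in handlers
        }
        return new URLStreamHandler() {
            @Override
            protected URLConnection openConnection(URL url) throws IOException {
                String path = url.getPath();
                String resourcePath = path.startsWith("/") ? path.substring(1) : path;
                URL resource = ClassLoader.getSystemClassLoader().getResource(resourcePath);
                if (resource == null) {
                    throw new FileNotFoundException("No classpath resource: " + resourcePath);
                }
                return resource.openConnection();
            }
        };
    }

    /** Installs the factory once; throws an Error if another factory is already set. */
    static synchronized void install() {
        if (!installed) {
            URL.setURLStreamHandlerFactory(new CpHandlerFactory());
            installed = true;
        }
    }

    /** Resolves a relative reference with java.net.URL (constructors deprecated since Java 20). */
    static String resolve(String base, String relative) {
        try {
            return new URL(new URL(base), relative).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Call CpHandlerFactory.install() once at startup, before the first cp: URL is created.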