I'm at the proof-of-concept phase of building some DocBook → PDF transformation into a web application. The basic requirements are:
- It has to run "out of the JAR"—setting up the stylesheet as files on the appserver's filesystem is not what I'm after.
- It's not based on Spring, so I'm after a more generic Java solution.
- We're currently using the DocBook 1.79.2 stylesheets, though could probably use the xslt20 stylesheets if more appropriate.
- We're currently using Saxon-HE 12.3 in the proof-of-concept, but could definitely upgrade that to a commercial version.
The TLDR is: How do I encapsulate the DocBook XSLT stylesheets in a JAR (that doesn't require exploding the JAR into files on the filesystem)?
As recently discussed on the docbook-apps mailing list, I can get quite a bit of the way by starting with the stylesheets in src/main/resources/xsl (with some customisations at that level, and then the DocBook stylesheets in src/main/resources/xsl/docbook-xsl-1.79.2), a catalog that starts like this:
<?xml version="1.0" encoding="utf-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
<uri name="file:/xsl/juno-driver.xsl"
uri="classpath:/xsl/juno-driver.xsl" />
<uri name="file:/xsl/header-footer.xsl"
uri="classpath:/xsl/header-footer.xsl" />
<uri name="file:/xsl/table.xsl"
uri="classpath:/xsl/table.xsl" />
<uri name="file:/xsl/titlepage.xsl"
uri="classpath:/xsl/titlepage.xsl" />
<uri name="file:/xsl/docbook-xsl-1.79.2/fo/docbook.xsl"
uri="classpath:/xsl/docbook-xsl-1.79.2/fo/docbook.xsl" />
<uri name="file:/xsl/docbook-xsl-1.79.2/VERSION.xsl"
uri="classpath:/xsl/docbook-xsl-1.79.2/VERSION.xsl" />
<uri name="file:/xsl/docbook-xsl-1.79.2/fo/param.xsl"
uri="classpath:/xsl/docbook-xsl-1.79.2/fo/param.xsl" />
(and goes on to map every .xsl, .xml, .ent, and .dtd file to its classpath: URI equivalent), and some code like this:
DOMResult result = new DOMResult();
TransformerFactory factory = TransformerFactory.newInstance();
InputStream is = XmlTest.class.getResourceAsStream("/xsl/juno-driver.xsl");
Source source = new StreamSource(is, "file:/xsl/juno-driver.xsl");
Transformer transformer = factory.newTransformer(source);
transformer.transform(new DOMSource(document), result);
return (Document) result.getNode();
This almost gets us there, but fails:
Error at char 9 in expression in xsl:param/@select on line 18 column 57 of l10n.xsl:
FODC0002 I/O error reported by XML parser processing
file:///xsl/docbook-xsl-1.79.2/common/l10n.xsl. Caused by java.io.FileNotFoundException:
/xsl/docbook-xsl-1.79.2/common/l10n.xsl (No such file or directory)
at parameter local.l10n.xml on line 18 column 57 of l10n.xsl:
invoked by global parameter local.l10n.xml at file:///xsl/docbook-xsl-1.79.2/common/l10n.xsl#18
Where that line involves a call to document(''):
<xsl:param name="local.l10n.xml" select="document('')"/>
Looks like it's insisting on loading itself from a file, and then (obviously) can't find it at that URI. How do we tell whoever is resolving calls to the document() function to use the classpath?
I have pushed a minimal example of the problem to GitHub: you can clone the repo and run mvn clean test to reproduce.
I'd also settle for advice on any other approach to getting this done that meets the list of constraints at the top of the post!
I think there are multiple ways to do this. One way would be to add support for accessing classpath resources by URL. You could then point to the stylesheets in your classpath with a URL, without needing a catalog.
You could do it, for example, by registering the class below as a URLStreamHandlerProvider implementation. The implementation is adapted from this answer, but changed to support the optional leading slash in the URL path and to use the cp: scheme name instead of the more conventional classpath:.

The cp: scheme is used because Saxon-HE (at least version 12.3) appears to have a workaround specific to classpath: URLs in place, which causes the leading slash of the path to be dropped when it resolves relative classpath: URLs.

In Java 9 and above you can register the provider by putting the fully qualified name of the class in the configuration file META-INF/services/java.net.spi.URLStreamHandlerProvider.
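The provider class itself is not reproduced in the text above. A minimal sketch of what such a class might look like follows. The class name CpUrlStreamHandlerProvider, the use of the thread context class loader, and the error message are assumptions made for illustration; this is not the code from the original answer or the pull request.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.spi.URLStreamHandlerProvider;

/**
 * Hypothetical provider that maps cp:/path/to/resource URLs onto classpath
 * resources. It must be public with a public no-argument constructor so that
 * ServiceLoader can instantiate it.
 */
public class CpUrlStreamHandlerProvider extends URLStreamHandlerProvider {

    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
        if (!"cp".equals(protocol)) {
            // Decline every other scheme so the built-in handlers still apply.
            return null;
        }
        return new URLStreamHandler() {
            @Override
            protected URLConnection openConnection(URL url) throws IOException {
                String path = url.getPath();
                // ClassLoader.getResource() expects a name without a leading
                // slash, so accept both cp:/xsl/a.xsl and cp:xsl/a.xsl.
                if (path.startsWith("/")) {
                    path = path.substring(1);
                }
                // Assumption: the context class loader sees the stylesheets.
                ClassLoader loader = Thread.currentThread().getContextClassLoader();
                if (loader == null) {
                    loader = CpUrlStreamHandlerProvider.class.getClassLoader();
                }
                URL resource = loader.getResource(path);
                if (resource == null) {
                    throw new FileNotFoundException("Classpath resource not found: " + url);
                }
                // Delegate to the real (jar: or file:) URL behind the resource.
                return resource.openConnection();
            }
        };
    }
}
```

With this sketch, the services file would contain the single line CpUrlStreamHandlerProvider (or the fully qualified name if the class lives in a package). One thing worth checking in a web application: the java.net.URL documentation says providers are located using the system class loader, so a provider packaged only inside the webapp may not be picked up automatically.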
With this in place, you should be able to point to your stylesheets with a URL like cp:/xsl/docbook-xsl-1.79.2/html/docbook.xsl and have it work without a catalog, including relative imports, as long as your XSLT processor uses (or at least falls back to) this method of dereferencing URLs. Based on a quick test, this approach seems to work with at least the Xalan-Java and Saxon-HE XSLT processors. (I think the default XSLT processor included with Java might have some issues when using the docbook-xsl stylesheets.)

Edited to add: Caution about resolving relative URI references in Java
When working with relative URI references in Java, please note that there is a bug in the java.net.URI.resolve() method that affects resolving relative URI references when the relative URI is empty (bug JDK-8218962 in the Java bug database). The docbook-xsl stylesheets rely on this working correctly, so there will be problems if one tries to use anything that relies on the java.net.URI class for this functionality. Since both Xalan-Java and Saxon-HE seem to work OK, they must be using something else.

Edited to add (2): Demonstration
I created a pull request demonstrating this solution against the provided minimal example. (The original example was set to target Java 8. Since the method of registering URLStreamHandler implementations is different between Java 8 and Java 9+, I changed the compile target to Java 9 instead to demonstrate the newer approach.)
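To look at the java.net.URI.resolve() caveat mentioned earlier in isolation, a small check along the following lines could be run. The class name EmptyReferenceCheck and the sample cp: URI are illustrative assumptions. Under RFC 3986 reference resolution, an empty reference resolves to the base URI itself (minus any fragment), which is what document('') depends on. The program only prints what the running JDK produces so that the two can be compared.

```java
import java.net.URI;

/** Illustrative check of how java.net.URI resolves an empty relative reference. */
class EmptyReferenceCheck {

    /** Resolves the empty reference "" against the given base URI. */
    static URI resolveEmpty(String base) {
        return URI.create(base).resolve("");
    }

    public static void main(String[] args) {
        // Sample base URI, shaped like the stylesheet that calls document('').
        String base = "cp:/xsl/docbook-xsl-1.79.2/common/l10n.xsl";
        URI resolved = resolveEmpty(base);
        System.out.println("base:         " + base);
        System.out.println("resolved:     " + resolved);
        // RFC 3986 expects the base to come back unchanged here.
        System.out.println("same as base: " + resolved.toString().equals(base));
    }
}
```

If the last line prints false, any resolver built on java.net.URI would need to special-case the empty reference before delegating to resolve().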