I am using Woodstox to implement a StAX parser for XML files. Assume that I have a valid XML file with a matching DTD in the same directory on my filesystem.
/path/to/test.xml
/path/to/test.dtd
The XML file references its DTD using a relative system identifier declaration as follows:
<!DOCTYPE test SYSTEM "test.dtd">
From a validation viewpoint, everything seems fine to me. (Is it? xmllint does not complain.) However, when I try to parse the file with the code below, Woodstox throws a java.io.FileNotFoundException because it cannot find the DTD file. It seems to me that the implementation resolves the DTD path relative to the working directory instead of relative to the XML file.
import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class Test {
    public static void main( String[] args ) throws Exception {
        FileInputStream fileInputStream = new FileInputStream( args[0] );
        XMLInputFactory xmlInputFactory = XMLInputFactory.newFactory();
        XMLStreamReader xsr = xmlInputFactory.createXMLStreamReader(fileInputStream);
        while( xsr.hasNext() ) {
            if( xsr.next() == XMLStreamConstants.DTD ) {
                System.err.println( xsr.getText() );
            }
        }
    }
}
- Is this intentional?
- Is there a convenient way to convince the StAX parser to load the DTD relative to a given XML file instead of relative to the working directory?
You will need to provide your own implementation of the XMLResolver interface (known as EntityResolver in the SAX world) to help the parser find the DTD. XMLInputFactory has a setXMLResolver() method for registering it. Some more information on the subject:
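As a rough illustration, a resolver of this kind could look like the sketch below. The class name RelativeDtdResolver is made up for this example, and the sketch assumes the parser hands the resolver the system identifier exactly as written in the DOCTYPE declaration (for example "test.dtd"). Returning null from resolveEntity() asks the parser to fall back to its default handling.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import javax.xml.stream.XMLResolver;
import javax.xml.stream.XMLStreamException;

// Hypothetical helper: looks up external entities (such as the DTD)
// in the directory that contains the XML file being parsed.
final class RelativeDtdResolver implements XMLResolver {

    private final File baseDir;

    RelativeDtdResolver(File xmlFile) {
        // Remember the directory of the XML file, not the working directory.
        this.baseDir = xmlFile.getAbsoluteFile().getParentFile();
    }

    @Override
    public Object resolveEntity(String publicID, String systemID,
                                String baseURI, String namespace)
            throws XMLStreamException {
        if (systemID == null) {
            return null; // nothing to resolve, use the parser's default
        }
        // Assumption: systemID is a plain relative file name.
        File candidate = new File(baseDir, systemID);
        if (!candidate.isFile()) {
            return null; // not found next to the XML file, use the default
        }
        try {
            // An InputStream is one of the return types StAX allows here.
            return new FileInputStream(candidate);
        } catch (FileNotFoundException e) {
            throw new XMLStreamException("Cannot open " + candidate, e);
        }
    }
}
```

In the question's code, it would be registered before the reader is created, for example with xmlInputFactory.setXMLResolver(new RelativeDtdResolver(new File(args[0]))).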
It's also a good idea to look under the hood to understand what exactly happens when a parser needs to resolve a SYSTEM URI. Woodstox, for example, has an internal default implementation of XMLResolver (as well as a proxy between SAX's EntityResolver and StAX's XMLResolver). Look at what it does with your DTD "filename" and you will see why it behaves the way it does.
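A possible alternative to a custom resolver, sketched here under an assumption: when the parser only receives an InputStream, it has no way of knowing where the document lives, so a relative system identifier can only be resolved against some fallback location. The standard StAX overload createXMLStreamReader(String systemId, InputStream) lets you tell the parser the document's own location. Whether the default resolver of a given Woodstox version then resolves "test.dtd" against that location is something to verify rather than take for granted. The class and method names below are invented for the example.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not part of Woodstox or StAX.
final class SystemIdExample {

    /** Parses xmlFile and returns the local name of its root element. */
    static String rootElementName(File xmlFile)
            throws IOException, XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        // Use the file's URI as the system ID, so the parser knows
        // the location of the document it is reading.
        String systemId = xmlFile.toURI().toString();
        try (InputStream in = new FileInputStream(xmlFile)) {
            XMLStreamReader xsr = factory.createXMLStreamReader(systemId, in);
            try {
                while (xsr.hasNext()) {
                    if (xsr.next() == XMLStreamConstants.START_ELEMENT) {
                        return xsr.getLocalName();
                    }
                }
                return null; // no element found
            } finally {
                // Closing the reader does not close the underlying stream;
                // the try-with-resources block takes care of that.
                xsr.close();
            }
        }
    }
}
```

If this works for your setup, it avoids writing a resolver at all; if not, the XMLResolver approach remains available.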