I am trying to parse a document using dom4j. The document comes from various providers, and sometimes it uses namespaces and sometimes it does not.
For example:
<book>
<author>john</author>
<publisher>
<name>John Q</name>
</publisher>
</book>
or
<book xmlns="http://schemas.xml.com/XMLSchemaInstance">
<author>john</author>
<publisher>
<name>John Q</name>
</publisher>
</book>
or
<book xmlns:i="http://schemas.xml.com/XMLSchemaInstance">
<i:author>john</i:author>
<i:publisher>
<i:name>John Q</i:name>
</i:publisher>
</book>
I have a list of XPaths. I parse the document into a Document object, and then search it using the XPaths.
import java.util.ArrayList;
import java.util.List;
import org.dom4j.Document;
import org.dom4j.Node;

Document doc = parseDocument(documentFile);
// List is an interface, so a concrete implementation is needed here
List<String> xmlPaths = new ArrayList<String>();
xmlPaths.add("book/author");
xmlPaths.add("book/publisher/name");
for (String searchPath : xmlPaths)
{
    // Each path is expected to match exactly one node
    Node currentNode = doc.selectSingleNode(searchPath);
    assert(currentNode != null);
}
This code does not work on the last document, the one that is using namespace prefixes.
I tried these techniques, but none of them seem to work.
1) changing the last element in the xpath to be namespace neutral:
/book/:author
/book/[local-name()='author']
/[local-name()='book']/[local-name()='author']
All of these throw an exception saying that the XPath format is not correct.
2) Adding namespace URIs to the XPath, after creating it using DocumentHelper.createXPath();
Any idea what I am doing wrong?
FYI I am using dom4j version 1.5
Your XPath does not contain a tag name. The general syntax in your case would be
/tagname[condition]/tagname[condition]
The important aspect is that the tag names are mandatory while the conditions are optional. If you do not want to specify a tag name, you have to use * for "any tag". There may be performance implications for large XML files, since you will always have to iterate over a node set instead of using an index lookup. Maybe @MichaelKay can comment on this.
Try this instead:
/*[local-name()='book']/*[local-name()='author']
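To illustrate the "any tag plus local-name() condition" form, here is a minimal sketch. It is an assumption-laden example, not your actual setup: it uses the JDK's built-in javax.xml.xpath instead of dom4j so that it runs without extra dependencies, and the class NamespaceNeutralXPath and its helpers toNamespaceNeutral and textAt are hypothetical names. The generated expression string is plain XPath 1.0, so it should also be usable with dom4j's selectSingleNode, but I have not verified that against version 1.5.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NamespaceNeutralXPath {

    // Hypothetical helper: rewrites a simple path such as "book/author" into
    // "/*[local-name()='book']/*[local-name()='author']".
    // It only handles plain element names separated by '/', no predicates.
    static String toNamespaceNeutral(String simplePath) {
        StringBuilder sb = new StringBuilder();
        for (String step : simplePath.split("/")) {
            if (!step.isEmpty()) {
                sb.append("/*[local-name()='").append(step).append("']");
            }
        }
        return sb.toString();
    }

    // Parses the XML and returns the text of the first matching node, or null.
    static String textAt(String xml, String simplePath) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Must be namespace aware, otherwise local-name() has nothing to work with
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node node = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(toNamespaceNeutral(simplePath), doc, XPathConstants.NODE);
            return node == null ? null : node.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String plain = "<book><author>john</author>"
                + "<publisher><name>John Q</name></publisher></book>";
        String prefixed = "<book xmlns:i=\"http://schemas.xml.com/XMLSchemaInstance\">"
                + "<i:author>john</i:author>"
                + "<i:publisher><i:name>John Q</i:name></i:publisher></book>";
        System.out.println(textAt(plain, "book/author"));            // john
        System.out.println(textAt(prefixed, "book/publisher/name")); // John Q
    }
}
```

The same rewritten path matches the document without namespaces, the one with a default namespace, and the one with prefixes, because each step tests only the local part of the element name. The trade-off is that elements with the same local name in different namespaces can no longer be told apart.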