Search XML using XPath with JDOM/JAXEN/SAXON


I have an XML document that I'm parsing with JDOM-2.0.5. The following code works fine, and the bookNodes list contains all book nodes from my XML file:

SAXBuilder builder = new SAXBuilder();

// @see http://xerces.apache.org/xerces-j/features.html
// Disable namespace processing (namespace awareness)
builder.setFeature("http://xml.org/sax/features/namespaces", false);

Document doc = null;

try {
    doc = builder.build(xmlURL);
} catch (JDOMException | IOException e) {
    e.printStackTrace();
    return null; 
}

// Get the browse element
Element browse = doc.getRootElement().getChild("browse");

// Get all of browse's book children
List<Element> bookNodes = browse.getChildren("book");

for (Element book : bookNodes) {
    // Do things with the selected nodes
    //...
}

And here's an example of my XML data:

<?xml version="1.0" encoding="utf-8"?> 
<Books xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.example.com/XMLSchema" version="1">
    <status code="0"/>
    <link>http://www.example.com/books</link>
    <description>Browse, search and ....</description>
    <language>en-us</language>
    <pubDate>Sun, 09 Nov 2014 00:00:02 +0000</pubDate>
    <copyright>Copyright 2014, XXX</copyright>
    <category>Books</category>
    <browse>
        <book id="bk101">
            <author>Gambardella, Matthew</author>
            <title>XML Developer's Guide</title>
            <genre>Computer</genre>
            <price>44.95</price>
            <publish_date>2000-10-01</publish_date>
            <description>An in-depth look at creating applications 
            with XML.</description>
        </book>
        <book id="bk102">
            <author>Ralls, Kim</author>
            <title>The Midnight Rain</title>
            <genre>Fantasy</genre>
            <price>5.95</price>
            <publish_date>2000-12-16</publish_date>
            <description>A former architect battles corporate zombies, 
            an evil sorceress, and her own childhood to become queen 
            of the world.</description>
        </book>
        <book id="bk105">
            <author>Corets, Eva</author>
            <title>The Sundered Grail</title>
            <genre>Fantasy</genre>
            <price>5.95</price>
            <publish_date>2001-09-10</publish_date>
            <description>The two daughters of Maeve, half-sisters, 
            battle one another for control of England. Sequel to 
            Oberon's Legacy.</description>
        </book>
        <book id="bk106">
            <author>Randall, Cynthia</author>
            <title>Lover Birds</title>
            <genre>Romance</genre>
            <price>4.95</price>
            <publish_date>2000-09-02</publish_date>
            <description>When Carla meets Paul at an ornithology 
            conference, tempers fly as feathers get ruffled.</description>
        </book>
    </browse>
</Books>

Question 1:

I want to select only the book nodes whose title contains some text, so I used the XPath query //book[contains(./title, 'The')] with jaxen-1.1.6 and the following code:

String filter = "//book[contains(./title, 'The')]"; // should return 2 elements (2nd and 3rd nodes)

// use the default implementation
XPathFactory xFactory = XPathFactory.instance();

XPathExpression<Element> expr = xFactory.compile(filter, Filters.element());

List<Element> bookNodes = expr.evaluate(doc);

But the bookNodes list was empty!

What is wrong with my code?

Question 2:

I'm going to need more advanced functions to search my XML fields, such as:

filter = "//book[matches(./title, '^ *XML.*?Developer.*?Guide *$', 'i')]"; // should return 1 element (1st node)

So I'm using saxon9he, which supports XPath 2.0+, but I couldn't figure out how to make it work with JDOM2 and my code above.

Could you show me how to do that based on my code? (I already searched for help but couldn't find any.)

Answering Q.1 will help me understand what I did wrong, but answering Q.2 will help me move forward with my little personal app.

Thank you


There are 2 answers

Ian Roberts

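The XPath language is only defined over namespace-well-formed XML, and it can produce unexpected results if you try to use it on an XML tree that was parsed without namespace support. Your root element declares a default namespace (xmlns="http://www.example.com/XMLSchema"), so book, title and every other element in the document belong to that namespace, whereas an unprefixed name test such as //book in XPath 1.0 only matches elements in no namespace. Rather than ignoring namespaces, you should use them correctly: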

SAXBuilder builder = new SAXBuilder();
Document doc = null;

try {
    doc = builder.build(xmlURL);
} catch (JDOMException | IOException e) {
    e.printStackTrace();
    return null; 
}

Namespace ns = Namespace.getNamespace("http://www.example.com/XMLSchema");

// Get the browse element
Element browse = doc.getRootElement().getChild("browse", ns);

// Get all of browse's book children
List<Element> bookNodes = browse.getChildren("book", ns);

for (Element book : bookNodes) {
    // Do things with the selected nodes
    //...
}

For XPath, you need to bind the namespace URI to a prefix:

filter = "//ns:book[contains(./ns:title, 'The')]";

// use the default implementation
XPathFactory xFactory = XPathFactory.instance();

XPathBuilder<Element> xpathBuilder = new XPathBuilder<>(filter, Filters.element());
// Bind the prefix "ns" to the document's default namespace URI
xpathBuilder.setNamespace("ns", "http://www.example.com/XMLSchema");
XPathExpression<Element> expr = xpathBuilder.compileWith(xFactory);

List<Element> bookNodes = expr.evaluate(doc);
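To see the namespace effect in isolation, here is a minimal sketch that uses only the JDK's built-in DOM parser and javax.xml.xpath (an XPath 1.0 engine) instead of JDOM/Jaxen. It parses a trimmed copy of the question's XML with namespace awareness on, then shows that the original query selects nothing, while a namespace-agnostic local-name() test finds the two expected books. The class and method names (DefaultNamespaceDemo, count) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class DefaultNamespaceDemo {
    static final String NS = "http://www.example.com/XMLSchema";

    // Trimmed copy of the question's XML that keeps its default namespace
    static final String XML =
            "<Books xmlns='" + NS + "'><browse>"
            + "<book id='bk101'><title>XML Developer's Guide</title></book>"
            + "<book id='bk102'><title>The Midnight Rain</title></book>"
            + "<book id='bk105'><title>The Sundered Grail</title></book>"
            + "<book id='bk106'><title>Lover Birds</title></book>"
            + "</browse></Books>";

    // Parses XML namespace-aware and counts the nodes an XPath 1.0 expression selects
    static int count(String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // the JDK default is false
            Document doc = dbf.newDocumentBuilder().parse(
                    new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // The query from question 1: 'book' and 'title' only match no-namespace elements
        System.out.println(count("//book[contains(./title, 'The')]"));
        // Namespace-agnostic workaround: compare local names instead
        System.out.println(count(
                "//*[local-name()='book'][contains(*[local-name()='title'], 'The')]"));
    }
}
```

It prints 0 and then 2. The local-name() form ignores which namespace an element is in, so binding a prefix as shown above is the more precise fix.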

Regarding question 2, Saxon's XPath engine can work with JDOM2 trees, but you have to use Saxon's XPath API instead of JDOM's. That in turn means using the standard javax.xml.xpath way of associating namespace prefixes with URIs, which is much more cumbersome than JDOM's: you have to write your own implementation of NamespaceContext or use a third-party one such as Spring's SimpleNamespaceContext.

// The wrapper and the evaluator should share the same Configuration
Configuration config = new Configuration(); // net.sf.saxon
JDOM2DocumentWrapper docw =
        new JDOM2DocumentWrapper(doc, config); // net.sf.saxon.option.jdom2

XPathEvaluator xpath = new XPathEvaluator(config); // net.sf.saxon.xpath
SimpleNamespaceContext nsCtx = new SimpleNamespaceContext(); // org.springframework.util.xml
nsCtx.bindNamespaceUri("ns", "http://www.example.com/XMLSchema");
xpath.setNamespaceContext(nsCtx);
List<?> bookNodes = (List<?>) xpath.evaluate(
        "//ns:book[matches(./ns:title, '^ *XML.*?Developer.*?Guide *$', 'i')]", docw,
        XPathConstants.NODESET);

(adapted from Saxon's JDOM2Example.java)
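If you'd rather not pull in Spring just for SimpleNamespaceContext, a hand-written NamespaceContext is fairly short. Below is a minimal sketch of one backed by a Map; MapNamespaceContext and its bind method are hypothetical names, not part of any library. It is exercised here with the JDK's own javax.xml.xpath on a trimmed copy of the question's XML. Since Saxon's XPathEvaluator implements javax.xml.xpath.XPath, the same object should presumably be usable in place of nsCtx in the setNamespaceContext call above, though that combination is not tested here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class MapNamespaceContextDemo {
    static final String NS = "http://www.example.com/XMLSchema";

    // Hypothetical minimal NamespaceContext backed by a prefix -> URI map
    static final class MapNamespaceContext implements NamespaceContext {
        private final Map<String, String> prefixToUri = new HashMap<>();

        MapNamespaceContext bind(String prefix, String uri) {
            prefixToUri.put(prefix, uri);
            return this;
        }

        @Override
        public String getNamespaceURI(String prefix) {
            if (prefix == null) {
                throw new IllegalArgumentException("prefix must not be null");
            }
            if (XMLConstants.XML_NS_PREFIX.equals(prefix)) {
                return XMLConstants.XML_NS_URI;
            }
            // Unbound prefixes resolve to "" (no namespace), as the interface specifies
            String uri = prefixToUri.get(prefix);
            return uri != null ? uri : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            Iterator<String> prefixes = getPrefixes(namespaceURI);
            return prefixes.hasNext() ? prefixes.next() : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            List<String> prefixes = new ArrayList<>();
            for (Map.Entry<String, String> entry : prefixToUri.entrySet()) {
                if (entry.getValue().equals(namespaceURI)) {
                    prefixes.add(entry.getKey());
                }
            }
            return prefixes.iterator();
        }
    }

    // Counts the nodes selected by an XPath 1.0 expression that may use the ns prefix
    static int countBooks(String expression) {
        String xml = "<Books xmlns='" + NS + "'><browse>"
                + "<book id='bk101'><title>XML Developer's Guide</title></book>"
                + "<book id='bk102'><title>The Midnight Rain</title></book>"
                + "<book id='bk105'><title>The Sundered Grail</title></book>"
                + "</browse></Books>";
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new MapNamespaceContext().bind("ns", NS));
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countBooks("//ns:book[contains(./ns:title, 'The')]"));
    }
}
```

Running main prints 2: the prefixed query finds "The Midnight Rain" and "The Sundered Grail".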

Michael Kay

For completeness, here's how to do it with Saxon's s9api interface:

Processor proc = new Processor(false); // net.sf.saxon.s9api; false = no licensed (PE/EE) features
XdmNode docw = proc.newDocumentBuilder().wrap(doc);
XPathCompiler xpath = proc.newXPathCompiler();
xpath.declareNamespace("ns", "http://www.example.com/XMLSchema");
XdmValue bookNodes = xpath.evaluate(
        "//ns:book[matches(./ns:title, '^ *XML.*?Developer.*?Guide *$', 'i')]", docw);
for (XdmItem book : bookNodes) {
    // Do things with each matching book node
}
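Whichever Saxon API you use, the regular expression from question 2 can be sanity-checked on its own. The sketch below applies the same pattern with plain java.util.regex to the four titles from the sample XML. It assumes the XPath 'i' flag can be approximated by Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE; the XPath and Java regex dialects are close but not identical, so this is a quick check rather than a reimplementation of matches(). TitleRegexCheck and titleMatches are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class TitleRegexCheck {
    // Same pattern as the XPath matches() call; the 'i' flag is approximated
    // with CASE_INSENSITIVE | UNICODE_CASE (an assumption, see above)
    static final Pattern TITLE_PATTERN = Pattern.compile(
            "^ *XML.*?Developer.*?Guide *$",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    // Like XPath matches(), find() succeeds if the pattern matches anywhere in
    // the string; the ^ and $ anchors make it cover the whole title
    static boolean titleMatches(String title) {
        return TITLE_PATTERN.matcher(title).find();
    }

    public static void main(String[] args) {
        // Titles copied from the question's sample XML
        List<String> titles = Arrays.asList(
                "XML Developer's Guide",
                "The Midnight Rain",
                "The Sundered Grail",
                "Lover Birds");
        for (String title : titles) {
            System.out.println(title + " -> " + titleMatches(title));
        }
    }
}
```

It prints true only for "XML Developer's Guide", which agrees with the expected single result in question 2.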