Unclear results with jdom2 XPath query

I have a problem with jdom2 XPath:

test.xhtml code:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="cs" lang="cs">
<head>
<title>mypage</title>
</head>
<body>
<div class="in">
<a class="nextpage" href="url.html">
<img src="img/url.gif" alt="to url.html" />
</a>
</div>
</body>
</html>

Java code:

Document document;
SAXBuilder saxBuilder = new SAXBuilder();

document = saxBuilder.build("test2.html");
XPathFactory xpfac = XPathFactory.instance();
XPathExpression<Element> xp = xpfac.compile("//a[@class = 'nextpage']", Filters.element());
for (Element att : xp.evaluate(document) ) {
  System.out.println("We have target " + att.getAttributeValue("href"));
}

But with this query I can't get any element. I found that when the query is //*[@class = 'nextpage'], it does find the element:

We have target url.html

It must have something to do with the namespace or something else in the header, because without it the original query produces output. I don't know what I'm doing wrong.

There is 1 answer

rolfl

Note: Although this is the same issue as described in the suggested duplicate, that other question relates to JDOM 1.x. JDOM 2.x has a number of significant differences, and this answer relates to the JDOM 2.x XPath implementation.

The XPath specification is very clear about how namespaces are treated in XPath expressions. Unfortunately, for people familiar with XML, the XPath handling of namespaces differs slightly from what they expect. This is what the specification says:

A QName in the node test is expanded into an expanded-name using the namespace declarations from the expression context. This is the same way expansion is done for element type names in start and end-tags except that the default namespace declared with xmlns is not used: if the QName does not have a prefix, then the namespace URI is null (this is the same way attribute names are expanded). It is an error if the QName has a prefix for which there is no namespace declaration in the expression context.

In practice, this means that whenever your XML document has a 'default' namespace, you still need to use a prefix for that namespace in an XPath expression. This is also why //*[@class = 'nextpage'] works: * matches elements in any namespace, and the unprefixed class attribute is in no namespace. The XPathFactory.compile(...) method alludes to this requirement in its JavaDoc, but it is not as clear as it should be. The prefix you use is arbitrary, and local to that XPath expression only. In your case, the code will look something like this (assuming we choose the prefix xhtml for the namespace URI http://www.w3.org/1999/xhtml):

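XPathFactory xpfac = XPathFactory.instance();
// Bind an arbitrary prefix to the XHTML namespace URI; the prefix only has meaning inside this expression.
Namespace xhtml = Namespace.getNamespace("xhtml", "http://www.w3.org/1999/xhtml");
// The third argument is the XPath variables map; none are used here, so it is null.
XPathExpression<Element> xp = xpfac.compile("//xhtml:a[@class = 'nextpage']", Filters.element(), null, xhtml);
for (Element att : xp.evaluate(document) ) {
    System.out.println("We have target " + att.getAttributeValue("href"));
}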
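The rule quoted from the specification is not specific to JDOM. As a rough sketch, the same behaviour can be reproduced with only the JDK's built-in javax.xml.xpath API (a substitute for JDOM here, so it runs without extra libraries). The class name XPathDefaultNamespaceDemo, the helper countMatches, and the trimmed-down SAMPLE document (no DOCTYPE, to avoid fetching the DTD) are made up for illustration:

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathDefaultNamespaceDemo {
    static final String XHTML_URI = "http://www.w3.org/1999/xhtml";

    // Hypothetical, shortened version of the question's document.
    static final String SAMPLE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><div class=\"in\">"
      + "<a class=\"nextpage\" href=\"url.html\"/></div></body></html>";

    // Parses xml namespace-aware and returns how many nodes the expression selects.
    static int countMatches(String xml, String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // without this, the DOM carries no namespace info
        Document doc = dbf.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the (arbitrary) prefix "xhtml" to the XHTML namespace URI.
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "xhtml".equals(prefix) ? XHTML_URI : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return XHTML_URI.equals(uri) ? "xhtml" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                String prefix = getPrefix(uri);
                return (prefix == null
                        ? Collections.<String>emptyList()
                        : Collections.singletonList(prefix)).iterator();
            }
        });

        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Unprefixed name test: looks for <a> in no namespace.
        System.out.println(countMatches(SAMPLE, "//a[@class = 'nextpage']"));
        // Prefixed name test: looks for <a> in the XHTML namespace.
        System.out.println(countMatches(SAMPLE, "//xhtml:a[@class = 'nextpage']"));
        // Wildcard: matches elements in any namespace.
        System.out.println(countMatches(SAMPLE, "//*[@class = 'nextpage']"));
    }
}
```

Under these assumptions the unprefixed query should select nothing, while the prefixed query and the wildcard query should each select the single link, which mirrors what the question observed with JDOM.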

I should add this to the FAQ... Thanks.