Getting Exception on evaluating an XPath expression in Java

Question

745 views Asked by A Beginner At 04 November 2018 at 16:33

I am trying to learn how to use XPath expressions in Java. I am using JTidy to convert an HTML page to XHTML so that I can parse it easily with XPath expressions. I have the following code:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = ConvertXHTML("https://twitter.com/?lang=fr");

// Create the XPath evaluator
XPathFactory xpathfactory = XPathFactory.newInstance();
XPath Inst = xpathfactory.newXPath();
NodeList nodes = (NodeList) Inst.evaluate("//p/@align", doc, XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); ++i) {
    // "//p/@align" selects attribute nodes, so hold them as Node rather than casting to Element
    Node n = nodes.item(i);
    System.out.println(n);
}

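public Document ConvertXHTML(String link) {
    try {
        URL u = new URL(link);
        BufferedInputStream instream = new BufferedInputStream(u.openStream());
        FileOutputStream outstream = new FileOutputStream("out.xhtml");

        Tidy c = new Tidy();
        c.setShowWarnings(false);
        c.setInputEncoding("UTF-8");
        c.setOutputEncoding("UTF-8");
        c.setXHTML(true);

        // Parse the HTML, write the tidied XHTML to out.xhtml, and return JTidy's DOM
        return c.parseDOM(instream, outstream);
    } catch (IOException e) {
        // The posted snippet ends before the catch block; rethrowing here is an assumption
        throw new UncheckedIOException(e);
    }
}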

It works fine for most URLs, but not for this one:

https://twitter.com/?lang=fr

For that URL I get this exception:

javax.xml.transform.TransformerException: Index -1 out of bounds.....

Below is part of the stack trace:

javax.xml.transform.TransformerException: Index -1 out of bounds for length 128
at java.xml/com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:366)
at java.xml/com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:303)
at java.xml/com.sun.org.apache.xpath.internal.jaxp.XPathImplUtil.eval(XPathImplUtil.java:101)
at java.xml/com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(XPathExpressionImpl.java:80)
at java.xml/com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:89)
at files.ExampleCode.GetThoselinks(ExampleCode.java:50)
at files.ExampleCode.DoSomething(ExampleCode.java:113)
at files.ExampleCode.GetThoselinks(ExampleCode.java:81)
at files.ExampleCode.DoSomething(ExampleCode.java:113)

I am not sure whether the problem is in the converted XHTML of the website or somewhere else. Can anyone tell me what is wrong with the code? Any suggested edits would be helpful.

There are 2 answers

**Michael Kay** · Answer 1 · 2018-11-04T21:46:12+00:00

I would normally say that an index-out-of-bounds exception coming from deep within the XPath engine is a bug in the XPath engine. The only caveat is if there's something structurally wrong with the DOM that the XPath engine is searching; an XPath processor is entitled to make reasonable assumptions that the DOM is valid and to crash if it isn't. In that case it would be a bug in Tidy, which created the DOM.
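One way to gather more evidence before blaming either side is to look at the innermost exception rather than the wrapping `TransformerException`. The following is only a sketch: the class `XPathDiagnostics` and the helper `rootCause` are hypothetical names, and the deliberately malformed expression in `main` is there just to force an exception without needing JTidy.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class XPathDiagnostics {

    // Hypothetical helper: follows getCause() down to the innermost exception.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) throws Exception {
        // An empty document from the JDK parser stands in for the real DOM here.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // "//p[" is malformed on purpose so that evaluate() throws.
            xpath.evaluate("//p[", doc, XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            // Print the full trace of the innermost exception to see where it originated.
            rootCause(e).printStackTrace();
        }
    }
}
```

Wrapping the real `evaluate` call from the question in the same `try`/`catch` would show the complete trace of the underlying index error, which may help when reporting it to whichever project turns out to be responsible.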

**user3969107** · Answer 2 · 2021-08-24T20:59:35+00:00

I had a similar problem using XPath evaluation on a document produced by JTidy. I got around it by having JTidy serialize the DOM it produced to a file, and then parsing that XML file with javax.xml.parsers.DocumentBuilder to get a second version of the DOM. Bizarre as it seems, using the second one avoided the out-of-bounds exception and worked. Use code like the following:

        DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
        documentBuilderFactory.setNamespaceAware(true);
        // If you don't do the following, it will take a full minute to parse the XML document (presumably the time-out
        // period for trying to load the DTD). See https://stackoverflow.com/questions/6204827/xml-parsing-too-slow.
        documentBuilderFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
        // First DOM: the one built by JTidy
        Document doc = tidy.parseDOM(input, null);
        // Serialize JTidy's DOM to a file; try-with-resources closes the stream
        try (FileOutputStream fos = new FileOutputStream("temp.xml")) {
            tidy.pprint(doc, fos);
        }
        // Second DOM: re-parse the serialized file with the standard parser
        doc = documentBuilder.parse(new File("temp.xml"));
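
If you would rather not write a temporary file, the same round trip can probably be done in memory. This is a rough sketch under some assumptions: `ReparseSketch`, `reparse` and `attributeValues` are hypothetical names, the sample markup stands in for JTidy's serialized output, and the `load-external-dtd` feature is assumed to be supported by the parser in use (it is a Xerces feature, as in the code above). Note that once the document is parsed namespace-aware, elements in the XHTML namespace are no longer matched by a plain `//p`, so the sketch uses `local-name()` instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class ReparseSketch {

    // Parses already-serialized XHTML bytes into a fresh DOM with the standard parser.
    static Document reparse(byte[] xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Same feature as above: do not try to fetch the external DTD.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xhtml));
    }

    // Evaluates an expression that selects attribute nodes and collects their values.
    static List<String> attributeValues(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getNodeValue());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // With JTidy, the bytes would come from something like:
        //   ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        //   tidy.pprint(tidyDoc, buffer);
        //   byte[] xhtml = buffer.toByteArray();
        String sample = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<p align=\"center\">a</p><p align=\"left\">b</p></body></html>";
        Document doc = reparse(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(attributeValues(doc, "//*[local-name()='p']/@align"));
    }
}
```

Whether this avoids the exception in every case is not something I have verified; it simply applies the same idea as the file-based version, which is to hand the XPath engine a DOM built by the standard parser instead of the one built by JTidy.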
