The createDOM method does not return a document


I use HtmlCleaner 2.6.1 and XPath to parse HTML pages in an Android application. Here are the pages:

  1. http://www.kino-govno.com/comments/42571-postery-kapitan-fillips-i-poslednij-rubezh

  2. http://www.kino-govno.com/comments/42592-fantasticheskie-idei-i-mesta-ih-obitanija

    The first link returns a document and everything works. For the second link, this call:

    document = domSerializer.createDOM(tagNode);
    

    returns nothing.

In a plain Java project without Android, everything works fine.

Here is the code:

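        // document, xPath and pageNode are declared outside this snippet (declarations not shown).
        // Selects <p> elements that are direct children of the matching <div>.
        String queries = "//div[starts-with(@class, 'news_text op')]/p";
        URL url = new URL(link2);
        // Download the page and clean it into HtmlCleaner's tag tree.
        TagNode tagNode = new HtmlCleaner().clean(url);
        CleanerProperties cleanerProperties = new CleanerProperties();
        DomSerializer domSerializer = new DomSerializer(cleanerProperties);
        // Convert the tag tree into an org.w3c.dom.Document for XPath evaluation.
        document = domSerializer.createDOM(tagNode);
        xPath = XPathFactory.newInstance().newXPath();
        pageNode = (NodeList) xPath.evaluate(queries, document, XPathConstants.NODESET);
        // Text of the first matched paragraph.
        String val = pageNode.item(0).getFirstChild().getNodeValue();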

There is 1 answer

Answer by Jens Erat

That's because HtmlCleaner wraps the paragraphs of the second HTML page in another <div>, so they are no longer direct children of the matched <div>. Use the descendant-or-self axis (//) instead of the child axis (/):

//div[starts-with(@class, 'news_text op')]//p
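
To see the difference between the two axes without Android or HtmlCleaner, here is a minimal sketch using only the standard `javax.xml` APIs. The class name `XPathAxisDemo`, the helper `count`, and the `WRAPPED` markup are made up for illustration. The markup only imitates a page where the `<p>` ends up inside an extra `<div>`; it is not the real output of HtmlCleaner for the second link.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathAxisDemo {
    // Hypothetical markup: the <p> is nested one level deeper than the query expects.
    public static final String WRAPPED =
        "<html><body><div class='news_text op'><div><p>Hello</p></div></div></body></html>";

    // Parses the given XML string and returns how many nodes the XPath query selects.
    public static int count(String xml, String query) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xPath.evaluate(query, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        String childAxis = "//div[starts-with(@class, 'news_text op')]/p";
        String descendantAxis = "//div[starts-with(@class, 'news_text op')]//p";
        // Child axis: the <p> is a grandchild of the matched <div>, so nothing is selected.
        System.out.println(count(WRAPPED, childAxis));      // prints 0
        // Descendant axis: the nested <p> is found.
        System.out.println(count(WRAPPED, descendantAxis)); // prints 1
    }
}
```

With the child axis the node list is empty, so `pageNode.item(0)` in the question's code would be `null`. Checking `pageNode.getLength()` before calling `item(0)` makes that case easier to spot.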