HtmlCleaner XPath: get content of node without child nodes

Asked by jacksbox on 05 November 2016 at 14:48 (205 views)

I'm using the HtmlCleaner library to parse an HTML file and extract some data via its XPath support. That mostly works well, but I can't find a way to get just the text content of a node (without the content of its child nodes). According to most basic XPath documentation, text() should return a node's own text without its children's content, but HtmlCleaner's XPath implementation doesn't seem to follow this. Is there a way to do this with HtmlCleaner's XPath?

UPDATE: here is an example:

The HTML comes from http://www.imdb.com/title/tt0499549/?ref_=nv_sr_1 and here is a snippet of it:

<div class="txt-block">
  <h4 class="inline">Budget:</h4>        
    $237,000,000      
  <span class="attribute">(estimated)</span>
</div>

This is my XPath (in this case, div[7] selects the .txt-block div):

//*[@id='titleDetails']/div[7]/text()

This returns "Budget: $237,000,000 (estimated)", but I only want "$237,000,000", without the content of the h4 or the span.
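
For comparison, here is a minimal sketch of what text() selects in a standard XPath 1.0 engine. It uses the JDK's built-in javax.xml.xpath, not HtmlCleaner, so it only illustrates the behaviour the question expects and is not a fix for HtmlCleaner itself. The class name DirectTextDemo and the method directText are hypothetical names made up for this illustration, and the path /div/text() is an assumption that only fits the snippet when it is parsed on its own.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of HtmlCleaner.
class DirectTextDemo {

    // Evaluates the expression against the given well-formed markup and
    // returns the selected nodes' values, concatenated and trimmed.
    static String directText(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nodes.getLength(); i++) {
            // For text nodes, getNodeValue() is the text itself.
            sb.append(nodes.item(i).getNodeValue());
        }
        // Whitespace-only text nodes around the child elements are trimmed away.
        return sb.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        // The snippet from the question, which happens to be well-formed XML.
        String xml = "<div class=\"txt-block\">\n"
                + "  <h4 class=\"inline\">Budget:</h4>\n"
                + "    $237,000,000\n"
                + "  <span class=\"attribute\">(estimated)</span>\n"
                + "</div>";

        // text() selects only the div's own text nodes, not those of h4 or span.
        System.out.println(directText(xml, "/div/text()")); // prints $237,000,000
    }
}
```

Real-world HTML is rarely well-formed XML, so the page would first have to be cleaned and converted to a W3C DOM before this approach could apply. That conversion step is not shown here.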

There are 0 answers