HtmlCleaner XPath: get content of node without child nodes


I'm using the HtmlCleaner library to parse an HTML file and extract some data with its XPath support. That mostly works well, but I can't find a way to get only the text content of a node, without the content of its child nodes. According to most basic XPath documentation, text() selects only a node's own text nodes, not its children's text, but HtmlCleaner's XPath implementation doesn't seem to follow this. Is there a way to do this with HtmlCleaner's XPath?
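The standard XPath 1.0 behaviour described above can be checked with the JDK's built-in javax.xml.xpath engine. Note that this is not HtmlCleaner; it is only a reference point. The sketch below uses a made-up fragment, `<p>a<b>b</b>c</p>`. When `/p/text()` is evaluated as a node-set, it yields only the two direct text nodes. By contrast, `string(/p)` yields the element's full string value, which includes the child's text. The example also shows a common pitfall: converting a `text()` node-set to a string returns only the first text node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class TextNodeDemo {

    // Parses a small, well-formed XML/XHTML string into a W3C DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates an XPath as a node-set and returns each node's value.
    static List<String> textNodes(Document doc, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getNodeValue());
        }
        return out;
    }

    // Evaluates an XPath and converts the result to a string (XPath 1.0 rules).
    static String asString(Document doc, String expr) throws Exception {
        return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<p>a<b>b</b>c</p>");
        System.out.println(textNodes(doc, "/p/text()")); // [a, c]
        System.out.println(asString(doc, "string(/p)")); // abc
        System.out.println(asString(doc, "/p/text()"));  // a (first text node only)
    }
}
```

If a library hands back the parent element's string value instead of the text node-set, a `text()` path would appear to return the `abc`-style result. This sketch does not confirm whether HtmlCleaner's XPath behaves that way.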

UPDATE: here is an example:

My HTML comes from this page: http://www.imdb.com/title/tt0499549/?ref_=nv_sr_1

Here is a snippet of it:

<div class="txt-block">
  <h4 class="inline">Budget:</h4>        
    $237,000,000      
  <span class="attribute">(estimated)</span>
</div>

This is my XPath (here, div[7] selects the .txt-block div):

//*[@id='titleDetails']/div[7]/text()

This returns "Budget: $237,000,000 (estimated)", but I only want "$237,000,000", without the content of the h4 or the span.
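The following minimal sketch shows the expected result for the snippet above under standard XPath semantics. It uses the JDK's javax.xml.xpath engine as a swapped-in alternative to HtmlCleaner's own XPath. It selects the div's direct text nodes with `//div[@class='txt-block']/text()`, concatenates them, and trims the whitespace. This drops the `<h4>` and `<span>` text. The sketch relies on some assumptions. First, the snippet happens to be well-formed XML, which the JDK parser requires; the full page is unlikely to be. Second, HtmlCleaner ships a `DomSerializer` that can turn a cleaned `TagNode` into a W3C `Document`, and the same XPath could then be run on the real page. That conversion step is not verified here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class BudgetText {

    // The snippet from the question (well-formed, so the JDK parser accepts it).
    static final String SNIPPET =
            "<div class=\"txt-block\">\n"
          + "  <h4 class=\"inline\">Budget:</h4>        \n"
          + "    $237,000,000      \n"
          + "  <span class=\"attribute\">(estimated)</span>\n"
          + "</div>";

    // Joins the *direct* text-node children of the element selected by divXPath
    // and trims surrounding whitespace; text inside child elements is skipped.
    static String directText(String xml, String divXPath) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList texts = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(divXPath + "/text()", doc, XPathConstants.NODESET);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < texts.getLength(); i++) {
            sb.append(texts.item(i).getNodeValue());
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        // Prints $237,000,000
        System.out.println(directText(SNIPPET, "//div[@class='txt-block']"));
    }
}
```

The div has three direct text nodes: whitespace before the `<h4>`, the budget line, and whitespace after the `<span>`. Concatenating and trimming them leaves only the amount. In pure XPath 1.0, `//div[@class='txt-block']/text()[normalize-space()]` would select just the non-blank text node. Whether HtmlCleaner's XPath subset supports that predicate is untested here.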
