I'm using the HtmlCleaner library to parse an HTML file and extract some data via its XPath function. That mostly works well, but I can't find a way to get just the text content of a node (without the content of its child nodes). As stated in many basic XPath references, text() should select only a node's own text, without its children's content, but HtmlCleaner's XPath implementation doesn't seem to follow this. Is there a way to do it with HtmlCleaner's XPath?
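For reference, this is how a standard XPath 1.0 engine treats text(). The sketch below uses the JDK's built-in javax.xml.xpath, not HtmlCleaner, on a tiny made-up snippet; the class and method names are only illustrative:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextVsStringValue {
    // Parses a small well-formed snippet with the JDK's built-in parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Selects nodes with an XPath expression and concatenates their text content.
    static String joinNodes(String xml, String expr) {
        try {
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xp.evaluate(expr, parse(xml), XPathConstants.NODESET);
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                sb.append(nodes.item(i).getTextContent());
            }
            return sb.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<p>a<b>b</b>c</p>";
        // text() selects only the text nodes that are direct children of <p>.
        System.out.println(joinNodes(xml, "/p/text()")); // prints "ac"
        // Selecting the element itself yields all of its text, children included.
        System.out.println(joinNodes(xml, "/p"));        // prints "abc"
    }
}
```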
UPDATE: here is an example:
My HTML is this page, http://www.imdb.com/title/tt0499549/?ref_=nv_sr_1, and here is a snippet of it:
<div class="txt-block">
<h4 class="inline">Budget:</h4>
$237,000,000
<span class="attribute">(estimated)</span>
</div>
This is my XPath (in this case div[7] selects the .txt-block div):
//*[@id='titleDetails']/div[7]/text()
This returns "Budget: $237,000,000 (estimated)", but I only want "$237,000,000", not the content of the h4 or the span.
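If the XPath route keeps returning the full text, one possible workaround is to select the div itself and then collect only its direct text children in Java. The sketch below does this with the JDK's org.w3c.dom API on the snippet above, assuming it is treated as well-formed XML; ownText and budget are hypothetical helper names. With HtmlCleaner, the analogous step would presumably be iterating the TagNode's children (for example via getAllChildren(), if your version has it) and keeping only ContentNode instances, but I haven't verified that against a particular HtmlCleaner release.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class OwnTextOnly {
    // Concatenates only the element's direct text children and trims the result,
    // skipping any text that lives inside child elements such as <h4> and <span>.
    static String ownText(Element el) {
        StringBuilder sb = new StringBuilder();
        NodeList children = el.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    // Hypothetical helper: parses the snippet and returns the div's own text.
    static String budget(String html) {
        try {
            Element div = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(html)))
                    .getDocumentElement();
            return ownText(div);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<div class=\"txt-block\">\n"
                + "<h4 class=\"inline\">Budget:</h4>\n"
                + "$237,000,000\n"
                + "<span class=\"attribute\">(estimated)</span>\n"
                + "</div>";
        System.out.println(budget(html)); // prints "$237,000,000"
    }
}
```

The trim matters because the div has several direct text nodes (the line breaks around the h4 and span as well as the amount itself), so even a standards-compliant text() returns more than one node here.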