XPath nodes text joined by br


How can I join the text nodes that sit between br tags, using br as the separator again?

Here is the XML:

<div>
    text1.
    <br>
    text2.
    <br>
    text3.
    <div>ad sense code</div>
    <br>
    text4.
    <div>ad sense code</div>
    <br>
    textxx.
    <br>
</div>

I need to get all the text nodes from text2 to textxx, joined by a br tag or \n\n.

I can get all the text using //div/text()[position()>1], but it comes back joined without any separator, like this:

text1.text2.text3.text4.textxx.

while I want it like this:

text1.<br>text2.<br>text3.<br>text4.<br>textxx.<br>

Put simply, I need to keep the br tags. I am using the Perl HTML::TreeBuilder::LibXML module.


There are 2 answers

Michael Kay

XPath can be used (a) to select nodes from the input document, or (b) to compute atomic values such as strings, booleans, or numbers from the nodes in the input document. It can never (apart from some very rare edge cases) return nodes that weren't present in the input.

It's not entirely clear what you mean by your desired output of

text1.<br>text2.<br>text3.<br>text4.<br>textxx.<br> 

Are you looking for this as a string? Or a sequence of text nodes and element nodes, interspersed?

Returning it as a string is possible in XPath 3.1 using the serialize() function, but in Perl you only have access to the venerable and limited XPath 1.0.

Returning it as a set of nodes isn't possible because the nodes aren't there in the source: the source contains text nodes that have values such as "__text1__" where underscores represent whitespace, and your desired output drops the whitespace.

You appear to be doing a transformation rather than merely a selection, so you are out of XPath territory and into XSLT.
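
To make the selection/computation split concrete from the Perl side, here is a minimal sketch. It assumes XML::LibXML is used directly (the question uses the HTML::TreeBuilder::LibXML wrapper), works on a shortened, hypothetical copy of the markup, and relies on libxml2's HTML parser wrapping the fragment in html and body elements. XPath 1.0 hands back either nodes that already exist or an atomic value; it has no way to build the joined result itself.

```perl
use strict;
use warnings;
use XML::LibXML;

# Shortened, hypothetical copy of the question's markup.
my $html = <<'HTML';
<div>
    text1.
    <br>
    text2.
    <br>
</div>
HTML

my $doc = XML::LibXML->load_html(string => $html);

# (a) Selection: text nodes that already exist in the document.
my @nodes = $doc->findnodes('/html/body/div/text()');

# (b) Computation: atomic values derived from the document.
my $br_count = $doc->findvalue('count(/html/body/div/br)');
my $first    = $doc->findvalue('normalize-space(/html/body/div/text()[1])');

# A selected node still carries the whitespace around "text1.".
my $raw = $nodes[0]->nodeValue;

print "br elements: $br_count\n";
print "first text, normalized: $first\n";
```

normalize-space() can clean up one string at a time, but XPath 1.0 has no string-join, so combining several nodes with a separator is left to the host language.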

daliaessam

The solution that let me do what I wanted in Perl looks like this:

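use HTML::TreeBuilder::LibXML;

# $content holds the HTML source.
my $text = "";
my $tree = HTML::TreeBuilder::LibXML->new_from_content($content);
# Append each selected text node's value, followed by <br>.
foreach my $node ($tree->findnodes("./div/text()[position()>1]")) {
    $text .= $node->findvalue('string(.)') . "<br>";
}
# Remove the trailing <br>.
$text =~ s/<br>$//;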
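
One thing to watch with a loop like this: each text node carries its surrounding whitespace, and the parser may also report whitespace-only text nodes (for example between a closing </div> and the next <br>), each of which would contribute a <br> of its own. The following is a hedged variant, not the poster's exact code. It assumes XML::LibXML is called directly so that the sketch is self-contained, and it uses the absolute path /html/body/div on the assumption that libxml2's HTML parser wraps the fragment in html and body elements. It trims each node and skips the empty ones before joining.

```perl
use strict;
use warnings;
use XML::LibXML;

# The sample markup from the question.
my $content = <<'HTML';
<div>
    text1.
    <br>
    text2.
    <br>
    text3.
    <div>ad sense code</div>
    <br>
    text4.
    <div>ad sense code</div>
    <br>
    textxx.
    <br>
</div>
HTML

my $doc = XML::LibXML->load_html(string => $content);

my @parts;
foreach my $node ($doc->findnodes('/html/body/div/text()[position()>1]')) {
    my $value = $node->nodeValue;
    $value =~ s/^\s+|\s+$//g;              # trim leading and trailing whitespace
    push @parts, $value if length $value;  # skip whitespace-only nodes
}

# join() puts the separator only between items, so there is no trailing <br> to strip.
my $text = join '<br>', @parts;
print "$text\n";
```

Swapping '<br>' for "\n\n" in the join gives the plain-text form mentioned in the question. Dropping the [position()>1] predicate would include text1. as well.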