With XML::Twig, using the set_text method, there is a warning:
set_text ($string) Set the text for the element: if the element is a PCDATA, just set its text, otherwise cut all the children of the element and create a single PCDATA child for it, which holds the text.
So if I want to do something simple, like - say - changing the case of all the text in my XML document:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new(
    'pretty_print'  => 'indented_a',
    'twig_handlers' => {
        # Called for every element, including those that have child elements
        '_all_' => sub {
            my $newtext = $_->text_only;
            $newtext =~ tr/a-z/A-Z/;
            # set_text replaces all children with a single PCDATA child
            $_->set_text($newtext);
        }
    }
);
$twig->parse( \*DATA );
$twig->print;

__DATA__
<root>
    <some_content>fish
        <a_subnode>morefish</a_subnode>
    </some_content>
    <some_more_content>cabbage</some_more_content>
</root>
This - because of set_text replacing children - gets clobbered into:
<root></root>
But if I focus on just one (bottom level) node (e.g. a_subnode) then it works fine.
Is there an elegant way to replace/transform text within an element without clobbering the data structure below it? I mean, I can test for the presence of children or something similar, but ... it seems like there should be a better way of doing this. (A different library maybe?)
(And for the sake of clarity - this is my example of transliterating all the text in a document. My actual use case is rather more convoluted, but is still 'about' in-place text transformation.)
I'm considering a node cut-and-paste approach (cut all children, replace text, paste all children), but that seems inefficient.
Instead of having the handler on _all_, try having it only on text elements: #TEXT, and change text_only to text. It should work.

Update: Or use the char_handler option when you create the twig: char_handler => sub { uc shift }, instead of the handler.
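
For reference, here is a minimal sketch of the char_handler variant applied to the sample document from the question. It assumes char_handler is passed each run of character data as a string and that its return value replaces that text, so no elements are cut or re-created. The heredoc and the $xml and $result variables are only there to keep the example self-contained; they are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# char_handler works on character data only, so child elements are left alone.
# Assumption: the sub gets the text as its first argument and returns the new text.
my $twig = XML::Twig->new(
    pretty_print => 'indented_a',
    char_handler => sub { uc shift },
);

# Same sample document as in the question, inlined for illustration.
my $xml = <<'XML';
<root>
  <some_content>fish
    <a_subnode>morefish</a_subnode>
  </some_content>
  <some_more_content>cabbage</some_more_content>
</root>
XML

$twig->parse($xml);

# sprint returns the serialized document instead of printing it directly.
my $result = $twig->sprint;
print $result;
```

Because the transformation is applied to the text as it is parsed, there is no need to test for children or to cut and paste them. For a more convoluted transformation, the anonymous sub can be replaced by any function that takes a string and returns a string.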