XML::Twig Match handler tag using regex (perl v5.30.3 XML Twig v3.52)

Is there a way to use a regular expression when trying to match tags (nodes) using XML::Twig handlers?

I have the following code, which works when using a regex on a tag's attribute but not on the tag name itself.

Here's the code I have for the regex on the tag's attribute:

use strict;
use warnings;
use XML::Twig;
use Data::Dumper;
use Archive::Zip qw(:ERROR_CODES :CONSTANTS);

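my $zipName='C:\Temp\file1.xlsx';
# Create the archive object before reading; read() returns AZ_OK on success
my $zip=Archive::Zip->new();
$zip->read($zipName) == AZ_OK or die "Cannot read $zipName";
my $wb_rels=$zip->contents('xl/_rels/workbook.xml.rels');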

my @Array1;
my $twig=XML::Twig->new(
twig_handlers =>{q{Relationship[@Type=~/sharedStrings$/]} => sub{Twig_handler_sub(@_,\@Array1);}})->parse($wb_rels);
print Dumper \@Array1;


sub Twig_handler_sub{
    my( $t, $elt, $Array1)= @_;
    push @$Array1,$_->att('Target');
}

However, I cannot find the syntax for doing the same thing with tag names.

I have tried the following:

use strict;
use warnings;
use XML::Twig;

my $data='
<result>
    <target type="aim">
        <tag1>123</tag1>
        <tag2>456</tag2>
        <nottag>789</nottag>
    </target>
</result>';

XML::Twig->new( twig_handlers => { qr/^tag[1-3]/ => sub { print $_->tag, ": ", $_->text, "\n"; } })->parse($data);

However, this gives the error:

unrecognized expression in handler: '(?^u:^tag[1-3])'

Is there a way to specify a regex for the tag name?

There is 1 answer

Answered by Shawn

If you install XML::XPath, you can use XML::Twig::XPath instead to get a more complete implementation of XPath (but only in the findnodes() method, not in handlers). Unfortunately, XML::XPath doesn't seem to support matches() for regular expression testing, but you can use other string functions such as starts-with():

#!/usr/bin/env perl
use strict;
use warnings;
use feature qw/say/;
use XML::Twig::XPath;

my $data='
<result>
    <target type="aim">
        <tag1>123</tag1>
        <tag2>456</tag2>
        <nottag>789</nottag>
    </target>
</result>';

my $xml = XML::Twig::XPath->new();
$xml->parse($data);
for my $tag ($xml->findnodes('//*[starts-with(name(), "tag")]')) {
  say $tag->tag, ": ", $tag->text;
}
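
Note that starts-with() is only a prefix test, so it is looser than /^tag[1-3]/ and would also match names such as tag9 or tagline. If the exact regex matters, one option is to let XPath do the coarse match and then filter the result with grep. This is a minimal sketch under that assumption; the extra <tag9> element is added to the sample data purely for illustration, and @exact is a hypothetical variable name.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig::XPath;    # requires XML::XPath to be installed

# Sample data from the question, plus a <tag9> element (added for illustration)
my $data = '
<result>
    <target type="aim">
        <tag1>123</tag1>
        <tag2>456</tag2>
        <tag9>000</tag9>
        <nottag>789</nottag>
    </target>
</result>';

my $xml = XML::Twig::XPath->new();
$xml->parse($data);

# XPath does the coarse prefix match; grep then applies the exact Perl regex
my @exact = grep { $_->tag =~ /^tag[1-3]$/ }
            $xml->findnodes('//*[starts-with(name(), "tag")]');

print $_->tag, ": ", $_->text, "\n" for @exact;
```

Here <tag9> passes the starts-with() test but is dropped by the grep, so only tag1 and tag2 are printed.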

Alternatively, you can use the _all_ twig handler to match all tags, and do the test in that callback:

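# Continues the script above (needs 'use feature qw/say/' and $data)
XML::Twig->new(
    twig_handlers => {
        # _all_ fires for every element; the tag name is tested in the callback
        _all_ => sub {
            if ($_->tag =~ /^tag[1-3]/) {
                say $_->tag, ": ", $_->text;
            }
        },
    },
)->parse($data);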

Kind of ugly and inelegant, but I couldn't figure out anything better.
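
As for the error in the question: the handler conditions are hash keys, and Perl hash keys are always strings, so a qr// object used as a key is stringified before XML::Twig ever sees it. That is why the message quotes '(?^u:^tag[1-3])' rather than a regex. If the _all_ approach is used in several places, the tag test can be pulled into a small wrapper. The following is only a sketch: tag_regex_handler is a hypothetical helper name, not part of XML::Twig, and it relies only on the _all_ handler and the tag/text methods shown above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

my $data = '
<result>
    <target type="aim">
        <tag1>123</tag1>
        <tag2>456</tag2>
        <nottag>789</nottag>
    </target>
</result>';

# Hypothetical helper (not an XML::Twig API): returns a handler that
# calls $callback only for elements whose tag name matches $re.
sub tag_regex_handler {
    my ($re, $callback) = @_;
    return sub {
        my ($twig, $elt) = @_;
        return 1 unless $elt->tag =~ $re;    # skip non-matching elements
        return $callback->($twig, $elt);
    };
}

my @matches;
XML::Twig->new(
    twig_handlers => {
        # The regex stays a real qr// object because it is passed as an
        # argument to the helper instead of being used as a hash key.
        _all_ => tag_regex_handler(qr/^tag[1-3]$/, sub {
            my ($twig, $elt) = @_;
            push @matches, [ $elt->tag, $elt->text ];
            return 1;
        }),
    },
)->parse($data);

print "$_->[0]: $_->[1]\n" for @matches;
```

With the sample data this collects tag1 and tag2 and skips nottag. The handler still runs for every element, so it is the same technique as above with the filtering moved out of the callback body.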