Parsing XML with the same tag name on different levels


I am trying to parse an XML document using the Perl XML::LibXML::Reader module. The module works well and I was able to parse most of the document, but some sections of the XML can contain multiple elements with the same name on different levels, and I don't know how to approach such elements.

What I am trying to do is convert the structure below into a convenient Perl data structure. In addition to XML::LibXML, I tried XML::Simple and the XML::Twig simplify method (see below), but parsing this section with them is very slow (about 20x slower than parsing the document without them):

 my @conf = eval { $copy->findnodes('criteria') };
 my $t    = XML::Twig->new();
 my $hash = $t->parse( $_->toString )->simplify( forcearray => 1 );
 $t->purge();

Could someone suggest how I can parse the section below into a convenient Perl data structure with XML::LibXML::Reader in a faster manner? Any help will be appreciated.

Example of such a file:

 <criteria operator="OR">
     <criteria operator="AND">  -> nested element
         <criterion test_ref="oval:org.mitre.oval:tst:123" comment="Windows XP is installed"/>
         <criterion test_ref="oval:org.mitre.oval:tst:234" comment="file foo.txt exists"/>
         <criteria operator="OR"> -> nested element
             <criterion test_ref="oval:org.mitre.oval:tst:127" comment="file x.txt exists"/>
             <criterion test_ref="oval:org.mitre.oval:tst:127" comment="file y.txt exists"/>
         </criteria>
     </criteria>
     <criteria operator="AND" negate="true"> -> nested element
         <criterion test_ref="oval:org.mitre.oval:tst:345" comment="Windows 2003 is installed"/>
         <criterion test_ref="oval:org.mitre.oval:tst:456" comment="file fred.txt has a version less than 2"/>
         <criterion test_ref="oval:org.mitre.oval:tst:567" negate="true" comment="patch is installed"/>
     </criteria>
     <criterion test_ref="oval:org.mitre.oval:tst:345" comment="Windows 2003 is installed"/>
 </criteria>

Answer by Sobrique:

I'm not sure simplifying your XML is what you want. Looking at it, the tool you're looking for is a twig handler in XML::Twig, which is called for each matching element once that element has been fully parsed.

E.g.:

#!/usr/bin/perl

use strict;
use warnings;
use XML::Twig;
use Data::Dumper;

my $xml = q{ <criteria operator="OR">

    <criteria operator="AND">  -> nested element
    <criterion test_ref="oval:org.mitre.oval:tst:123" comment="Windows XP is installed"/>
    <criterion test_ref="oval:org.mitre.oval:tst:234" comment="file foo.txt exists"/>
    <criteria operator="OR"> -> nested element
    <criterion test_ref="oval:org.mitre.oval:tst:127" comment="file x.txt exists"/>
    <criterion test_ref="oval:org.mitre.oval:tst:127" comment="file y.txt exists"/>
    </criteria> 
     </criteria> 
    <criteria operator="AND" negate="true"> ->nested element
    <criterion test_ref="oval:org.mitre.oval:tst:345" comment="Windows 2003 is installed"/>
    <criterion test_ref="oval:org.mitre.oval:tst:456" comment="file fred.txt has a version less than 2"/>
    <criterion test_ref="oval:org.mitre.oval:tst:567" negate="true" comment="patch is installed"/>
    </criteria>
    <criterion test_ref="oval:org.mitre.oval:tst:345" comment="Windows 2003 is installed"/>
 </criteria> };

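# Flat lookup table: test_ref => comment.
my %test_hash;

# Called once for every <criteria> element, nested or not,
# as soon as its closing tag has been parsed.
sub process_criteria {
    my ( $twig, $criteria ) = @_;

    # children() returns direct children only, so each <criterion>
    # is handled by its immediate parent <criteria>.
    foreach my $criterion ( $criteria->children('criterion') ) {
        my $ref     = $criterion->att('test_ref');
        my $comment = $criterion->att('comment');
        $test_hash{$ref} = $comment;
    }
}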

my $twig =
    XML::Twig->new( twig_handlers => { criteria => \&process_criteria } )
    ->parse($xml);

print Dumper \%test_hash;

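I'm not sure this does exactly what you want; it is meant more as an illustration of how to approach the problem. Note that it flattens everything into a single hash keyed on test_ref, so the nesting is not preserved and a repeated test_ref (such as tst:127 above) keeps only the last comment seen.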
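If the nesting itself matters, one possible approach is a small recursive function that mirrors the criteria/criterion hierarchy in a hash of arrays. The following is only a minimal sketch using the XML::LibXML DOM API. The helper name criteria_to_hash, the key names in the resulting hash, and the choice to default a missing negate attribute to 'false' are all assumptions made for illustration, and the sample is a trimmed copy of the XML from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed copy of the sample from the question.
my $xml = q{<criteria operator="OR">
  <criteria operator="AND">
    <criterion test_ref="oval:org.mitre.oval:tst:123" comment="Windows XP is installed"/>
    <criteria operator="OR">
      <criterion test_ref="oval:org.mitre.oval:tst:127" comment="file x.txt exists"/>
      <criterion test_ref="oval:org.mitre.oval:tst:127" comment="file y.txt exists"/>
    </criteria>
  </criteria>
  <criterion test_ref="oval:org.mitre.oval:tst:345" comment="Windows 2003 is installed"/>
</criteria>};

# Hypothetical helper: convert one <criteria> element into a nested hash.
# Key names and the 'false' default for negate are illustrative assumptions.
sub criteria_to_hash {
    my ($node) = @_;
    my %result = (
        operator  => $node->getAttribute('operator'),
        negate    => $node->getAttribute('negate') // 'false',
        criterion => [],    # direct <criterion> children
        criteria  => [],    # direct nested <criteria> children
    );

    for my $child ( $node->childNodes ) {

        # Skip text, whitespace and comment nodes.
        next unless $child->isa('XML::LibXML::Element');

        if ( $child->localname eq 'criterion' ) {
            push @{ $result{criterion} }, {
                test_ref => $child->getAttribute('test_ref'),
                comment  => $child->getAttribute('comment'),
                negate   => $child->getAttribute('negate') // 'false',
            };
        }
        elsif ( $child->localname eq 'criteria' ) {

            # Same tag name one level down: recurse.
            push @{ $result{criteria} }, criteria_to_hash($child);
        }
    }
    return \%result;
}

my $doc  = XML::LibXML->load_xml( string => $xml );
my $tree = criteria_to_hash( $doc->documentElement );

print "top-level operator: $tree->{operator}\n";
print "nested groups at top level: ", scalar @{ $tree->{criteria} }, "\n";
```

Because each level only looks at its own direct children, the same tag name at different depths is not a problem. When streaming with XML::LibXML::Reader, the node returned by copyCurrentNode(1) on an outer criteria element could probably be passed to the same helper, though whether that is faster than the simplify approach would need to be measured on the real document.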