Clear content in xml brackets in all files in directory tree on Windows using Strawberry Perl and twig


I want to remove everything inside the <loot> </loot> elements of all XML files in a directory tree. I am using 64-bit Strawberry Perl on Windows.

For example this XML file:

<?xml version="1.0" encoding="UTF-8"?>
<monster name="Dragon"/>
<health="10000"/>
<immunities>
   <immunity fire="1"/>
</immunities>
<loot>
<item id="1"/>
  <item id="3"/>
      <inside>
        <item id="6"/>
      </inside>
  </item>
</loot>

The changed file should look like this:

<?xml version="1.0" encoding="UTF-8"?>
<monster name="Dragon"/>
<health="10000"/>
<immunities>
   <immunity fire="1"/>
</immunities>
<loot>
</loot>

I have this code:

#!/usr/bin/perl
use warnings;
use strict;

use File::Find::Rule;
use XML::Twig;

sub delete_loot {
   my ( $twig, $loot ) = @_;
   foreach my $loot_entry ( $loot -> children ) {
      $loot_entry -> delete;
   }
   $twig -> flush;
}

my $twig = XML::Twig -> new ( pretty_print => 'indented', 
                              twig_handlers => { 'loot' => \&delete_loot } ); 

foreach my $file ( File::Find::Rule  -> file()
                                     -> name ( '*.xml' )
                                     -> in ( 'C:\Users\PIO\Documents\serv\monsters' ) ) {

    print "Processing $file\n";
    $twig -> parsefile_inplace($file); 
}

But it correctly edits only the first file it finds; all the remaining files end up empty (0 KB).


There are 2 answers

David Verdin (best answer)

The XML::Twig doc says that "Multiple twigs are not well supported".

If you look at the state of the twig object (using Data::Dumper, for example), you see a strong difference between the first and subsequent runs. It looks like it considers that it has already been totally flushed (which is true, as there was a complete flush during the first run). It probably has nothing more to print for the subsequent files, so those files end up empty.

Recreating the twig object on each iteration of the loop worked for me:

#!/usr/bin/perl
use warnings;
use strict;

use File::Find::Rule;
use XML::Twig;

sub delete_loot {
   my ( $twig, $loot ) = @_;
   foreach my $loot_entry ( $loot -> children ) {
      $loot_entry -> delete;
   }
}

foreach my $file ( File::Find::Rule  -> file()
                                     -> name ( '*.xml' )
                                     -> in ( '/home/dabi/tmp' ) ) {

    print "Processing $file\n";
    my $twig = XML::Twig -> new ( pretty_print => 'indented', 
                                  twig_handlers => { loot => \&delete_loot, } ); 
    $twig -> parsefile($file); 
    $twig -> print_to_file($file);
}

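Also, I had to change the XML file so that it is well-formed before it could be processed (a single root element, a named attribute on health, and an item that is not self-closed when it has content):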

<?xml version="1.0" encoding="UTF-8"?>
<monster name="Dragon">
<health value="10000"/>
<immunities>
   <immunity fire="1"/>
</immunities>
<loot>
<item id="1"/>
  <item id="3">
      <inside>
        <item id="6"/>
      </inside>
  </item>
</loot>
</monster>
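
To see up front which files would be rejected, a quick well-formedness check can be run before editing anything. The sketch below relies on XML::Twig's safe_parse, which wraps the parse in an eval and returns a false value on failure; the helper name is_well_formed and the two inline samples are made up for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: true if $xml parses as well-formed XML.
sub is_well_formed {
    my ($xml) = @_;
    # A fresh twig per document; safe_parse is false on a parse error
    my $twig = XML::Twig->new;
    return $twig->safe_parse($xml) ? 1 : 0;
}

# Two top-level elements and an attribute without a name, as in the question
my $broken = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<monster name="Dragon"/>
<health="10000"/>
XML

# One root element wrapping everything, as in the corrected file
my $fixed = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<monster name="Dragon">
<health value="10000"/>
</monster>
XML

for my $sample ( [ broken => $broken ], [ fixed => $fixed ] ) {
    my ( $label, $xml ) = @$sample;
    printf "%s: %s\n", $label,
        is_well_formed($xml) ? 'well-formed' : 'not well-formed';
}
```

For files on disk, safe_parsefile should work the same way, and $@ holds the parser's error message when the check fails.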

zdim

Note: With flush changed to print, the code in the question works for me (with valid XML).

However, I still recommend either of the versions below. Tested with two groups of valid XML files.


When XML::Twig->new(...) is called once and the files are then looped over and processed, I get the same behavior: the first file is processed correctly and the others are completely blanked. XML::Twig just does not support processing multiple files with one object well, which is why the versions below do everything inside the loop.

The reason may have something to do with new being a class method, though I don't see why that should affect the handling of multiple files. The callback is installed outside of the loop, but I've tested re-installing it for each file and that doesn't help.

Finally, flush-ing isn't needed, and it clearly hurts here by clearing the state of the object (which was created by the class method new). This doesn't affect the code below, but flush is still replaced by print there.

So just do everything in the loop. A simple version:

use strict;
use warnings;
use File::Find::Rule;
use XML::Twig;

my @files = File::Find::Rule->file->name('*.xml')->in('...');

foreach my $file (@files)
{
    print "Processing $file\n";
    my $t = XML::Twig->new( 
        pretty_print => 'indented', 
        twig_handlers => { loot => \&clear_elt },
    );
    $t->parsefile_inplace($file)->print;
}

sub clear_elt {
    my ($t, $elt) = @_; 
    my $elt_name = $elt->name;                # get the name
    my $parent = $elt->parent;                # fetch the parent
    $elt->delete;                             # remove altogether
    $parent->insert_new_elt($elt_name, '');   # add it back empty
}

The callback is simplified: it removes the element altogether and then adds it back, empty. Note that the sub does not need the element name hardcoded, so it can be used as it stands to empty any element.
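
One thing to watch with delete-and-reinsert: as far as I know, insert_new_elt pastes the new element as the first child of the parent by default, and the replacement carries none of the original's attributes. If the position or attributes matter, emptying the element in place may be preferable. The following is a minimal sketch using cut_children; the names empty_elt and clear_loot and the inline sample are hypothetical, and it works on a string so it can be tried without touching any files.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical handler: drop all children (text included), keep the element itself
sub empty_elt {
    my ( $t, $elt ) = @_;
    $elt->cut_children;
}

# Hypothetical helper: return $xml with every <loot> element emptied
sub clear_loot {
    my ($xml) = @_;
    # A fresh twig for each document, as recommended above
    my $t = XML::Twig->new(
        pretty_print  => 'indented',
        twig_handlers => { loot => \&empty_elt },
    );
    $t->parse($xml);
    return $t->sprint;
}

my $sample = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<monster name="Dragon">
<health value="10000"/>
<immunities>
   <immunity fire="1"/>
</immunities>
<loot>
<item id="1"/>
  <item id="3">
      <inside>
        <item id="6"/>
      </inside>
  </item>
</loot>
</monster>
XML

print clear_loot($sample);
```

For real files, the parse and sprint calls can be swapped for parsefile and print_to_file, as in the other answer.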

We can avoid calling new in the loop by using another class method, nparse:

my $t = XML::Twig->new( pretty_print => 'indented' );

foreach my $file (@files) 
{
    print "Processing $file\n";
    my $tobj = XML::Twig->nparse(
        twig_handlers => { loot => \&clear_elt },
        $file
    );
    $tobj->parsefile_inplace($file)->print;
}

# the sub clear_elt() same as above

We do have to call the new constructor first, even though it isn't used directly in the loop.


Note that calling new before the loop without twig_handlers and then setting handlers inside

$t->setTwigHandlers(loot => sub { ... });

does not help. We still only get the first file processed correctly.