Perl uninitialized value when using alternation in regex

145 views Asked by At

I have a for loop with an if statement that looks like this:

   for (my $i=0; $i < $size; $i++) {
       if ($array[$i] =~ m/_(B|P|BC|PM)/) {
           #Remove from @array
           splice(@array, $i, 1);
           next;
       }
       #Get rid of numbers at the end
       $array[$i] =~ s/_[0-9]+//;
   }

I am getting an error saying that says "Use of uninitialized value within @array in pattern match...." on the line with the if statement.

When I remove the alternation from the regex on that line, the error goes away. If I comment out the whole if statement, the regex under the comment "#Get rid of numbers at the end" does not produce any errors.

I've printed out all values of @array and everything looks fine. I've tried no parentheses and brackets instead of the parentheses in the expression with no change. Any ideas what could be causing this?

2

There are 2 answers

0
mob On BEST ANSWER

Here is a simple demonstration of the same problem.

1: @array = (1,2);
2: $size = 2;
3: for ($i=0; $i<$size; $i++) {
4:    if ($array[$i] == 1) {
5:        splice @array, $i, 1;
6:    }
7: }

So what happens when you execute this code? At line 5, you remove the first element of the array, so the array becomes (2). At the end of the first for-loop iteration, you increment $i (from 0 to 1), compare it to $size (which is still 2), and decide to continue the loop.

Then you are at line 4 again. You are performing an operation on $array[1]. But @array only has one element, $array[1] is not defined, and Perl gives you a warning.

It is important to be careful if you are a modifying a data structure at the same time that you are iterating through it.

--

Consider this alternate, Perlish approach to the first part of your problem:

@array = grep { !m/_(B|P|BC|PM)/ } @array

That is, identify all the elements of @array that satisfy some condition (here, the condition is not matching the pattern), and then update @array so that it only holds those good elements. zdim had another good approach.

9
zdim On

Removing elements from an array is in principle expensive, even though splice optimizations help. Thanks to ysth for comments. More to the point, working correctly through those indices requires a lot of care, as exposed and dissected in mob's answer. Here is another way

my @new_array = 
    map { 
        s/_[0-9]+//;        #/ cleanup from the last statement in loop
        $_                  # return this element, not return of s/../../
    }
    grep { defined && !/_(B|P|BC|PM)/ }  # remove elements
    @array;

First grep makes sure to skip undef elements, then filters what you need. Its output list is passed as input to map, which makes the change from the last line of your loop to each element.

If you don't care about the old array just assign to @array instead of making a @new_array.

Starting from 5.14.0 we can use the non-destructive /r modifier in the substituion, which returns the changed string and leaves the original unchanged. This is a perfect use case for it

@array = map { s/_[0-9]+//r } grep { defined && !/_(B|P|BC|PM)/ } @array;

where the original array is overwritten.


This processes the data twice. A more efficient version is to loop over the array and push (copy) the elements that are to be kept, suitably changed, into the new array.