When do perl subs pass arguments as `$_` and when is it `$_[0]`?


I noticed that the Parallel::Loops module uses $_ as the argument to its subref. This question isn't about Parallel::Loops per se; it is about the coderef calling convention.

This is their example; note that $_ is passed to the sub:

$pl->foreach( \@parameters, sub {
    # This sub "magically" executed in parallel forked child
    # processes
 
    # Lets just create a simple example, but this could be a
    # massive calculation that will be parallelized, so that
    # $maxProcs different processes are calculating sqrt
    # simultaneously for different values of $_ on different CPUs
    # (Do see 'Performance' / 'Properties of the loop body' below)
 
    $returnValues{$_} = sqrt($_);
});

...but I've always used $_[0] or @_ for values passed to a sub:

$a = sub { print "\$_=$_\n" ; print "\$_[0]=$_[0]\n" };
$a->(1);

# prints out
# $_=
# $_[0]=1

Notice in my example $_ didn't work, but $_[0] (i.e., @_) did. What is the difference between the coderef in the Parallel::Loops example and the coderef in my example?

Note that their example isn't a typo: it only works if you use $_, which is why I'm posing this question.


There are 3 answers

brian d foy (best answer)

Look in the module source when you want to know how it's doing its thing :)

In Parallel::Loops::foreach, the module sets $_ before it calls the subroutine you provided. Since this is in a forked process, it doesn't do anything to protect the value of $_. Notice the various layers of code refs in that module.

# foreach is implemented via while above
sub foreach {
    my ($self, $varRef, $arrayRef, $sub);
    if (ref $_[1] eq 'ARRAY') {
        ($self, $arrayRef, $sub) = @_;
    } else {
        # Note that this second usage is not documented (and hence not
        # supported). It isn't really useful, but this is how to use it just in
        # case:
        #
        # my $foo;
        # my %returnValues = $pl->foreach( \$foo, [ 0..9 ], sub {
        #     $foo => sqrt($foo);
        # });
        ($self, $varRef, $arrayRef, $sub) = @_;
    }
    my $i = -1;
    $self->while( sub { ++$i <= $#{$arrayRef} }, sub {
        # Setup either $varRef or $_, if no such given before calling $sub->()
        if ($varRef) {
            $$varRef = $arrayRef->[$i];
        } else {
            $_ = $arrayRef->[$i];
        }
        $sub->();
    });
}

You could do the same thing. Every time you want to run your code ref, set the value of $_ first:

for_all( [ qw(1 3 7) ], sub { print "$_\n" } );

sub for_all {
    my( $array, $sub ) = @_;

    foreach my $value ( @$array ) {
        local $_ = $value;
        $sub->();
        }
    }

Take a look at Mojo::Collection, for example, which does this with its code refs. List::Util has several utilities that do this too.
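
As a small illustration of the List::Util point, here is a sketch using first and any, both of which run their block with $_ set to each element in turn. The data and variable names are made up for the example.

```perl
use strict;
use warnings;
use List::Util qw(first any);

# Illustrative data, not from the question
my @numbers = (1, 3, 7, 10);

# The block reads the current element from $_, not from @_
my $first_big = first { $_ > 5 } @numbers;
my $has_even  = any { $_ % 2 == 0 } @numbers;

print "first value over 5: $first_big\n";
print "has an even value: ", ($has_even ? "yes" : "no"), "\n";
```

This is the same convention Parallel::Loops follows: the block never receives the element through @_.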

ikegami

Notice in my example $_ didn't work

Because you never set $_, unlike the following snippets:

local $_ = 1;
$a->();
$a->() for 1;

Note that $_ is a "super global", meaning $_ refers to $::_ (aka $main::_) no matter what the current namespace (as set by package) is. So this works even if the sub and the sub's caller are associated with different packages.
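
A minimal sketch of that cross-package behaviour follows. The package and sub names (Caller::Side, Callee::Side, run_with, describe) are hypothetical, chosen only for this example.

```perl
use strict;
use warnings;

package Caller::Side;    # hypothetical package name

sub run_with {
    my ($value, $code) = @_;
    local $_ = $value;    # this is $main::_, even though we are in Caller::Side
    return $code->();
}

package Callee::Side;    # hypothetical package name

# Reads the same $main::_ from a different package
sub describe { return "saw $_" }

package main;

my $result = Caller::Side::run_with(42, \&Callee::Side::describe);
print "$result\n";
```

Neither package declares or imports anything for $_; both simply see the one global variable.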

Bork

$_ and $_[0] are two different variables.

$_ is called "The default input and pattern-searching space" in perldoc perlvar, whereas $_[0] refers to the first element of the @_ array. In Perl, scalars and arrays live in separate namespaces, i.e. $var has nothing to do with @var or its element $var[0].
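
To make the separation concrete, here is a small sketch; the variable and sub names are invented for the example.

```perl
use strict;
use warnings;

my $var = 'scalar';               # the scalar $var
my @var = ('first', 'second');    # the array @var; its elements are $var[0], $var[1]

# $var and $var[0] are different variables
my $pair = "$var / $var[0]";

sub show {
    local $_ = 'topic';           # the scalar $_
    return "$_ / $_[0]";          # $_[0] is an element of @_, unrelated to $_
}

my $out = show('argument');

print "$pair\n";
print "$out\n";
```

Setting $_ inside show has no effect on @_, and the argument passed in never touches $_.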

@_ holds the arguments passed to a subroutine. $_ is not set for a subroutine (unless you specifically set it yourself). However, a lot of built-in functions and constructs use $_ as their default variable, and that is what this module is trying to emulate. For example:

for (...) {      # aliases loop elements to $_
    print;       # prints $_
}

Now, I'm not 100% sure of all the implications of your example module code, but it seems to be assigning to $_. It could have used any variable, such as $foo, but they chose $_.

I think one should avoid using $_ in package subs, since it may contaminate $_ as a global variable, leading to errors that are hard to diagnose.
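
A sketch of the kind of contamination meant here, and of how local avoids it. Both helper subs (careless and careful) are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper that assigns to $_ without localizing it
sub careless { $_ = 'clobbered'; return }

# Same idea, but the caller's $_ is restored when the sub returns
sub careful { local $_ = 'temporary'; return }

$_ = 'original';

careful();
my $after_careful = $_;     # the caller's value survives

careless();
my $after_careless = $_;    # the caller's value has been overwritten

print "after careful:  $after_careful\n";
print "after careless: $after_careless\n";
```

If a sub does need to set $_, localizing it first keeps the change from leaking back into the caller.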