PowerShell Make ForEach Loop Parallel


This is working code:

$ids = 1..9 
$status  = [PSCustomObject[]]::new(10)
foreach ($id in $ids)
{ 
   $uriStr      = "http://192.168." + [String]$id + ".51/status"
   $uri         = [System.Uri] $uriStr
   $status[$id] = try {Invoke-RestMethod -Uri $uri -TimeOut 30}catch{}
}
$status

I would like to execute the ForEach loop in parallel to explore performance improvements.

First thing I tried (turned out naive) is to simply introduce the -parallel parameter

$ids = 1..9 
$status  = [PSCustomObject[]]::new(10)
foreach -parallel ($id in $ids)
{ 
   $uriStr      = "http://192.168." + [String]$id + ".51/status"
   $uri         = [System.Uri] $uriStr
   $status[$id] = try {Invoke-RestMethod -Uri $uri -TimeOut 30}catch{}
}
$status

This results in the following error, which suggests that the feature is still reserved for future development as of PowerShell 7.3.9:

ParserError: 
Line |
   3 |  foreach -parallel ($id in $ids)
     |  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | The foreach '-parallel' parameter is reserved for future use.

I say naive because the documentation says the -parallel parameter is only valid in a workflow. However, when I try a workflow, I get an error saying that workflows are no longer supported.

workflow helloworld {Write-Host "Hello World"}
ParserError: 
Line |
   1 |  workflow helloworld {Write-Host "Hello World"}
     |  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | Workflow is not supported in PowerShell 6+.

Then I tried various combinations from various references (Good Example), which explain that ForEach is fundamentally different from ForEach-Object, which does support -Parallel, like so (basically piping the IDs in):

$ids = 1..9 
$status  = [PSCustomObject[]]::new(10)
$ids | ForEach-Object -Parallel 
{ 
   $uriStr      = "http://192.168." + [String]$_ + ".51/status"
   $uri         = [System.Uri] $uriStr
   $status[$_] = try {Invoke-RestMethod -Uri $uri -TimeOut 30}catch{}
}
$status

This generates the following error:

ForEach-Object: 
Line |
   3 |  $ids | foreach-object -parallel
     |                        ~~~~~~~~~
     | Missing an argument for parameter 'Parallel'. Specify a parameter of type
     | 'System.Management.Automation.ScriptBlock' and try again.

   $uriStr      = "http://192.168." + [String]$_ + ".51/status"
   $uri         = [System.Uri] $uriStr
   $status[$i_] = try {Invoke-RestMethod -Uri $uri -TimeOut 30}catch{}

After trying various script-block syntaxes, this is the best I could do (essentially applying the $using: scope modifier to the $status variable, which lives outside the script block):

$ids = 1..9 
$status  = [PSCustomObject[]]::new(10)
$myScriptBlock = 
{ 
   $uriStr      = "http://192.168." + [String]$_ + ".51/status"
   $uri         = [System.Uri] $uriStr
   {$using:status}[$_] = try {Invoke-RestMethod -Uri $uri -TimeOut 30}catch{}
}
$ids | foreach-object -parallel $myScriptBlock
$status

This fails again, this time with an "Unable to index into ScriptBlock" error:

InvalidOperation: 
Line |
   4 |  … ng:status}[$_] = try {Invoke-RestMethod -Uri $uri -TimeOut 30}catch{}
     |                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | Unable to index into an object of type "System.Management.Automation.ScriptBlock".

There are a couple of other errors worth mentioning. Without the $using: qualifier, I get the error

"cannot index into null array"

which basically means the $status variable is not visible inside the foreach or script block.

All other ways of expressing the $using: qualifier are rejected with errors like

"assignment expression invalid" "use {}..."

so they have been omitted for brevity and for better flow in the problem statement. Lastly, here is a reference on script blocks for PowerShell 7.3+, which I have also consulted without much progress.

There are 4 answers

1
mklement0 (BEST ANSWER)

The following should work as intended (see the NOTE source-code comment below):

$ids = 1..9 
$status  = [PSCustomObject[]]::new(10)
$ids | ForEach-Object -Parallel {  # NOTE: Opening { MUST be here.
   $uri = [System.Uri] "http://192.168.$_.51/status"
   # NOTE: (...) is required around the $using: reference.
   ($using:status)[$_] = try { Invoke-RestMethod -Uri $uri -TimeOut 30 } catch {}
}
$status

Note: Since $_ is used as the array index ([$_]), the results for your 9 input IDs are stored in the array elements starting with the second one (whose index is 1), meaning that $status[0] will remain $null. Perhaps you meant to use 0..9.

  • You're using PowerShell (Core) 7+, in which PowerShell workflows aren't supported anymore; therefore, the foreach statement doesn't support -parallel there.

  • However, PowerShell 7+ does support -Parallel as a parameter of the ForEach-Object cmdlet[1] for multi-threaded execution.

    • Just as without -Parallel (i.e. with the (often positionally bound) -Process parameter), the script block ({ ... }) you pass as an argument to the cmdlet does not use a self-chosen iterator variable the way that you do in a foreach statement (foreach ($id in $ids) ...), but rather receives its input from the pipeline and uses the automatic $_ variable to refer to the input object at hand, as shown above.

    • Because the ForEach-Object cmdlet is a type of command - as opposed to a language statement such as foreach (or an expression such as 'foo'.Length) - it is parsed in argument (parsing) mode:

      • A command must be specified on a single line, EXCEPT if:

        • explicitly indicated otherwise with a line continuation (placing a ` (the so-called backtick) at the very end of the line)

        • or the line is unambiguously syntactically incomplete and forces PowerShell to keep parsing for the end of the command on the next line.

      • Language statements (e.g., foreach and if) and expressions (e.g. .NET method calls), which are parsed in expression (parsing) mode, are generally not subject to this constraint.[2]

      • With a script-block argument, you can make a command multiline by using the syntactically-incomplete technique:

        • Placing only its opening { on the first line allows you to place the block's content on subsequent lines, as shown above.
        • Note that the content of a script block is a new parsing context, in which the above rules apply again.
  • A $using: reference (which accesses the value of a variable from the caller's scope) must be enclosed in (...), the grouping operator, whenever you apply one of the following operations to it: setting a property or an element identified by an index ([$_]), getting a property value via an expression or an element via a non-literal index, or calling a method.

    • Arguably, this shouldn't be necessary, but is as of PowerShell 7.3.9 - see GitHub issue #10876 for a discussion.

    • As for your {$using:status}[$_] attempt: the {...} enclosure created a script block, which doesn't make sense here;[3] perhaps you meant to delimit the identifier part of the $using: reference, in which case the {...} enclosure goes after the $: ${using:status}; however, that (a) isn't necessary here, and (b) doesn't help the problem - (...) around the entire reference is needed either way.

  • A note on thread safety:

    • Because you're using an array to store your results, and because arrays are fixed-size data structures and you make each thread (runspace) target a dedicated element of your array, there is no need to manage concurrent access explicitly.

    • More typically, however, with variable-size data structures and/or in cases where multiple threads may access the same element, managing concurrency is necessary.

    • An alternative to filling a data structure provided by the caller is to simply make the script block output results, which the caller can collect; however, unless this output also identifies the corresponding input object, this correspondence is then lost.

    • This answer elaborates on the last two points (thread-safe data structures vs. outputting results).
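
As a hedged illustration of the thread-safety note above, here is a minimal sketch that swaps the fixed-size array for a thread-safe collection. The choice of ConcurrentDictionary, the variable names, and the placeholder string standing in for the Invoke-RestMethod response are assumptions made for illustration, not part of the original code:

```powershell
# Sketch only: a thread-safe dictionary shared by all parallel runspaces.
# Requires PowerShell 7+ (ForEach-Object -Parallel).
$ids = 1..9
$statusById = [System.Collections.Concurrent.ConcurrentDictionary[int, string]]::new()

$ids | ForEach-Object -Parallel {
    # Copy the $using: reference to a local variable first,
    # so no (...) enclosure is needed for the method call.
    $dict = $using:statusById
    # Placeholder for: try { Invoke-RestMethod -Uri $uri -TimeOut 30 } catch {}
    $response = "placeholder status for id $_"
    # TryAdd is safe to call concurrently; discard its Boolean result.
    $null = $dict.TryAdd($_, $response)
}

# Enumerate in key order; the dictionary itself is unordered.
$statusById.Keys | Sort-Object | ForEach-Object { '{0}: {1}' -f $_, $statusById[$_] }
```

Because every runspace adds its own distinct key, the same pattern would also tolerate a variable number of IDs, which a pre-sized array does not.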


[1] Somewhat confusingly, ForEach-Object has an alias also named foreach. It is the syntactic context (the parsing mode) that determines in a given statement whether foreach refers to the foreach (language) statement or the ForEach-Object cmdlet; e.g. foreach ($i in 1..3) { $i } (statement) vs. 1..3 | foreach { $_ } (cmdlet).

[2] However, if an expression is syntactically complete on a given line, PowerShell also stops parsing, which amounts to a notable pitfall with ., the member-access operator: Unlike in C#, for instance, . must be placed on the same line as the object / expression it is applied to. E.g. 'foo'.<newline> Length works, but 'foo'<newline> .Length does not. Additionally, the . must immediately follow the target object / expression even on a single line (e.g. 'foo' .Length breaks too).

[3] Due to PowerShell's unified handling of list-like collections and scalars (single objects) - see this answer - indexing into a script block technically works when getting a value: indices [0] and [-1] return the scalar itself (e.g. $var = 42; $var[0]), and all other indices return $null by default but cause an error if Set-StrictMode -Version 3 or higher is in effect; however, an attempt to assign a value categorically fails (e.g. $var = 42; $var[0] = 43).
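
The single-line rule for commands described in this answer can be observed without running anything, by asking PowerShell's parser how many top-level statements it sees. This is only a sketch: the helper name Get-StatementCount is hypothetical, and the two sample snippets are illustrations rather than code from the question.

```powershell
# Hypothetical helper: parse source text and count its top-level statements.
function Get-StatementCount {
    param([string] $Source)
    $tokens = $null
    $errors = $null
    $ast = [System.Management.Automation.Language.Parser]::ParseInput($Source, [ref] $tokens, [ref] $errors)
    $ast.EndBlock.Statements.Count
}

# Opening { on the same line as the command: the command is syntactically incomplete
# at the end of the line, so parsing continues onto the following lines.
$braceOnSameLine = @'
1..3 | ForEach-Object -Parallel {
    $_
}
'@

# Opening { on the next line: the command ends at the line break.
$braceOnNextLine = @'
1..3 | ForEach-Object -Parallel
{
    $_
}
'@

Get-StatementCount $braceOnSameLine   # one statement: the whole pipeline
Get-StatementCount $braceOnNextLine   # two statements: the pipeline, then a stray script block
```

The second case is why the question's attempt reported a missing argument for -Parallel and then echoed the script block's text: the block was parsed as a separate statement.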

1
Net Dawg

Answering my own question, but it is only thanks to the minimalist brilliance of @js2010 and the amazing post by @Hugo (which I will keep as the accepted answer) that I could even fathom this.

$ids    = 0..9 | Get-Random -Shuffle
$status = $ids | ForEach-Object -Parallel {
   $uri = [System.Uri] "http://192.168.$_.51/status"
   try {Invoke-RestMethod -Uri $uri -TimeOut 30}catch{}
} 
$status

Again, please thoroughly read @Hugo's answer first and then @js2010's as well.
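
Note that with -Parallel the collected output arrives in completion order, which need not match the input order. A minimal sketch of one way to keep each result tied to its ID follows; the Id and Status property names, the random delay, and the placeholder string standing in for the Invoke-RestMethod call are assumptions for illustration:

```powershell
# Sketch only; requires PowerShell 7+.
$ids = 0..9
$results = $ids | ForEach-Object -Parallel {
    # Random delay to simulate requests finishing out of order.
    Start-Sleep -Milliseconds (Get-Random -Minimum 1 -Maximum 50)
    # Placeholder for: try { Invoke-RestMethod -Uri $uri -TimeOut 30 } catch {}
    [pscustomobject]@{ Id = $_; Status = "placeholder status for id $_" }
}

# Rebuild an index-addressable array from the (Id, Status) pairs.
$status = [object[]]::new(10)
foreach ($result in $results) { $status[$result.Id] = $result.Status }
$status
```

Since each output object carries its own Id, the final array is the same regardless of the order in which the runspaces finish.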

2
Hugo

I do not know about the -parallel parameter of foreach, but I do know you can use jobs for parallel network requests. You could use the example below:

# Name of your jobs.
$JobName = "-StatusChecker"

# Holds the code we want to run in parallel.
$ScriptBlock = {
  param (
    $id
  )

  $uriStr      = "http://192.168." + [String]$id + ".51/status"
  $uri         = [System.Uri] $uriStr
  $response = try {
    Invoke-RestMethod -Uri $uri -TimeOut 30
  } catch {
    # Return a message for our results. Doesn't matter what you return but if it's null it will error.
    "Failed to grab status of ID: $id"
    # Write the error to the error stream but do not print it.
    Write-Error $_ -ErrorAction SilentlyContinue
  }

  # The $Error variable contains a list of errors that occurred during the run.
  # By returning it, we get the opportunity to revise what went wrong in the job.
  return $Error, $response, $id
}

# Grab all remaining jobs from the last time this script was run, stop and remove them.
# If you don't do this then it will mess up your results for each session as they aren't removed.
# We identify the relevant jobs by the JobName parameter set with Start-Job.
Get-Job -Name "*$JobName" | Stop-Job
Get-Job -Name "*$JobName" | Remove-Job


$ids = 1..9 
# Iterate through each id and create a job for each one.
foreach ($id in $ids) {

  # The job runs in parallel.
  Start-Job -ScriptBlock $ScriptBlock -ArgumentList @($id) -Name "ID-$ID-$JobName"
}

# Wait here until all jobs are complete.
$Jobs = Get-Job -Name "*$JobName" | Wait-Job

# Hold our results.
$status  = [PSCustomObject[]]::new(10)
# Grab the results of each job and format them into a nice table.
Foreach ($JobResult in $Jobs) {
  $Results = Receive-Job -Job $JobResult
  # $Results[0] is the error array returned by the job.
  # $Results[1] is $response from RestMethod.
  # $Results[2] is the $id.

  # Add returns to status list
  $Status[$Results[2]] = $Results[1]

  # Print each error found to the console.
  Foreach ($Err in $Results[0]) {
    Write-Error "Failed job for $($JobResult.Name). Error Message: $($Err)" -ErrorAction Continue
  }
}

# Final results.
$Status

Your code goes inside the $ScriptBlock variable; most of the code below it is about retrieving the results from each job and processing them.
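
For comparison, the same job-based idea can be condensed when each job returns a single object carrying its own ID. This is a sketch, not a replacement for the fuller error handling above; the Id and Status property names and the placeholder string standing in for the Invoke-RestMethod response are assumptions:

```powershell
# Sketch only: one background job per ID, each returning its ID with its result.
$jobs = foreach ($id in 1..3) {
    Start-Job -ArgumentList $id -ScriptBlock {
        param($id)
        # Placeholder for: try { Invoke-RestMethod -Uri $uri -TimeOut 30 } catch {}
        [pscustomobject]@{ Id = $id; Status = "placeholder status for id $id" }
    }
}

# Wait for all jobs, collect their output, and remove the finished jobs.
$results = $jobs | Receive-Job -Wait -AutoRemoveJob

# Index 0 stays $null because the IDs start at 1.
$status = [object[]]::new(4)
foreach ($result in $results) { $status[$result.Id] = $result.Status }
$status
```

Keeping the job objects in $jobs avoids having to look them up by name afterwards, at the cost of the per-job error reporting shown in the longer version.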

1
js2010

This example works for me. Arrays start from 0. The curly brace needs to be on the same line after -parallel.

$ids = 0..9 
$status  = [PSCustomObject[]]::new(10)
$ids | foreach-object -parallel {
   $id = $_
   $mystatus = $using:status
   $mystatus[$id] = $id  # or  ($using:status)[$id] = $id
}
$status

0
1
2
3
4
5
6
7
8
9

Alternatively, just save the output, so you don't have to worry about thread safety:

$ids = 0..9 
$status = $ids | foreach-object -parallel {
  $id = $_
  $id
}
$status