Slice a PowerShell array into groups of smaller arrays


I would like to convert a single array into a group of smaller arrays, based on a size variable. So, 0,1,2,3,4,5,6,7,8,9 would become (0,1,2), (3,4,5), (6,7,8), (9) when the size is 3.

My current approach:

$ids=@(0,1,2,3,4,5,6,7,8,9)
$size=3

0..[math]::Round($ids.count/$size) | % { 

    # slice first elements
    $x = $ids[0..($size-1)]

    # redefine array w/ remaining values
    $ids = $ids[$size..$ids.Length]

    # return elements (as an array, which isn't happening)
    $x

} | % { "IDS: $($_ -Join ",")" }

Produces:

IDS: 0
IDS: 1
IDS: 2
IDS: 3
IDS: 4
IDS: 5
IDS: 6
IDS: 7
IDS: 8
IDS: 9

I would like it to be:

IDS: 0,1,2
IDS: 3,4,5
IDS: 6,7,8
IDS: 9

What am I missing?

There are 6 answers

Bill_Stewart (Best Answer)

You can use ,$x instead of just $x.

The about_Operators section in the documentation has this:

, Comma operator                                                  
   As a binary operator, the comma creates an array. As a unary
   operator, the comma creates an array with one member. Place the
   comma before the member.
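
Applied to the code in the question, a minimal sketch of the corrected loop (only the output statement changes):

$ids=@(0,1,2,3,4,5,6,7,8,9)
$size=3

0..[math]::Round($ids.count/$size) | % {
    $x = $ids[0..($size-1)]
    $ids = $ids[$size..$ids.Length]
    ,$x  # unary comma wraps $x in a transient 1-element array, so $x arrives as one object
} | % { "IDS: $($_ -Join ",")" }

This produces the desired IDS: 0,1,2 through IDS: 9 groups.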
ChiliYago
cls
$ids=@(0,1,2,3,4,5,6,7,8,9)
$size=3

<# 
Manual Selection:
    $ids | Select-Object -First 3 -Skip 0
    $ids | Select-Object -First 3 -Skip 3
    $ids | Select-Object -First 3 -Skip 6
    $ids | Select-Object -First 3 -Skip 9
#>

# Select via looping
$idx = 0
while ($($size * $idx) -lt $ids.Length){

    $group = $ids | Select-Object -First $size -skip ($size * $idx)
    $group -join ","
    $idx ++
} 
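
For the question's sample input, this should print:

0,1,2
3,4,5
6,7,8
9

Note that each iteration re-enumerates the input from the start (Select-Object -Skip has to discard the leading elements every time), so this approach slows down noticeably as the input grows.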
craig

For the sake of completeness:

function Slice-Array
{

    [CmdletBinding()]
    param (
        [Parameter(Mandatory=$true, Position=0, ValueFromPipeline=$True)]
        [String[]]$Item,
        [int]$Size=10
    )
    BEGIN { $Items=@()}
    PROCESS {
        foreach ($i in $Item ) { $Items += $i }
    }
    END {
        0..[math]::Floor($Items.count/$Size) | ForEach-Object { 
            $x, $Items = $Items[0..($Size-1)], $Items[$Size..$Items.Length]; ,$x
        } 
    }
}

Usage:

@(0,1,2,3,4,5,6,7,8,9) | Slice-Array -Size 3 | ForEach-Object { "IDs: $($_ -Join ",")" }
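
Which yields:

IDs: 0,1,2
IDs: 3,4,5
IDs: 6,7,8
IDs: 9

One caveat: when the element count is an exact multiple of -Size, the 0..[math]::Floor(...) loop runs one extra iteration and emits a trailing empty chunk (e.g., a bare "IDs: " line for 9 elements with -Size 3).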
mklement0

To add an explanation to Bill Stewart's effective solution:

Outputting a collection such as an array[1] either implicitly or using return sends its elements individually through the pipeline; that is, the collection is enumerated (unrolled):

# Count objects received.
PS> (1..3 | Measure-Object).Count
3   # Array elements were sent *individually* through the pipeline.

Using the unary form of , (comma; the array-construction operator) to prevent enumeration is a conveniently concise, though somewhat obscure workaround:

PS> (, (1..3) | Measure-Object).Count 
1   # By wrapping the array in a helper array, the original array was preserved.

That is, , <collection> creates a transient single-element helper array around the original collection so that the enumeration is only applied to the helper array, outputting the enclosed original collection as-is, as a single object.

A conceptually clearer, but more verbose and slower approach is to use Write-Output -NoEnumerate, which clearly signals the intent to output a collection as a single object.

PS> (Write-Output -NoEnumerate (1..3) | Measure-Object).Count 
1   # Write-Output -NoEnumerate prevented enumeration.

Pitfall with respect to visual inspection:

When the arrays are output for display, the boundaries between them are seemingly erased again:

PS> (1..2), (3..4) # Output two arrays without enumeration
1
2
3
4

That is, even though the two 2-element arrays were each sent as a single object, the output, showing each element on its own line, makes it look like a flat 4-element array was received.

A simple way around that is to stringify each array, which turns each array into a string containing a space-separated list of its elements.

PS> (1..2), (3..4) | ForEach-Object { "$_" }
1 2
3 4

Now it is obvious that two separate arrays were received.


[1] What data types are enumerated:
Instances of data types that implement the IEnumerable interface are automatically enumerated, but there are exceptions:

  • Types that also implement IDictionary, such as hashtables, are not enumerated, and neither are XmlNode instances.

  • Conversely, instances of DataTable (which doesn't implement IEnumerable) are enumerated (as the elements of their .Rows collection) - see this answer and the source code.

Additionally, note that stdout output from external programs is enumerated line by line.
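
For example, a hashtable - despite being enumerable - passes through the pipeline as a single object:

PS> (@{ a = 1; b = 2 } | Measure-Object).Count
1   # The hashtable was not enumerated.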

mklement0

Craig himself has conveniently wrapped the splitting (partitioning) functionality in a robust function.

Let me offer a better-performing evolution of it (PSv3+ syntax, renamed to Split-Array), which:

  • more efficiently collects the input objects, using an extensible System.Collections.Generic.List[object] collection.

  • doesn't modify the collection during splitting, and instead extracts ranges of elements from it.

function Split-Array {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory, ValueFromPipeline)]
        [String[]] $InputObject
        ,
        [ValidateRange(1, [int]::MaxValue)]
        [int] $Size = 10
    )
    begin   { $items = New-Object System.Collections.Generic.List[object] }
    process { $items.AddRange($InputObject) }
    end {
      $chunkCount = [Math]::Floor($items.Count / $Size)
      foreach ($chunkNdx in 0..($chunkCount-1)) {
        , $items.GetRange($chunkNdx * $Size, $Size).ToArray()
      }
      if ($chunkCount * $Size -lt $items.Count) {
        , $items.GetRange($chunkCount * $Size, $items.Count - $chunkCount * $Size).ToArray()
      }
    }
}
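
For example, a usage sketch that mirrors the question's desired output:

PS> 0..9 | Split-Array -Size 3 | ForEach-Object { "IDs: $($_ -join ",")" }
IDs: 0,1,2
IDs: 3,4,5
IDs: 6,7,8
IDs: 9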

With small input collections, the optimization won't matter much, but once you get into the thousands of elements, the speed-up can be dramatic:

To give a rough sense of the performance improvement, using Time-Command:

$ids = 0..1e4 # 10,000 numbers
$size = 3 # chunk size

Time-Command { $ids | Split-Array -size $size }, # optimized
             { $ids | Slice-Array -size $size }  # original

Sample result from a single-core Windows 10 VM with Windows PowerShell 5.1 (the absolute times aren't important, but the factors are):

Command                        Secs (10-run avg.) TimeSpan         Factor
-------                        ------------------ --------         ------
$ids | Split-Array -size $size 0.150              00:00:00.1498207 1.00
$ids | Slice-Array -size $size 10.382             00:00:10.3820590 69.30

Note how the unoptimized function was almost 70 times slower.

iRon

As PowerShell "arrays" usually unroll in the [pipeline] and a using the pipeline (https://learn.microsoft.com/powershell/module/microsoft.powershell.core/about/about_pipelines) has a memory usages advantage (as each item is processed separately), I would change the question to:

Slice a PowerShell pipeline into smaller batches

I have created a small Create-Batch function for this:

Install-Script -Name Create-Batch

Example 1:

1..5 |Create-Batch -Size 2 |ForEach-Object { "$_" }
1 2
3 4
5

Example 2:

Get-Process |Create-Batch |Set-Content .\Process.txt

This creates a single batch (array) containing all the items. The result of this statement is the same as Get-Process |Set-Content .\Process.txt, but note that this appears (for as yet unknown reasons) to be about twice as fast.
See: #8270 Suggestion: Add a chunking (partitioning, batching) mechanism to Select-Object, analogous to Get-Content -ReadCount