Use COM Object inside ForEach-Object -Parallel

109 views Asked by At

I wrote a script to convert over 1000 Excel files to PDF. In the past, I used the command line interface of LibreOffice to achieve it. Now, I am trying to use Microsoft Excel and I am hoping to parallelize the process with ForEach-Object -Parallel. My issue is that the COM Object cannot be accessed inside the parallel loop despite me employing $using:. Is it possible (at all) to use a COM Object in a parallel loop? Here is my code:

# Get working directory
$wd = pwd

# Output directory for converted PDFs
$output_dir = "$wd\data\sample_curriculums_fall2023\pdf"

# Excel fixed format (important to specify the conversion target (i.e. PDF))
$ExcelFixedFormat = “Microsoft.Office.Interop.Excel.xlFixedFormatType” -as [type]

# Get Excel files
$excel_files = Get-ChildItem -Path "$wd\data\sample_curriculums_fall2023\excel\" -Filter *xlsm

# Create COM Object for Excel and make it invisible (i.e. headless)
$ExcelObject = New-Object -ComObject Excel.Application
$ExcelObject.Visible = $false

$excel_files | ForEach-Object -ThrottleLimit 20 -Parallel {

        $file = $_
        $obj = $using:ExcelObject
        $fm = $using:ExcelFixedFormat

        # Make name for PDF output
        $output = Join-Path -Path $using:output_dir -ChildPath ($file.BaseName + ".pdf")
    
        # Open Excel file to convert
        $workbook = $obj.Workbooks.Open($file.FullName, 3) # PROBLEM!!!!
        $workbook.Saved = $true
        $workbook.ExportAsFixedFormat($fm::xlTypePDF, $output)
        $obj.Workbooks.Close()
    
}

$ExcelObject.Quit()

The error is thrown at this line inside the parallel loop:

$workbook = $obj.Workbooks.Open($file.FullName, 3)

I get the following error:

You cannot call a method on a null-valued expression.

Which suggests that the $obj variable does not contain the COM Object and is null. Note that the script works perfectly with a regular foreach function.

Thank you in advance for your help.

1

There are 1 answers

1
mklement0 On BEST ANSWER
  • The Microsoft Office COM Automation servers aren't thread-safe, so you cannot share a given instance between multiple threads.

    • See Threading support in Office for background information and ways to explicitly manage threads, which probably is either infeasible or impractical from PowerShell.
  • Creating a separate instance with New-Object -ComObject Excel.Application for each file to convert isn't worth doing, because each such call creates a new Excel child process, which is costly both in terms of performance and memory use.

However, you can try to batch your input files, so that you only create a limited number of Excel processes that each process a batch of input files:

# How many Excel instances to run in parallel.
# Tweak this number based on your system's CPU count and memory.
$throttleLimit = 4

# Collect all input files.
$files = 1..100
  # Get-ChildItem "$wd\data\sample_curriculums_fall2023\excel\" -Filter *xlsm

# Determine how many files to pass to each Excel instance.
$chunkSize = [Math]::Ceiling($files.Count / $throttleLimit)

# Batch the input files and process each batch with ForEach-Object -Parallel
$files |
  ForEach-Object `
    -Begin { $i = 0; $chunk = [System.Collections.Generic.List[object]]::new($chunkSize) } `
    -Process { 
      $chunk.Add($_)
      if (++$i -eq $chunkSize) {
        , $chunk.ToArray()
        $i = 0; $chunk.Clear()
      }
    } `
    -End {
      if ($i) {
        , $chunk.ToArray
      }
    } |
  ForEach-Object -ThrottleLimit $throttleLimit -Parallel {
    $xl = New-Object -ComObject Excel.Application
    foreach ($file in $_) {
      $output = Join-Path $using:output_dir ($file.BaseName + '.pdf')
      $workbook = $xl.Workbooks.Open($file.FullName, 3)
      $workbook.Saved = $true
      $workbook.ExportAsFixedFormat(0, $output) # 0 = [Microsoft.Office.Interop.Excel.xlFixedFormatType]::xlTypePDF
      $xl.Workbooks.Close()
    }
    $xl.Quit()
  }

Note the use of an auxiliary ForEach-Object call to batch (chunk) the array of files to process.
Potentially building this functionality into PowerShell itself is the subject of GitHub issue #8270.