Hi I wrote the following code in PowerShell version 2. I am leveraging C#.NET code to query Windows search for PDF files in a specific folder entered by user, against the values in Excel file. If the value is found in one of the PDF files, it outputs the PDF filename to the excel, next to the key.
SerarchItemsInPDFs.ps1:
# Starting C# code
$Assem = (
"System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089"
)
$Source = @"
using System.Collections.Generic;
using System.Data.OleDb;
namespace Windows.Search
{
public class Tool2
{
public static List<string> Get(string Key, string Path)
{
using (OleDbConnection conn = new OleDbConnection("Provider=Search.CollatorDSO;Extended Properties='Application=Windows';"))
{
conn.Open();
OleDbCommand cmd = new OleDbCommand("SELECT System.ItemPathDisplay FROM SYSTEMINDEX WHERE FREETEXT('" + Key + "') AND DIRECTORY = '" + Path + "'", conn);
using (OleDbDataReader reader = cmd.ExecuteReader())
{
List<string> row = new List<string>();
while (reader.Read())
{
for (int i = 0; i < reader.FieldCount; i++)
{
row.Add(reader[i].ToString());
}
}
return row;
}
}
}
}
}
"@
Add-Type -ReferencedAssemblies $Assem -TypeDefinition $Source -Language CSharp
# End of C# code
# PoswerShell script begins
function IsData($variable) {if ($variable) {$true} else {$false}}
$ExcelPath = Read-Host "Enter the path to excel file"
$SheetName = "New Database"
$Excel = New-Object -ComObject "Excel.Application"
$Workbook = $Excel.workbooks.open($ExcelPath)
$Sheet = $Workbook.Worksheets.Item($SheetName)
$PDFsPath = Read-Host "Enter the path to PDFs folder"
$TotalUsedRows = $Sheet.usedRange.SpecialCells(11).row
"Starting rescue log file" | out-File "C:\Temp\output.txt"
for ($i=2; $i -le $TotalUsedRows; $i++)
{
$CellValue = $Sheet.Cells.Item($i,1).Text
If(!(IsData($CellValue))){continue}
$PDFsNames = [Windows.Search.Tool2]::Get($CellValue, $PDFsPath)
$records_found = 2
Write-Host $CellValue
foreach ($PDFName in $PDFsNames)
{
Write-Host $PDFName
$Sheet.Cells.Item($i, $records_found) = $PDFName
$CellValue + ";" + $PDFName + "," | out-File -Append "C:\Temp\output.txt"
$records_found++
}
}
$Workbook.SaveAs(“c:\Temp\results.xlsx”)
Write-Host "Job Done"
# End of PowerShell script
Since, lot of new cmdlets has been introduced in PowerShell since version 2.0, is there a way to get rid of the C# code part using POSH v4?
Can we bring more syntax optimizations, given we are using all edge versions of product (Windows 8.1, PowerShell 4 and Office 2013)?
Yes, you can read from a database without using C# in PowerShell even in v1. Since PowerShell can use any .Net class you can do exact the same code in PowerShell. Use
New-Object
in PowerShell instead ofnew
in C# and the rest of the code is basically the same. Some examples are here.Maybe lose the not needed IsData function. The code would probably be a bit faster if you first build the output and then at the end write the output instead of opening the file each time. I don't see any other optimizations you could use in PowerShell v4.
update
Here is an example of using windows search all in powershell.