I am very noob to Powershell and have small amounts of Linux bash scripting experience. I have been looking for a way to get a list of files that have Social Security Numbers on a server. I found this in my research and it performed exactly as I had wanted when testing on my home computer except for the fact that it did not return results from my work and excel test documents. Is there a way to use a PowerShell command to get results from the various office documents as well? This server is almost all Word and excel files with a few PowerPoints.
PS C:\Users\Stephen> Get-ChildItem -Path C:\Users -Recurse -Exclude *.exe, *.dll | `
Select-String "\d{3}[-| ]\d{2}[-| ]\d{4}"
Documents\SSN:1:222-33-2345
Documents\SSN:2:111-22-1234
Documents\SSN:3:111 11 1234
PS C:\Users\Stephen> Get-childitem -rec | ?{ findstr.exe /mprc:. $_.FullName } | `
select-string "[0-9]{3}[-| ][0-9]{2}[-| ][0-9]{4}"
Documents\SSN:1:222-33-2345
Documents\SSN:2:111-22-1234
Documents\SSN:3:111 11 1234
When interacting with MS Office files, the best way is to use COM interfaces to grab the information you need.
If you are new to Powershell, COM will definitely be somewhat of a learning curve for you, as very little "beginner" documentation exists on the internet.
Therefore I strongly advise starting off small :
foreach ($file in (ls *.docx)) { # work on $file }
Here's some reading (admittedly, all this is for Excel as I build automated Excel charting tools, but the lessons will be very helpful for automating any Office application)
Powershell and Excel - Introduction
A useful document from a deleted link (link points to the Google Cache for that doc) - http://dev.donet.com/automating-excel-spreadsheets-with-powershell
Introduction to working with "Objects" in PS - CodeProject