Only convert files with the string "DUPLICATE" in the name

I'm trying to make a script that converts PDFs to TIF.

  1. It moves the right files from one folder to another (thanks to the community's previous help).
  2. Next, it converts all of the PDFs to TIFF.
  3. Lastly, it renames the .tiff files to .tif (a name change only).

What I want to do now is convert only the PDFs with "DUPLICATE" in their file name to TIFF, and then remove "DUPLICATE" from the new TIFF's file name.

Does anyone know how to do that?

gci X:\IT\PDFtoTIFF\1 -filter {VKF*} | Move-Item -destination X:\IT\PDFtoTIFF\2

$tool = 'C:\Program Files (x86)\GPLGS\gswin32c.exe'
$pdfs = get-childitem . -recurse | where {$_.Extension -match "pdf"}

foreach($pdf in $pdfs)
{

    $tiff = $pdf.FullName.split('.')[0] + '.tiff'
    if(test-path $tiff)
    {
        "tiff file already exists " + $tiff
    }
    else        
    {   
        'Processing ' + $pdf.Name        
        $param = "-sOutputFile=$tiff"
        & $tool -q -dNOPAUSE -sDEVICE=tiffg4 $param -r300 $pdf.FullName -c quit
    }
}

Dir *.tiff | rename-item -newname {  $_.name  -replace '\.tiff$','.tif'  }

More details: The script needs to work like this:

  1. All files in the folder \itgsrv028\invoices$\INST that start with vkf need to be moved to this folder: \itgsrv028\invoices$\INST\V3 (this is currently working in the script).
  2. Only convert the files with “Duplicaat” in their name to TIFF.
  3. Rename VKF_320150309DUPLICAAT.Tiff to 320150309.tif.

Example: These files in the folder:

VKF_320150309.PDF

VKF_320150309DUPLICAAT.PDF

Need to become:

VKF_320150309.PDF

VKF_320150309DUPLICAAT.PDF

320150309.TIF (Converted from: VKF_320150309DUPLICAAT.PDF)

There is 1 answer

Vesper

About converting only the "DUPLICAAT" files: extend your filter so that it also requires a match for "DUPLICAAT" in the base name, like this:

$pdfs = get-childitem . -recurse | where {$_.Extension -match "pdf" -and $_.basename -match "DUPLICAAT"} 
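To see what the combined condition selects without touching the file system, here is a minimal sketch that applies the same two tests to plain strings. The sample names are hypothetical stand-ins for what get-childitem would return, and the extension test is anchored as '^\.pdf$', which is slightly stricter than the "pdf" match above.

```powershell
# Hypothetical sample names; in the real script these come from Get-ChildItem.
$names = 'VKF_320150309.PDF', 'VKF_320150309DUPLICAAT.PDF', 'notes.txt'

$selected = $names | Where-Object {
    # -match is case-insensitive by default, so ".PDF" and ".pdf" both pass.
    ([System.IO.Path]::GetExtension($_) -match '^\.pdf$') -and
    ([System.IO.Path]::GetFileNameWithoutExtension($_) -match 'DUPLICAAT')
}

$selected   # VKF_320150309DUPLICAAT.PDF
```

Only the name that is both a PDF and contains "DUPLICAAT" survives the filter; the plain VKF_320150309.PDF is skipped.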

About building a new name for the TIFF: you can use a capture group in a regular expression to extract the part you want from between a known prefix and suffix. With your VKF_320150309DUPLICAAT.PDF as an example, you can convert it to a proper TIFF file name with this construction:

$tiff="$($pdf.directory)\$($pdf.basename -replace "VKF_([\w\s]+)DUPLICAAT",'$1').tiff"

This combines three things: the -replace operator applied to a string, $(expression) subexpressions that are replaced by their evaluated values inside a double-quoted string, and the literal path separator and extension written in that same string. It resolves as follows:

  1. The whole thing is a double-quoted string, so PowerShell expands the $(expression) parts inside it.
  2. The first $(expression) evaluates to the value of $pdf.directory, which is the path to the parent folder without a trailing backslash. With $pdf equal to X:\IT\PDFtoTIFF\2\VKF_320150309DUPLICAAT.PDF, this returns X:\IT\PDFtoTIFF\2.
  3. The second $(expression) evaluates to $pdf.basename -replace "VKF_([\w\s]+)DUPLICAAT",'$1'. With the same PDF, this is "VKF_320150309DUPLICAAT" -replace "VKF_([\w\s]+)DUPLICAAT",'$1'. The parenthesized group in the regular expression matches "320150309"; that value is assigned to $1, which then replaces the whole matched region. Thus the name is stripped of both "VKF_" and "DUPLICAAT" in one go.
  4. The two resulting strings are joined with a backslash in between and a trailing .tiff, giving X:\IT\PDFtoTIFF\2\320150309.tiff.
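The steps above can be checked on plain strings, with no file on disk. In this sketch, $directory and $baseName are hypothetical stand-ins for $pdf.directory and $pdf.basename, filled with the values from the example.

```powershell
# Stand-ins for $pdf.Directory and $pdf.BaseName (values from the example above).
$directory = 'X:\IT\PDFtoTIFF\2'
$baseName  = 'VKF_320150309DUPLICAAT'

# Step 3: the capture group keeps the middle part, $1 puts it back alone.
$core = $baseName -replace 'VKF_([\w\s]+)DUPLICAAT', '$1'

# Step 4: expand both values inside one double-quoted string.
$tiff = "$directory\$($core).tiff"

$core   # 320150309
$tiff   # X:\IT\PDFtoTIFF\2\320150309.tiff

# A name without "DUPLICAAT" does not match the pattern and is left unchanged.
'VKF_320150309' -replace 'VKF_([\w\s]+)DUPLICAAT', '$1'   # VKF_320150309
```

Note the single quotes around '$1': inside double quotes PowerShell would try to expand $1 as a variable before the regex engine ever sees it.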

Hope this helps you build better scripts that work with strings in PowerShell.
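For reference, here is one way the two changes might fit into the loop from the question. Treat it as a sketch under assumptions: the function names Get-TifPath and Convert-DuplicatePdf are made up for illustration, the Ghostscript path and switches are copied unchanged from the question, and the output is written straight to .tif so the separate rename step is no longer needed.

```powershell
# Hypothetical helper: builds the target path from a folder and a PDF base name.
function Get-TifPath {
    param([string]$Directory, [string]$BaseName)
    # Strip "VKF_" and "DUPLICAAT", keeping the captured middle part.
    $core = $BaseName -replace 'VKF_([\w\s]+)DUPLICAAT', '$1'
    return "$Directory\$($core).tif"
}

# Hypothetical wrapper around the conversion loop from the question.
function Convert-DuplicatePdf {
    param(
        [string]$Root = '.',
        # Ghostscript location as given in the question; adjust if yours differs.
        [string]$Tool = 'C:\Program Files (x86)\GPLGS\gswin32c.exe'
    )

    # Only files (not folders) that are PDFs with "DUPLICAAT" in the base name.
    $pdfs = Get-ChildItem -Path $Root -Recurse | Where-Object {
        -not $_.PSIsContainer -and
        $_.Extension -match '^\.pdf$' -and
        $_.BaseName -match 'DUPLICAAT'
    }

    foreach ($pdf in $pdfs) {
        $tif = Get-TifPath -Directory $pdf.DirectoryName -BaseName $pdf.BaseName
        if (Test-Path $tif) {
            "tif file already exists $tif"
        }
        else {
            "Processing $($pdf.Name)"
            # Same Ghostscript switches as in the question.
            & $Tool -q -dNOPAUSE -sDEVICE=tiffg4 "-sOutputFile=$tif" -r300 $pdf.FullName -c quit
        }
    }
}
```

Nothing runs until the function is called, for example with Convert-DuplicatePdf -Root 'X:\IT\PDFtoTIFF\2'. The original PDFs are left in place, which matches the expected result listed in the question.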