I'm trying to make a script that converts PDFs to TIF.
- It copies the right files from one folder to another (thanks to the community's previous help).
- Next, it converts all of the PDFs to TIFF.
- Lastly, it renames the .tiff files to .tif (name change only).
What I want to do now is convert only the PDFs with "DUPLICATE" in their file name to TIFF, and finally remove "DUPLICATE" from the new TIFF's file name.
Does anyone know how to do that?
gci X:\IT\PDFtoTIFF\1 -filter VKF* | Move-Item -destination X:\IT\PDFtoTIFF\2
$tool = 'C:\Program Files (x86)\GPLGS\gswin32c.exe'
$pdfs = get-childitem . -recurse | where {$_.Extension -eq ".pdf"}
foreach($pdf in $pdfs)
{
    # Swap only the extension; splitting on '.' breaks on paths that contain other dots
    $tiff = [System.IO.Path]::ChangeExtension($pdf.FullName, '.tiff')
    if(test-path $tiff)
    {
        "tiff file already exists " + $tiff
    }
    else
    {
        'Processing ' + $pdf.Name
        $param = "-sOutputFile=$tiff"
        & $tool -q -dNOPAUSE -sDEVICE=tiffg4 $param -r300 $pdf.FullName -c quit
    }
}
# Escape and anchor the pattern so only the trailing extension is changed
Dir *.tiff | rename-item -newname { $_.name -replace "\.tiff$",".tif" }
More details: The script needs to work like this:
- All files in the folder \itgsrv028\invoices$\INST that start with VKF need to be moved to this folder: \itgsrv028\invoices$\INST\V3
(This is currently working in the script)
- Only convert the files with "DUPLICAAT" in their name to TIFF
- Rename VKF_320150309DUPLICAAT.Tiff to 320150309.tif
Example: These files in the folder:
VKF_320150309.PDF
VKF_320150309DUPLICAAT.PDF
Need to become:
VKF_320150309.PDF
VKF_320150309DUPLICAAT.PDF
320150309.TIF (Converted from: VKF_320150309DUPLICAAT.PDF)
About using only "DUPLICAAT": You have to change your filtering a bit so that, in addition to checking the extension, the where clause also requires a match for "DUPLICAAT" in the file name.
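A minimal sketch of that filter change, assuming the same Get-ChildItem pipeline as in the question. The function name Get-DuplicatePdf and its -Path parameter are hypothetical, introduced here only so the filter can be reused and tried on any folder.

```powershell
# Hypothetical helper: returns the PDFs under $Path whose name contains DUPLICAAT.
function Get-DuplicatePdf {
    param([string]$Path = '.')
    Get-ChildItem -Path $Path -Recurse |
        # -eq and -match are case-insensitive, so .PDF and .pdf both pass
        Where-Object { $_.Extension -eq '.pdf' -and $_.BaseName -match 'DUPLICAAT' }
}

# Same starting folder as the original script (the current directory)
$pdfs = Get-DuplicatePdf -Path .
```

With the two example files from the question, only VKF_320150309DUPLICAAT.PDF should end up in $pdfs.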
About building a new name for the TIFF: You can use a capture group in a regular expression to pull the valuable part out from between known characters. With
VKF_320150309DUPLICAAT.PDF
as an example, you can convert it to a proper TIFF file name with this construction:
$tiff = "$($pdf.directory)\$($pdf.basename -replace "VKF_(\w+)DUPLICAAT",'$1').tiff"
This combines a -replace operator applied to a string, the expansion of $(expression) to its evaluated value inside a double-quoted string, and the path separator and extension written as literal text in that string. It resolves as follows:
- $pdf.directory contains the path to the parent folder without a trailing backslash. With $pdf equal to X:\IT\PDFtoTIFF\2\VKF_320150309DUPLICAAT.PDF this returns X:\IT\PDFtoTIFF\2.
- $pdf.basename -replace "VKF_(\w+)DUPLICAAT",'$1'. With the same PDF this equals "VKF_320150309DUPLICAAT" -replace "VKF_(\w+)DUPLICAAT",'$1'. The parenthesized part of the regular expression matches "320150309", and this value is assigned to $1, which then replaces the whole matched region. Thus the name is stripped of both "VKF_" and "DUPLICAAT" in one go.
- .tiff is appended as literal text, resulting in X:\IT\PDFtoTIFF\2\320150309.tiff.
Hope this helps you build better scripts that work with strings in PowerShell.
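Putting both pieces together, the loop from the question could be adapted roughly as follows. This is a sketch under a few assumptions: Get-TifPath and Convert-DuplicatePdf are hypothetical names, the Ghostscript path and arguments are copied unchanged from the question, and the output is written straight to .tif so that the separate rename step is no longer needed.

```powershell
# Hypothetical helper: maps a DUPLICAAT PDF to its target .tif path.
function Get-TifPath {
    param([System.IO.FileInfo]$Pdf)
    # 'VKF_320150309DUPLICAAT' -> '320150309'
    $newBase = $Pdf.BaseName -replace 'VKF_(\w+)DUPLICAAT', '$1'
    Join-Path $Pdf.DirectoryName ($newBase + '.tif')
}

# Hypothetical wrapper around the conversion loop from the question.
function Convert-DuplicatePdf {
    param(
        [string]$Path = '.',
        [string]$Tool = 'C:\Program Files (x86)\GPLGS\gswin32c.exe'
    )
    # Only PDFs with DUPLICAAT in the name are converted
    $pdfs = Get-ChildItem -Path $Path -Recurse |
        Where-Object { $_.Extension -eq '.pdf' -and $_.BaseName -match 'DUPLICAAT' }
    foreach ($pdf in $pdfs) {
        $tif = Get-TifPath $pdf
        if (Test-Path $tif) {
            "tif file already exists $tif"
        }
        else {
            "Processing $($pdf.Name)"
            & $Tool -q -dNOPAUSE -sDEVICE=tiffg4 "-sOutputFile=$tif" -r300 $pdf.FullName -c quit
        }
    }
}
```

Calling Convert-DuplicatePdf -Path X:\IT\PDFtoTIFF\2 after the Move-Item step would then leave the original PDFs in place and add 320150309.tif next to them.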