Convert a PDF to Images using sips


I want to convert a PDF with several pages into individual image files using sips. I know there are several other (probably better) ways to do this, but sips is installed on every Mac and doesn't need a licence.

What I tried:

sips -s format png myPDF.pdf --out myIMG.png

That gives me an image of the first page of the PDF.

Now my question: is there a way to get an image for each page of the PDF?

Thanks for your advice!


There are 2 answers

Mark Setchell On BEST ANSWER

I have no idea whether you are supposed to do this sort of thing this way, but Automator on macOS has an action called Split PDF, which you could use to split a PDF into separate pages and then run sips on each one...

To start Automator, press ⌘+Space, start typing Automator until it guesses correctly, and hit Enter. This is called a Spotlight Search apparently and is the quickest way to find anything on a Mac, but no-one tells you that!

Then create a new Application, and select PDFs on the left (highlighted in red), then Split PDF (also in red) and drag that into the "work-area" on the right.

[Screenshot: the Automator window with PDFs and the Split PDF action highlighted in red]

I then saved that as splitter.

Then I started Terminal - same Spotlight Search method as starting Automator above, but start typing Terminal instead.

Now go to where you saved splitter and you'll see splitter.app:

ls -ld splitter*
drwxr-xr-x@ 3 mark  staff  96 27 Nov 12:09 splitter.app

Now I want to split a 10-page document called "a.pdf", so I ran:

echo "a.pdf" | automator -i - ./splitter.app

Sample Output

2018-11-27 12:15:21.200 automator[24004:3655998] Cache location entry for /Applications/Photos.app in cache file at /Users/mark/Library/Caches/com.apple.automator.actionCache-bundleLocations.plist is not valid: (null)
(
  "/Users/mark/Desktop/a-page1.pdf",
  "/Users/mark/Desktop/a-page2.pdf",
  "/Users/mark/Desktop/a-page3.pdf",
  "/Users/mark/Desktop/a-page4.pdf",
  "/Users/mark/Desktop/a-page5.pdf",
  "/Users/mark/Desktop/a-page6.pdf",
  "/Users/mark/Desktop/a-page7.pdf",
  "/Users/mark/Desktop/a-page8.pdf",
  "/Users/mark/Desktop/a-page9.pdf",
  "/Users/mark/Desktop/a-page10.pdf"
)

And it spits out 10 separate 1-page PDF documents on my desktop named per the output.
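The "then use sips on each one" step could look roughly like the sketch below. The function name pages_to_png is made up for illustration, the paths are assumed from the sample output above, and it uses only the sips options already shown in the question.

```shell
#!/bin/bash
# Convert each single-page PDF passed as an argument into a PNG next to it.
# pages_to_png is a hypothetical helper name, not part of sips or Automator.
pages_to_png() {
    local pdf
    for pdf in "$@"; do
        # a-page1.pdf -> a-page1.png (strip the .pdf suffix, add .png)
        sips -s format png "$pdf" --out "${pdf%.pdf}.png"
    done
}

# Example call, assuming the split pages landed on the Desktop as above:
# pages_to_png ~/Desktop/a-page*.pdf
```

Quoting "$pdf" keeps file names with spaces intact. Note that the shell glob sorts a-page10.pdf before a-page2.pdf, which does not matter here because every page is converted independently.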


I have no idea what the warning about "Photos App" cache file means, so if anyone knows, maybe they would tell me what it means and how to get rid of it.


Also, I presume that Automator is somehow calling the action from /System/Library/Automator/Split PDF.action:

file "/System/Library/Automator/Split PDF.action/Contents/MacOS/Split PDF" 


/System/Library/Automator/Split PDF.action/Contents/MacOS/Split PDF: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit bundle x86_64] [i386:Mach-O bundle i386]
/System/Library/Automator/Split PDF.action/Contents/MacOS/Split PDF (for architecture x86_64):  Mach-O 64-bit bundle x86_64
/System/Library/Automator/Split PDF.action/Contents/MacOS/Split PDF (for architecture i386):    Mach-O bundle i386

But, I have no idea how I can execute/call that from Terminal, without needing to start/write any Automator stuff. So, if anyone, @vadian maybe, knows, I would love to know that too! It appears to be a bundle, but if I run mdls on it, there is no bundle identifier listed, so I cannot run it with:

open -b <BUNDLE-IDENTIFIER>
garafajon On

This will do one page and let you set your resolution for the rasterization:

sips -s format png in.pdf -z 1024 1024 --out out.png

For all PDF files in a directory and its subdirectories:

find . -name "*.pdf" -exec sips -s format png {} -z 1024 1024 --out {}.png \;

The -exec part runs everything up to the \; terminator as a command for each matching file, replacing {} with the path of each file it finds. Super handy!
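A variation on the find command above, as a sketch only: -z takes a height and a width and forces both, so a non-square page gets stretched, whereas -Z caps the longer side and keeps the aspect ratio. The function name first_pages_to_png and its two parameters are assumptions for illustration. It also names the output in.png rather than in.pdf.png.

```shell
#!/bin/bash
# Rasterise the first page of every PDF under a directory.
# first_pages_to_png is a hypothetical helper: $1 = directory, $2 = max side in pixels.
first_pages_to_png() {
    local dir="${1:-.}" max="${2:-1024}"
    # "{} +" hands find's matches to a small inline script in batches;
    # inside it, $1 is the size and the remaining arguments are the PDF paths.
    find "$dir" -name "*.pdf" -exec sh -c '
        max=$1; shift
        for pdf in "$@"; do
            sips -s format png "$pdf" -Z "$max" --out "${pdf%.pdf}.png"
        done
    ' sh "$max" {} +
}

# Example call: everything below the current directory, longest side 1024 px
# first_pages_to_png . 1024
```

The inline sh -c script is only there so the .pdf suffix can be stripped per file, which a bare {} inside -exec cannot do.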

However, user137369 pointed out that the original question is about multi-page PDFs. Since sips only processes the first page, we must first break the PDF into its pages. For this, I created a simple script in Swift so we can access Apple's PDFKit.

So, if you have multi-page PDFs, first save this code into a file called pdfburst and give it execute permissions with: chmod 0755 pdfburst. It is possible you need Xcode (or its Command Line Tools) installed for this to work... I don't know.

#!/usr/bin/swift
import Foundation
import PDFKit

func splitPDF(inputPath: String) {
    let docURL = URL(fileURLWithPath: inputPath)
    guard let pdfDocument = PDFDocument(url: docURL) else {
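        // Report on stderr so the message is not piped onward as if it were a file name
        FileHandle.standardError.write(Data("Error: Unable to open PDF at \(inputPath)\n".utf8))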
        return
    }
    guard pdfDocument.pageCount > 1 else {
        print(inputPath)
        return
    }

    let baseFileName = docURL.deletingPathExtension()
    for i in 0..<pdfDocument.pageCount {
        guard let page = pdfDocument.page(at: i) else { continue }
        let newDocument = PDFDocument()
        newDocument.insert(page, at: 0)
        let outputPath = baseFileName.path(percentEncoded: false) + "_page_\(i+1).pdf"
        newDocument.write(to: URL(fileURLWithPath: outputPath))
        print(outputPath)
    }
}

if CommandLine.arguments.count < 2 {
    print("Usage: \(CommandLine.arguments.first!) <inputPDF>")
    exit(1)
}

let inputPath = CommandLine.arguments[1]

splitPDF(inputPath: inputPath)

This will process an input .pdf file and split it into single-page files if it has more than one page. It prints the names of the files it writes, or the original path if no split is required. This way we can pipe its output into the rasterization command from above.

Putting it all together, we get:

find . -name "*.pdf" -exec ./pdfburst {} \; | awk '{print "sips -s format png \"" $0 "\" -z 1024 1024 --out \"" $0 ".png\""}' | bash

Breaking this down, it:

  1. finds all pdf files recursively and runs pdfburst on each
  2. pdfburst splits (if needed) each into pages & echoes all files
  3. awk reads page files and makes a sips command for each
  4. bash executes each line which actually runs sips and makes each .png
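Steps 3 and 4 could also be done without generating command text for bash. The sketch below is an alternative, not what the one-liner above does: it reads one path per line and calls sips directly, so a file name containing quotes, $ or backticks is never re-parsed by a shell. The function name rasterise_listed_pdfs is made up for illustration, and it assumes no path contains a newline.

```shell
#!/bin/bash
# Read one PDF path per line on stdin (the format pdfburst prints)
# and rasterise each with the same sips options as the one-liner above.
# rasterise_listed_pdfs is a hypothetical helper name.
rasterise_listed_pdfs() {
    local pdf
    while IFS= read -r pdf; do
        # Skip empty lines
        [ -n "$pdf" ] || continue
        # </dev/null keeps the command from consuming the list on stdin
        sips -s format png "$pdf" -z 1024 1024 --out "$pdf.png" </dev/null
    done
}

# Example call, mirroring the pipeline above:
# find . -name "*.pdf" -exec ./pdfburst {} \; | rasterise_listed_pdfs
```

IFS= and read -r keep leading spaces and backslashes in a path exactly as printed.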

This might seem complex but hey, this is what was asked for, folks!