Reading text from PDF with CGPDFScanner - what is wrong with this PDF file?


I'm trying to extract the text from this file:

https://www.dropbox.com/s/249snnj1nsve5ir/Lebenslauf.pdf?dl=0

using CGPDFScanner. I can tell from the dictionary included in the PDF that the character encoding is WinAnsiEncoding, but all the characters come out garbled. As a cross-check, I tried copying and pasting text from the Preview app in Mac OS X, which works, so it must be possible to extract the text as strings somehow. On the other hand, the commercial third-party framework http://www.fastpdfkit.com cannot extract the text correctly either.
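For reference, the decoding step described above (turning the raw bytes of a PDF string operand into text, given WinAnsiEncoding) can be sketched in C, the language of the CoreGraphics API, as a simple byte-to-Unicode table. This is a minimal, hypothetical sketch: `winansi_to_utf8` is an illustrative helper, not part of CGPDFScanner or PDFKitten, and it assumes WinAnsiEncoding behaves like Windows code page 1252 (the PDF specification's own table differs in a few details, such as how unused codes are treated). Note also that if the font's /Encoding dictionary has a /Differences array, it modifies the base encoding, and if the font has a /ToUnicode CMap, the PDF specification gives that CMap priority when mapping character codes to Unicode; in either case a plain WinAnsi table can still yield garbled text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unicode code points for bytes 0x80..0x9F, assuming Windows-1252 semantics.
   0 marks codes that cp1252 leaves undefined. */
static const uint16_t kWinAnsiHigh[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

/* Map one byte to a Unicode code point (U+FFFD for undefined codes).
   Simplification: 0x00-0x7F and 0xA0-0xFF are passed through as Latin-1. */
uint32_t winansi_to_unicode(unsigned char b) {
    if (b >= 0x80 && b <= 0x9F) {
        uint16_t cp = kWinAnsiHigh[b - 0x80];
        return cp ? cp : 0xFFFD;
    }
    return b;
}

/* Hypothetical helper: decode `len` bytes into NUL-terminated UTF-8 in `out`.
   Returns the number of bytes written (excluding the NUL),
   or (size_t)-1 if `out_cap` is too small. */
size_t winansi_to_utf8(const unsigned char *in, size_t len,
                       char *out, size_t out_cap) {
    assert(in != NULL || len == 0);
    assert(out != NULL);
    if (out_cap == 0) return (size_t)-1;
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t cp = winansi_to_unicode(in[i]);
        unsigned char buf[3];
        size_t k;
        if (cp < 0x80) {                 /* 1-byte UTF-8 */
            buf[0] = (unsigned char)cp;
            k = 1;
        } else if (cp < 0x800) {         /* 2-byte UTF-8 */
            buf[0] = (unsigned char)(0xC0 | (cp >> 6));
            buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
            k = 2;
        } else {                         /* 3-byte UTF-8 (all BMP code points here) */
            buf[0] = (unsigned char)(0xE0 | (cp >> 12));
            buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
            k = 3;
        }
        if (n + k + 1 > out_cap) return (size_t)-1;  /* keep room for the NUL */
        memcpy(out + n, buf, k);
        n += k;
    }
    out[n] = '\0';
    return n;
}
```

For example, the bytes `'K', 0xF6, 'l', 'n'` decode to "Köln". If the scanner hands you bytes that decode to nonsense with a table like this, that suggests the font's character codes do not actually follow the base encoding.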

Does anyone have an idea what I'm missing?

As a side note, I was using https://github.com/KurtCode/PDFKitten to scan the PDF.
