Possible encoding issues with PDFDocument


I am using PDFKit in a Mac app (Xcode 11.7, macOS 10.15 deployment target) to view PDFs. Users can highlight selections and either copy the text or create quotes from it.

With some PDFs, I cannot get the correct string contents for the highlight.

Take the following PDF: https://www.irs.gov/pub/irs-pdf/iw8bene.pdf. If it is opened in Preview, its contents can be copied and pasted into TextEdit, for example.

If I open the same PDF in a PDFView, only some text can be copied and pasted (the main heading, for example); body text pastes as nothing but spaces! I have no custom copy-handling code on my PDFView.

If I inspect the current PDFSelection while text is highlighted, its string contains spaces and nonsense characters:

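```swift
if let string = pdfSelection.string {
    // Print each Unicode scalar's numeric value to see what the selection really contains.
    for scalar in string.unicodeScalars {
        print(scalar.value)
    }
}
```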

Example result:

```
32
1113109
1113135
1113135
1113109
32
1113118
1113091
32
```
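The large values are easier to read in hexadecimal: 1113109 is U+10FC15, for example, and every non-space value above falls in Unicode's Supplementary Private Use Area-B (U+100000 to U+10FFFD), so none of them is a standard character. Private-use values like these usually suggest the text could not be mapped back to ordinary Unicode characters, though that is an interpretation rather than something the question confirms. The sketch below flags such scalars; `isPrivateUse` and `codePointLabel` are hypothetical helper names, and in the app `selectionText` would come from `pdfSelection.string` rather than from the hard-coded values.

```swift
// Returns true if the scalar lies in one of Unicode's three Private Use Areas,
// i.e. a code point with no standard meaning outside a particular font or app.
func isPrivateUse(_ scalar: Unicode.Scalar) -> Bool {
    switch scalar.value {
    case 0xE000...0xF8FF,        // BMP Private Use Area
         0xF0000...0xFFFFD,      // Supplementary Private Use Area-A
         0x100000...0x10FFFD:    // Supplementary Private Use Area-B
        return true
    default:
        return false
    }
}

// Formats a scalar as U+XXXX (at least four hex digits) for easier reading.
func codePointLabel(_ scalar: Unicode.Scalar) -> String {
    let hex = String(scalar.value, radix: 16, uppercase: true)
    return "U+" + String(repeating: "0", count: max(0, 4 - hex.count)) + hex
}

// Rebuild the string from the values printed in the question.
let printed: [UInt32] = [32, 1113109, 1113135, 1113135, 1113109, 32, 1113118, 1113091, 32]
var selectionText = ""
selectionText.unicodeScalars.append(contentsOf: printed.compactMap { Unicode.Scalar($0) })

// Label each scalar so private-use code points stand out.
for scalar in selectionText.unicodeScalars {
    let note = isPrivateUse(scalar) ? "private use" : "standard"
    print(codePointLabel(scalar), note)
}

let privateCount = selectionText.unicodeScalars.filter(isPrivateUse).count
print("\(privateCount) of \(selectionText.unicodeScalars.count) scalars are private use")
```

For the sample above this reports 6 of the 9 scalars as private use. Checking the ranges explicitly keeps the rule visible; `scalar.properties.generalCategory == .privateUse` would be an alternative on Swift 5 and later.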

Whatever is wrong also trips up the standard copy command, so perhaps there is some setup step for PDFView or PDFDocument that I am missing? I simply create a PDFView in Interface Builder, then open a PDFDocument from a URL and set it on the view.

Answer by Giles (accepted)

The issue was being caused elsewhere in my application, though it was still related to PDFDocument. When a PDF is dragged into my app, I create a PDFDocument to check that it is valid, then write that document to the app's folder:

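```swift
// Open the dropped file to check that PDFKit can read it.
guard let pdf = PDFDocument(url: fileURL) else { ... }
// Re-serialise the document into the app's folder (this step altered the file).
guard pdf.write(to: documentURL(forID: documentID, andType: .pdf)) else { ... }
```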

It was this processing, loading the file into a PDFDocument and writing it back out, that subtly modified and broke the file.

Given how complex PDFs are, this was naive of me. In future I will simply copy the original file.
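A minimal sketch of that approach, assuming validation stays separate from storage: the file is checked but never written back out, and the original bytes are copied with `FileManager.copyItem(at:to:)`. `importPDF(from:to:isValid:)` and `PDFImportError` are hypothetical names. The validator is passed in as a closure so the copy logic needs only Foundation; the demo uses a stand-in `%PDF-` header check rather than PDFKit.

```swift
import Foundation

// Hypothetical error type for a file that fails validation.
enum PDFImportError: Error {
    case notAPDF(URL)
}

/// Copies a PDF into the app's storage byte-for-byte after validating it.
/// `isValid` is supplied by the caller (e.g. a PDFKit check), so this function
/// never re-serialises the document; it only copies the original file.
func importPDF(from source: URL, to destination: URL,
               isValid: (URL) -> Bool) throws {
    guard isValid(source) else { throw PDFImportError.notAPDF(source) }
    let fm = FileManager.default
    // copyItem(at:to:) fails if the destination exists, so remove any stale copy first.
    if fm.fileExists(atPath: destination.path) {
        try fm.removeItem(at: destination)
    }
    try fm.copyItem(at: source, to: destination)
}

// Demo: copy a small file and confirm the bytes are unchanged.
let tmp = FileManager.default.temporaryDirectory
let src = tmp.appendingPathComponent("import-demo-source.pdf")
let dst = tmp.appendingPathComponent("import-demo-copy.pdf")
let original = Data("%PDF-1.7\n% demo bytes, not a real document\n".utf8)
try original.write(to: src)

// Stand-in validator (hypothetical): only checks the "%PDF-" header.
try importPDF(from: src, to: dst) { url in
    (try? Data(contentsOf: url))?.starts(with: Data("%PDF-".utf8)) ?? false
}

let copied = try Data(contentsOf: dst)
print(copied == original ? "byte-for-byte copy" : "bytes differ")
```

In the app, the call might look like `try importPDF(from: fileURL, to: documentURL(forID: documentID, andType: .pdf)) { PDFDocument(url: $0) != nil }`, reusing the answer's own `documentURL(forID:andType:)` helper. Removing an existing destination first is a design choice; an app that must never overwrite could throw instead.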