I am using PDFKit in a Mac app (Xcode 11.7, 10.15 deployment target) to view pdfs. Users are able to highlight selections and either copy the text, or create quotes.
With some pdfs, I cannot get the correct string contents for the highlight.
Take the following pdf: https://www.irs.gov/pub/irs-pdf/iw8bene.pdf. If it is opened in Preview, it is possible to copy and paste contents into TextEdit, for example.
If I open this pdf with PDFView, only some text can be copied and pasted (the main heading, for example); for body text, only the spaces end up being pasted! I have no custom code to handle copy on my PDFView.
If I evaluate the current PDFSelection while text is highlighted, the string contains only spaces and nonsense characters:

    // Print the numeric value of each Unicode scalar in the selection.
    if let string = pdfSelection.string {
        for scalar in string.unicodeScalars {
            print(scalar.value)
        }
    }

Example result:
32
1113109
1113135
1113135
1113109
32
1113118
1113091
32
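
For what it's worth, every value above other than 32 (a space) falls in the range U+100000 to U+10FFFD, which is Supplementary Private Use Area-B. That would be consistent with glyphs that have no usable Unicode mapping, though I can't say that for certain. A minimal sketch for flagging such selections follows; `containsPrivateUseScalars` is a hypothetical helper name, not part of PDFKit, and the sample values are taken from the output above.

```swift
import Foundation

/// Hypothetical helper: returns true if any scalar in the text
/// belongs to a Unicode private-use area.
func containsPrivateUseScalars(_ text: String) -> Bool {
    return text.unicodeScalars.contains { scalar in
        scalar.properties.generalCategory == .privateUse
    }
}

// Rebuild a string from some of the scalar values printed above.
let values: [UInt32] = [32, 1113109, 1113135, 1113135, 1113109, 32]
let scalars = values.compactMap { Unicode.Scalar($0) }
let sample = String(String.UnicodeScalarView(scalars))

print(containsPrivateUseScalars(sample))
print(containsPrivateUseScalars("Certificate of Status"))
```

In the app, the same check could be run on `pdfSelection.string` before copying or creating a quote, so that a broken selection is detected instead of being pasted silently.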
Whatever is wrong, the standard copy code is falling foul of it too, so perhaps there is some set-up issue on PDFView or PDFDocument that I am missing? I simply create a PDFView in Interface Builder, then open a PDFDocument with a URL and set it on the view.
This issue was being caused elsewhere in my application, although it was still related to PDFDocument. When a pdf is dragged to my app, I create a PDFDocument to check validity, and I was then saving that PDFDocument, rather than the original file, to the app's folder.
It was this processing of the file that caused it to be subtly modified/broken.
This was naive of me, given the complexity of pdfs. I will simply copy the original file in future.
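
A rough sketch of what I mean by copying the original, assuming the earlier version wrote the PDFDocument back out (for example with `write(to:)`). `importPDF(from:into:)` is a hypothetical helper name, and the destination naming is only an illustration:

```swift
import Foundation
import PDFKit

/// Hypothetical helper: checks that the dropped file opens as a PDF,
/// then copies the original bytes unchanged into the app's folder.
/// Returns nil if PDFKit cannot open the file.
func importPDF(from sourceURL: URL, into folderURL: URL) throws -> URL? {
    // Use PDFDocument only as a validity check; never write it back out.
    guard PDFDocument(url: sourceURL) != nil else {
        return nil
    }

    let destinationURL = folderURL.appendingPathComponent(sourceURL.lastPathComponent)

    // Copy the file exactly as it was dropped, so nothing gets re-encoded.
    try FileManager.default.copyItem(at: sourceURL, to: destinationURL)
    return destinationURL
}
```

Note that `copyItem(at:to:)` throws if a file already exists at the destination, so a real implementation would need to pick a unique name or remove the old copy first.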