I am trying to extract the text of companies' annual reports. Most of them use a two-column layout, and I don't know how to extract the text in the correct reading order. In R, with the pdftools package, the first line of the first column is followed by the first line of the second column, instead of by the second line of the first column.
This is my code:
library(pdftools)
readpdf <- pdf_text("https://www.telefonica.com/documents/153952/13347920/2019-Telefonica-Consolidated-Management-Report.pdf/0a9c8382-c9ff-ba52-1d5b-e431a7efab3f")
How can I do this correctly?
My answer would be to use something like ABBYY FineReader or equivalent OCR software. I have tried the open-source tools available in R on the same sort of data, but they did not work well enough for my purposes.
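If the PDF has an embedded text layer, one thing that may be worth trying before OCR is to work from word positions rather than from pdf_text(). pdftools provides pdf_data(), which returns one data frame per page with the columns x, y, width, height, space and text. The sketch below is only an illustration of the idea, not something tested on the report above: reorder_two_columns() is a hypothetical helper (it is not part of pdftools), and it assumes that the gutter sits at the horizontal midpoint of the text and that words on the same line have y values within y_tol of each other. It is demonstrated on a small synthetic table so that it runs without downloading anything.

```r
# Hypothetical helper (not part of pdftools): rebuild the reading order of a
# two-column page from a word table with columns x, y, width and text,
# i.e. the shape of one page as returned by pdftools::pdf_data().
reorder_two_columns <- function(words, y_tol = 2) {
  # Assumption: the column gutter is at the horizontal midpoint of the text.
  split_x <- (min(words$x) + max(words$x + words$width)) / 2

  column_text <- function(w) {
    if (nrow(w) == 0) return(character(0))
    # Sort top to bottom, then start a new line whenever y jumps by more
    # than y_tol.
    w <- w[order(w$y), ]
    w$line <- cumsum(c(TRUE, diff(w$y) > y_tol))
    # Within each line, read the words left to right.
    w <- w[order(w$line, w$x), ]
    unname(vapply(split(w$text, w$line), paste, character(1), collapse = " "))
  }

  left  <- words[words$x <  split_x, ]
  right <- words[words$x >= split_x, ]
  # Whole left column first, then the whole right column.
  c(column_text(left), column_text(right))
}

# Synthetic page: two lines in each of two columns.
words <- data.frame(
  x     = c(50, 80, 50, 80, 320, 350, 320, 350),
  y     = c(100, 100, 112, 112, 100, 100, 112, 112),
  width = rep(25, 8),
  text  = c("Left", "one", "Left", "two", "Right", "one", "Right", "two"),
  stringsAsFactors = FALSE
)
result <- reorder_two_columns(words)
print(result)
```

With a real file, the same helper could be applied to every page with something like lapply(pdf_data(path), reorder_two_columns). Full-width elements such as headings, tables and page footers break the midpoint assumption, so pages that contain them would need separate handling, and scanned reports without a text layer would still need OCR.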