Extract text well from a PDF with two columns in R

Question

434 views · Asked by David Perea on 18 September 2020 at 12:03

I am trying to extract the text of companies' annual reports. Most of them are laid out in two columns, and I don't know how to extract the text correctly: with the pdftools package in R, the first line of the first column is followed by the first line of the second column, rather than by the second line of the first column.

This is my code:

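library(pdftools)
# pdf_text() returns one string per page and keeps the visual layout,
# so the lines of the two columns end up interleaved
readpdf <- pdf_text("https://www.telefonica.com/documents/153952/13347920/2019-Telefonica-Consolidated-Management-Report.pdf/0a9c8382-c9ff-ba52-1d5b-e431a7efab3f")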

How can I do this correctly?

Original Q&A

There is 1 answer

**Max Volpi** · Answer 1 · 2021-08-24T15:52:17+00:00

I would use something like ABBYY FineReader or equivalent OCR software. I tried the open-source tools available in R on the same kind of data, but they did not work well enough for my purposes.
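
For anyone who wants to try staying in R first, one possible approach (not the method used in this answer) is to read word positions with pdftools::pdf_data() instead of pdf_text(), split the words at a vertical line between the columns, and rebuild each column separately. The sketch below is only a rough illustration: read_two_columns, extract_two_column_pdf and split_x are hypothetical names, and it assumes that every page has exactly two columns divided at the middle of the page and that words on the same line share the same y value. Full-width headings, tables and footers will break those assumptions. pdf_data() also needs a reasonably recent poppler.

```r
# Rebuild reading order for one two-column page from word boxes.
# `words` is a data frame shaped like one element of pdftools::pdf_data():
# it needs columns x, y (position of each word) and text.
# `split_x` (hypothetical parameter) is the x coordinate between the columns.
read_two_columns <- function(words, split_x) {
  column_text <- function(w) {
    if (nrow(w) == 0) return(character(0))
    # Sort top to bottom, then left to right
    w <- w[order(w$y, w$x), ]
    # Assumption: words with an identical y belong to the same line
    as.vector(tapply(w$text, w$y, paste, collapse = " "))
  }
  left  <- words[words$x <  split_x, ]
  right <- words[words$x >= split_x, ]
  # All lines of the left column first, then all lines of the right column
  c(column_text(left), column_text(right))
}

# Apply the helper to every page of a PDF, splitting at half the page width.
# Assumption: the column gap sits at the horizontal midpoint of each page.
extract_two_column_pdf <- function(pdf) {
  pages <- pdftools::pdf_data(pdf)
  sizes <- pdftools::pdf_pagesize(pdf)
  lapply(seq_along(pages), function(i) {
    read_two_columns(pages[[i]], split_x = sizes$width[i] / 2)
  })
}
```

Calling extract_two_column_pdf() on the file path or URL from the question should give a list with one character vector of lines per page. If the result is still not good enough for scanned or irregular layouts, OCR software as suggested above is probably the safer route.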
