With the code below, I extract data from a PDF file using pdftools:
library(pdftools)
library(readr)
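# Download the STOXX selection list; mode = "wb" keeps the binary PDF intact on Windows
download.file("https://www.stoxx.com/document/Reports/SelectionList/2020/August/sl_sxebmp_202008.pdf","sl_sxebmp_202008.pdf", mode = "wb")
# pdf_text() returns one character string per page
txt <- pdf_text("sl_sxebmp_202008.pdf")
# Split the page text into individual lines
txt <- read_lines(txt)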
print(txt)
How could I show these data as data.frame?
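If you stay with the pdftools text from the question, one rough option is to split each line into fields on runs of two or more spaces, which is how pdf_text() usually lays out table columns. The sketch below is only an illustration under those assumptions: lines_to_df() is a hypothetical helper, the first non-blank line is assumed to hold the column names, and the toy input stands in for the real txt vector (which will also contain page headers and footers that may need removing first).

```r
# Hypothetical helper: turn fixed-width text lines (e.g. from
# pdftools::pdf_text() + readr::read_lines()) into a data.frame.
# Assumes columns are separated by 2+ whitespace characters and that
# the first non-blank line holds the column names.
lines_to_df <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]              # drop blank lines
  fields <- strsplit(lines, "\\s{2,}")       # split on runs of 2+ spaces
  header <- fields[[1]]
  rows <- fields[-1]
  # keep only rows whose field count matches the header (skips stray text)
  rows <- rows[lengths(rows) == length(header)]
  mat <- if (length(rows)) do.call(rbind, rows) else
    matrix(character(0), ncol = length(header))
  df <- as.data.frame(mat, stringsAsFactors = FALSE)
  names(df) <- header
  df
}

# Hypothetical toy input; with the real file you would pass txt instead
toy <- c("Name            Country   Rank",
         "",
         "Company A AG    DE        1",
         "Company B SA    FR        2")
df <- lines_to_df(toy)
df$Rank <- as.integer(df$Rank)   # all fields come back as character
print(df)
```

Single spaces inside a value (such as "Company A AG") are preserved, but a value that happens to contain two consecutive spaces would be split, so this is fragile compared with a dedicated table extractor.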
I would suggest a tabulizer approach using your file. You can use extract_tables() to get all the data into a list and then process it. The first element in the list contains the variable names, so it is better to process that element first. The code to do that is next:
Some rows of the output d3 (1753 rows and 11 columns):
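A minimal sketch of the processing step described above, assuming extract_tables() is called with its default output = "matrix" so that it returns a list of character matrices. The helper tables_to_df() and the toy list are hypothetical; it takes the variable names from the first row of the first element, assumes every matrix has the same number of columns, and drops header rows repeated on later pages (an assumption about the PDF layout).

```r
# Hypothetical helper: combine a list of character matrices (the shape
# tabulizer::extract_tables() returns by default) into one data.frame.
# Assumes the first row of the first matrix holds the variable names and
# that all matrices have the same number of columns.
tables_to_df <- function(tabs) {
  header <- tabs[[1]][1, ]
  first  <- tabs[[1]][-1, , drop = FALSE]    # data rows of the first table
  body   <- do.call(rbind, c(list(first), tabs[-1]))
  # drop any header rows repeated on later pages (assumption)
  is_header <- apply(body, 1, function(r) identical(unname(r), unname(header)))
  body <- body[!is_header, , drop = FALSE]
  df <- as.data.frame(body, stringsAsFactors = FALSE)
  names(df) <- header
  df
}

# With the real file (not run here; needs tabulizer, Java and the PDF):
# tabs <- tabulizer::extract_tables("sl_sxebmp_202008.pdf")

# Hypothetical toy input standing in for extract_tables() output
tabs <- list(
  matrix(c("Name", "Country",
           "Company A AG", "DE"), ncol = 2, byrow = TRUE),
  matrix(c("Name", "Country",
           "Company B SA", "FR",
           "Company C NV", "NL"), ncol = 2, byrow = TRUE)
)
res <- tables_to_df(tabs)
print(res)
```

If the pages do not all yield the same number of columns, rbind() will fail, and those pages would need their columns aligned (or the area/columns arguments of extract_tables() tuned) before combining.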