I am trying to import tables from a website into R. The data is shown in the HTML as well as in a downloadable PDF.
I have tried using the `tabulizer` package on the PDF, specifically the `extract_tables()` and `extract_areas()` functions, and they both failed for different reasons: `extract_tables()` only extracted the first table of each page, and `extract_areas()` lumped each page into a single, badly formatted list item or data frame. A list of the tables is a good idea, but then each table would have to be its own list element...
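For context, here is roughly what the `tabulizer` attempt looked like (the file name is a placeholder):

```r
# Sketch of the tabulizer attempt; "report.pdf" is a placeholder path
library(tabulizer)

# extract_tables() guesses table areas and returns one list element per
# table it detects; in my case it only picked up the first table per page
tables <- extract_tables("report.pdf", output = "data.frame")

# extract_areas() opens an interactive selector for manual regions, but
# each selected page came back as a single, badly formatted element
areas <- extract_areas("report.pdf")
```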
I've also attempted to adapt the solution proposed in this StackOverflow question using `rvest`, but for some reason the page imports as blank, so I don't know how to scrape the table that way.
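The general `rvest` pattern I adapted is along these lines (the URL is a placeholder):

```r
# Sketch of the rvest approach; the URL is a placeholder for the real page
library(rvest)

page <- read_html("https://example.com/data")

# html_table() converts every <table> node into its own data frame, so the
# result is already a list with one table per element
tables <- page %>%
  html_nodes("table") %>%
  html_table(fill = TRUE)
```

If the page comes back empty, the tables may be built by JavaScript, in which case `read_html()` only sees the bare HTML skeleton.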
The goal is to store the tables separately rather than combining them (the list idea sounds efficient for that, but I could be wrong), using code that generalizes to other websites or PDFs (both hold the same data, so the source doesn't matter). A sketch of what I mean follows below.
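To illustrate the "separate tables" idea, assuming `tables` is the list of data frames returned by either approach above:

```r
# Assuming `tables` is a list of data frames from either approach above:
# name each element so individual tables stay addressable
names(tables) <- paste0("table_", seq_along(tables))

# each table is then its own object, e.g.
tables[["table_1"]]
```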
I was able to extract the tables from the PDF file with the following code: