I want to extract the table on page 112 of this PDF document:
# report 2017
url_location <- "http://publications.credit-suisse.com/tasks/render/file/index.cfm?fileid=432759CA-0A73-57F6-04C67EF7EE506040"
out <- extract_tables(url_location, pages = 112)
I have tried following these tutorials (link1, link2) on the 'tabulizer' package, but I mostly failed. Some parts of the task are tricky, and I don't have much experience handling them in R.
Can someone suggest an approach and help me with it?
Installation
devtools::install_github("ropensci/tabulizer")
# load package
library(tabulizer)
Java dependencies (while getting easier to deal with) aren't necessary when the tables are this clean. A bit of string wrangling on the extracted page text will get you what you need.
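Here is a minimal sketch of that string-wrangling idea. It assumes the page text has already been pulled out as one string, for example with `pdftools::pdf_text()`, which returns one character string per page (pdftools is an assumption here, not something the question uses). The helper `parse_page_table()` is hypothetical: it splits the text into lines, drops blank ones, and splits each line on runs of two or more whitespace characters, which is how fixed-width PDF text often separates columns.

```r
# Hypothetical helper: turn fixed-width page text into a character matrix.
# Assumes columns are separated by two or more whitespace characters, which is
# common in text extracted from PDF tables but NOT guaranteed for every layout.
parse_page_table <- function(page_text) {
  lines <- strsplit(page_text, "\n", fixed = TRUE)[[1]]
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]            # drop blank lines
  cells <- strsplit(lines, "\\s{2,}")      # split on 2+ whitespace chars
  width <- max(lengths(cells))
  # pad short rows with NA so every row has the same number of columns
  rows <- lapply(cells, function(x) c(x, rep(NA_character_, width - length(x))))
  do.call(rbind, rows)
}

# Usage with a small made-up snippet (illustrative only, not data from the report):
sample_text <- "Country   Adults   Wealth\n\nAlpha   1,200   35.5\nBeta   980   12.1\n"
tab <- parse_page_table(sample_text)
print(tab)
```

Against the real report you would download the PDF and call something like `parse_page_table(pdftools::pdf_text(path)[112])`; treat that as untested, since wrapped cells or single-space column gaps on that page would need extra handling.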
UPDATE
This version is more generic, but you'll still have to do some manual cleanup. I suspect you would have to even if you used Tabula.
Take a look at the outputs above. They aren't in terrible shape, and some of the cleanup can be done programmatically.
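The answer doesn't say which cleanup steps it automated, so the following is only a sketch of one plausible step. The helper `clean_table()` is hypothetical: it assumes the first row of a parsed character matrix holds the column names, strips thousands separators and footnote-style markers (`,`, `*`, `%`), and converts a column to numeric only when every non-missing cell parses as a number.

```r
# Hypothetical cleanup helper for a character matrix of parsed table cells.
# Assumes row 1 holds column names and numeric cells may contain ",", "*" or "%".
clean_table <- function(m) {
  df <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
  names(df) <- make.names(m[1, ], unique = TRUE)
  for (col in names(df)) {
    stripped <- gsub("[,*%]", "", trimws(df[[col]]))
    num <- suppressWarnings(as.numeric(stripped))
    # convert only if every non-missing, non-empty cell parsed as a number
    ok <- !is.na(stripped) & nzchar(stripped)
    if (any(ok) && all(!is.na(num[ok]))) df[[col]] <- num
  }
  df
}

# Made-up input in the shape a parser like the one above might return:
raw <- rbind(c("Country", "Adults", "Wealth"),
             c("Alpha", "1,200", "35.5"),
             c("Beta", "980*", "12.1"))
print(clean_table(raw))
```

Being conservative about conversion keeps label columns as text; anything the rule can't handle (merged header cells, rows split across lines) still needs the manual pass mentioned above.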