I am trying to extract a particular table from multiple PDF files, but not all of the files contain that table. How can I use tryCatch (or something similar) to skip a file and proceed to the next one when the table is missing?
library(pdftools)
library(tidyverse)
url <- c("https://www.computershare.com/News/Annual%20Report%202019.pdf?2",
         "https://www.annualreports.com/HostedData/AnnualReportArchive/a/LSE_ASOS_2018.PDF")
raw_text <- map(url, pdf_text)
clean_table1 <- function(raw) {
  raw <- map(raw, ~ str_split(.x, "\\n") %>% unlist())
  raw <- reduce(raw, c)
  table_start <- stringr::str_which(tolower(raw), "twenty largest shareholders")
  table_end <- stringr::str_which(tolower(raw), "total")
  table_end <- table_end[min(which(table_end > table_start))]
  table <- raw[(table_start + 3):(table_start + 25)]
  table <- str_replace_all(table, "\\s{2,}", "|")
  text_con <- textConnection(table)
  data_table <- read.csv(text_con, sep = "|")
  # colnames(data_table) <- c("Name", "Number of Shares", "Percentage")
  data_table
}
shares <- map_df(raw_text, clean_table1)
I got the following error when I ran it:
Error in (table_start + 3):(table_start + 25) : argument of length 0
In addition: Warning message:
In min(which(table_end > table_start)) :
no non-missing arguments to min; returning Inf
You can check the length of table_start and return NULL if it is 0. When you then use map_df, those NULL records automatically collapse, and you end up with one combined data frame.
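A minimal sketch of that suggestion, using base-R equivalents (strsplit, grep, rbind) so it runs without tidyverse; the same length check drops straight into the stringr/purrr version above. The docs texts and the returned data.frame columns here are made up purely for illustration.

```r
# Return NULL when the marker text is absent, so the file is skipped.
clean_table1 <- function(raw) {
  lines <- unlist(strsplit(raw, "\n"))
  table_start <- grep("twenty largest shareholders", tolower(lines))
  if (length(table_start) == 0) return(NULL)  # table missing: skip this file
  # ... the original parsing of raw[(table_start + 3):(table_start + 25)]
  #     would go here; a dummy one-row result stands in for it ...
  data.frame(start = table_start[1])
}

# Two fake "documents": only the first contains the table marker.
docs <- list(
  "page 1\ntwenty largest shareholders\nName|Shares|%",
  "a report with no shareholder table"
)

# rbind() silently drops NULL elements, mirroring how purrr::map_df()
# collapses NULL results into one combined data frame.
shares <- do.call(rbind, lapply(docs, clean_table1))

# Alternative, closer to the tryCatch asked about in the question:
# convert any error inside the cleaner into a NULL (i.e. "skip").
safe_clean <- function(raw) tryCatch(clean_table1(raw), error = function(e) NULL)
```

The length check is preferable when a missing table is an expected, non-error condition; tryCatch is the broader net for any unanticipated failure in a particular file.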