I have the following website: https://www.coes.org.pe/Portal/PostOperacion/Reportes/Ieod and I want to download all the files from 2021 to 2023. Once you enter the website you can choose between different folders, but for now I only want to focus on the 2023 folder and download all the files in it.
I've tried using loops and the rvest package to no avail. I want to be able to download all the files in the 2023 folder, but I can't find my way around the code. Please help.
Extra Info:
The code I used is very basic, since I'm just starting to work on more complex tasks in R; here is what I tried:
IOD <- read_html("https://www.coes.org.pe/Portal/PostOperacion/Reportes/Ieod?path=Post%20Operaci%C3%B3n%2FReportes%2FIEOD%2F2023%2F")
urls <- IOD %>%
html_nodes('context-menu-list-context-menu-root') %>% # get all `area` nodes
html_attr('href') %>% # get the link attribute of each node
sub('.htm$', '.zip', .) %>% # change file suffix
paste0('https://www.coes.org.pe/Portal/PostOperacion/Reportes/Ieod', .) # append to base URL
# create a directory for it all
dir <- file.path(tempdir(), 'COES')
dir.create(dir)
lapply(urls, function(url) download.file(url, file.path(dir, basename(url))))
# check it's there
list.files(dir)
When I run that code, these are the commands that produce output:
dir.create(dir)
# Warning message:
# In dir.create(dir) :
#   'C:\Users\RCV\AppData\Local\Temp\Rtmp0AUH8C\COES' already exists
lapply(urls, function(url) download.file(url, file.path(dir, basename(url))))
# trying URL 'https://www.coes.org.pe/Portal/PostOperacion/Reportes/Ieod'
# Content type 'text/html; charset=utf-8' length 48185 bytes (47 KB)
# downloaded 47 KB
# [[1]]
# [1] 0
list.files(dir)
# [1] "Ieod" "Ieod#"
So instead of the ZIP files, all I get is the HTML of the page itself.
I'm actually lost on what to do, to be honest. Sorry if this is kind of a basic question.
Answer — my attempt with rvest::read_html_live():

First of all, we have to get the monthly links.
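A minimal sketch of that first step. The assumption here is that the folder listing is rendered client-side by JavaScript (which would explain why plain read_html() only returned the page shell), so read_html_live() — which drives a headless Chrome session via {chromote} — is needed. The CSS selector is a placeholder: inspect the rendered DOM in your browser to find the real one.

```r
library(rvest)  # needs rvest >= 1.0.4 and the {chromote} package for read_html_live()

base <- "https://www.coes.org.pe/Portal/PostOperacion/Reportes/Ieod"
year_path <- "?path=Post%20Operaci%C3%B3n%2FReportes%2FIEOD%2F2023%2F"

# Render the 2023 folder page in a headless browser
page_year <- read_html_live(paste0(base, year_path))

# Placeholder selector: replace "a" with the selector that matches the
# month-folder entries (01 ... 12) in the rendered listing
month_links <- page_year %>%
  html_elements("a") %>%
  html_attr("href")
```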
Now, for each month, you have to get the daily links (below is just an example for one month; you should extend it using lapply() or another form of iteration):
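A sketch for one month, assuming the `?path=` query parameter mirrors the folder structure (so a month page can be opened directly by appending its subfolder); the selector is again a placeholder:

```r
library(rvest)

base <- "https://www.coes.org.pe/Portal/PostOperacion/Reportes/Ieod"

# Assumed URL pattern: the December folder as an example.
# Wrap this in lapply(sprintf("%02d", 1:12), ...) to cover the whole year.
month_url <- paste0(base, "?path=Post%20Operaci%C3%B3n%2FReportes%2FIEOD%2F2023%2F12%2F")
page_month <- read_html_live(month_url)

# Placeholder selector for the day-folder entries (01 ... 31)
day_links <- page_month %>%
  html_elements("a") %>%
  html_attr("href")
```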
Now we have a link to a particular day (31) in a particular month (December). We have to extract the table from this page, like:
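Assuming the daily page renders its file listing as an HTML table, html_table() flattens it into a data frame (the URL pattern below is the same assumption as above):

```r
library(rvest)

base <- "https://www.coes.org.pe/Portal/PostOperacion/Reportes/Ieod"

# Assumed URL for December 31st, following the same ?path= pattern
day_url <- paste0(base, "?path=Post%20Operaci%C3%B3n%2FReportes%2FIEOD%2F2023%2F12%2F31%2F")
page_day <- read_html_live(day_url)

# Grab the first <table> on the rendered page and parse it
files_tbl <- page_day %>%
  html_element("table") %>%
  html_table()
```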
And build the URLs to the individual files:
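A hedged sketch of the last step. Both the file-name column (here called `Nombre`) and the download endpoint are assumptions — check the real column names in `files_tbl` and watch the browser's network tab while clicking a file to confirm the actual download URL pattern before relying on this:

```r
# Assumption: files_tbl (the table from the previous step) has a column
# of file names, and files are served through the same ?path= endpoint
base <- "https://www.coes.org.pe/Portal/PostOperacion/Reportes/Ieod"
day_path <- "Post Operaci\u00f3n/Reportes/IEOD/2023/12/31/"

file_urls <- paste0(base, "?path=",
                    URLencode(paste0(day_path, files_tbl$Nombre), reserved = TRUE))

dest_dir <- file.path(tempdir(), "COES")
dir.create(dest_dir, showWarnings = FALSE)

# mode = "wb" matters on Windows, otherwise binary files get corrupted
Map(function(u, f) download.file(u, file.path(dest_dir, f), mode = "wb"),
    file_urls, files_tbl$Nombre)
```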
Please note this requires yet another level of iteration. In total there are 4 nested iterations: 1/ year, 2/ month, 3/ day, 4/ individual files.