I want to use rvest to scrape a page that lists the titles and run times of talks at a recent conference, and then combine the values into a tibble:
library(tibble)
library(rvest)
url <- "https://channel9.msdn.com/Events/useR-international-R-User-conferences/useR-International-R-User-2017-Conference?sort=status&direction=desc&page=14"
# Parse the page once so both selectors below work on the same document
page <- read_html(url)
title <- page %>%
  html_nodes("h3 a") %>%
  html_text()
length <- page %>%
  html_nodes(".tile .caption") %>%
  html_text()
df <- tibble(title, length)
If you look at the page, you will see that one of the talks has no run time, and in View Source there is no class="caption" element for that talk. As a result, the two vectors have different lengths and cannot be combined into a tibble. Is there any way I can substitute an NA to show the missing value?
The simplest way is to select a node that encloses both of the nodes you want for each row, then iterate over those enclosing nodes, pulling out both values at once. purrr::map_df is handy not only for iterating, but also for combining the results into a nice tibble:
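
A minimal sketch of that approach follows. It uses a small inline HTML snippet as a stand-in for the real page, so the `article` wrapper and the sample markup are assumptions for illustration; on the actual page, choose whichever element encloses each talk. The key point is that `html_node()` (singular) returns exactly one result per enclosing node, and `html_text()` turns a missing match into `NA`. Note that `map_df()` needs dplyr installed to bind the rows.

```r
library(rvest)
library(purrr)

# Hypothetical stand-in for the conference page:
# the second talk deliberately has no .caption element.
page <- read_html('
  <article>
    <h3><a href="#">Talk A</a></h3>
    <div class="tile"><div class="caption">20 minutes</div></div>
  </article>
  <article>
    <h3><a href="#">Talk B</a></h3>
    <div class="tile"></div>
  </article>
')

df <- page %>%
  html_nodes("article") %>%   # one enclosing node per talk (assumed selector)
  map_df(~ list(
    # html_node() yields a single (possibly missing) node per article,
    # so title and length always stay aligned row by row
    title  = .x %>% html_node("h3 a") %>% html_text(),
    length = .x %>% html_node(".tile .caption") %>% html_text()
  ))

df
```

With the real page, replacing the inline snippet with `read_html(url)` should give one row per talk, with `NA` in `length` for the talk that has no caption, provided the enclosing selector matches the page's markup.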