As the title states, I'm curious if it is possible for the html_text() function from the rvest package to store an NA value if it is not able to find an attribute on a specific page.
I'm currently running a scrape over 199 pages (which works fine; tested on a few variables already).
Currently, when I search for a value that is present on only some (136) of the 199 pages, html_text() returns a vector of just 136 strings. This is not useful because without NAs I am unable to determine which pages contained the variable in question.
I see that html_attr() is able to receive a default input, but not html_text(). Any tips?
Thank you so much!
If you create a new function to wrap error handling, it'll keep the %>% pipe cleaner and easier to grok for your future self and others.
Also, by doing an sapply over the vector of record_id's you automagically get a vector back of whatever value it is you're trying to extract.
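A minimal sketch of that idea follows. The helper name text_or_na(), the .price selector, and the inline HTML standing in for the scraped pages are all assumptions made for illustration; none of them come from the original post.

```r
library(rvest)

# Hypothetical stand-in for the real pages: inline HTML keyed by record_id.
# In an actual scrape, fetch_page() would call read_html() on a URL instead.
html_by_id <- c(
  "101" = "<html><body><p class='price'>42</p></body></html>",
  "102" = "<html><body><p>no price here</p></body></html>",
  "103" = "<html><body><p class='price'>7</p></body></html>"
)
fetch_page <- function(record_id) read_html(html_by_id[[record_id]])

# Hypothetical wrapper: text of the first node matching `css`, or
# NA_character_ when nothing matches or the page cannot be read.
text_or_na <- function(record_id, css) {
  tryCatch({
    nodes <- html_nodes(fetch_page(record_id), css)
    if (length(nodes) == 0) NA_character_ else html_text(nodes[[1]])
  }, error = function(e) NA_character_)
}

record_ids <- names(html_by_id)

# One result per record_id, with NA marking pages that lack the value.
prices <- sapply(record_ids, text_or_na, css = ".price")
```

Because the wrapper always returns exactly one value per page, the result has the same length as record_ids, so which(is.na(prices)) identifies the pages where the value was missing.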