I am a beginner.
I wrote a small script for web scraping with rvest.
I found a very convenient pipeline, %>% html_node() %>% html_text() %>% as.numeric(),
but I was not able to adapt it correctly to scrape the URL of an image.
My code for scraping the image URL:
library(rvest)
UrlPage <- html("http://eyeonhousing.org/2012/11/gdp-growth-in-the-third-quarter-improved-but-still-slow/")
img <- UrlPage %>% html_node(".wp-image-5984") %>% html_attrs()
Result:
class "aligncenter size-full wp-image-5984" title "Blog gdp 2012_10_1" alt "" src "http://eyeonhousing.files.wordpress.com/2012/11/blog-gdp-2012_10_1.jpg" height "337" width "450"
Question: how do I get only the link (the src value), without the other attributes?
Please help me find a solution. Thank you!
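You need html_attr (singular) rather than html_attrs, and you pass it the name of the attribute you want to extract, here "src". You may also want to make the CSS selector passed to html_node more specific, for example by including the tag name (img.wp-image-5984) rather than the class alone.
If you assign the result to a variable named link, that variable contains only the URL.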
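A minimal sketch of that approach, not necessarily the exact code originally posted. To keep it runnable offline, it parses a small inline HTML snippet built from the attributes shown in the question instead of fetching the page; that snippet is an assumption standing in for the real document. It also uses read_html(), which replaced html() in later rvest releases.

```r
library(rvest)

# Stand-in for the fetched page (assumption): a one-image document built
# from the attributes listed in the question. For the live page, pass the
# article URL to read_html() instead of this string.
UrlPage <- read_html(paste0(
  '<html><body>',
  '<img class="aligncenter size-full wp-image-5984" alt="" ',
  'src="http://eyeonhousing.files.wordpress.com/2012/11/blog-gdp-2012_10_1.jpg" ',
  'height="337" width="450">',
  '</body></html>'
))

# Select the <img> element by tag name and class, then extract only
# its src attribute instead of all attributes.
link <- UrlPage %>%
  html_node("img.wp-image-5984") %>%
  html_attr("src")

print(link)
```

html_attr returns a single character string for a single node, so link can be passed directly to something like download.file if you want to save the image.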
You can find a decent reference for CSS selectors here: http://www.w3schools.com/cssref/css_selectors.asp
Also the rvest documentation has some good examples on how to use its functions: http://cran.r-project.org/web/packages/rvest/rvest.pdf