Scraping data using rvest


I am trying to scrape the names of each of the search results from this page using the code below:

url2 <- "http://www.truckandtrailer.ca/search.cfm?intIndustryID=2&searchtype=advanced&pageaction=showresults&bitNew=0&intCategoryID=30&intMakeID=0&intSelectProvinceID=&x=26&y=6"

library(rvest)

results <- url2 %>%
  read_html() %>%
  html_nodes(".desc_title") %>%
  html_text()
results

However, it just returns an empty character vector:

character(0)

Any thoughts on how to fix this? Appreciate the help!


There is 1 answer

Metrics On

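rvest on its own only sees the HTML the server sends, so character(0) suggests the listings are added by JavaScript after the page loads. Here is a solution that uses RSelenium to render the page in a real browser and rvest to parse the result.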

Note: Please see my answer here for working with RSelenium and rvest.
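Before reaching for a browser, it can help to confirm that the selector is simply absent from the static HTML. The following is a minimal sketch of that check, not the actual page: both HTML strings and the extract_titles() helper are made up for illustration, and the assumption that the site fills in its listings with JavaScript is only inferred from the empty result.

```r
library(rvest)

# Hypothetical markup: what the server might send before any JavaScript runs.
static_page <- read_html('<html><body>
  <div id="listings"></div>
  <script>/* listings would be injected here by JavaScript */</script>
</body></html>')

# Hypothetical markup: the same page after a browser has rendered it.
rendered_page <- read_html('<html><body>
  <div id="listings">
    <a class="desc_title"> 2014 KENWORTH T660 HIGHWAY TRACTOR </a>
  </div>
</body></html>')

# Illustrative helper: the same pipeline as in the question, with trimming.
extract_titles <- function(page) {
  page %>%
    html_nodes(".desc_title") %>%
    html_text(trim = TRUE)
}

static_titles   <- extract_titles(static_page)    # selector matches nothing
rendered_titles <- extract_titles(rendered_page)  # selector matches one node
```

If the static version comes back empty while the rendered source contains the titles, the pipeline itself is fine and the page just needs to be rendered first, which is what the RSelenium code does.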

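library(RSelenium)
library(rvest)

# Start a Selenium server and open a Firefox session.
# Note: newer RSelenium releases drop startServer() in favour of rsDriver().
startServer()
remDr <- remoteDriver(browserName = "firefox")
remDr$open()

url2 <- "http://www.truckandtrailer.ca/search.cfm?intIndustryID=2&searchtype=advanced&pageaction=showresults&bitNew=0&intCategoryID=30&intMakeID=0&intSelectProvinceID=&x=26&y=6"
remDr$navigate(url2)

# Parse the rendered page source; read_html() replaces the deprecated html().
test.html <- read_html(remDr$getPageSource()[[1]])
results <- test.html %>%
  html_nodes(".desc_title") %>%
  html_text(trim = TRUE)
results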

 [1] "2009 FREIGHTLINER FLD 132 CLASSIC XL HIGHWAY TR..." "2014 FREIGHTLINER CASCADIA HIGHWAY TRACTOR"        
 [3] "2014 KENWORTH W900-L HIGHWAY TRACTOR"               "2014 KENWORTH T660 HIGHWAY TRACTOR"                
 [5] "2013 FREIGHTLINER CASCADA HIGHWAY TRACTOR"          "2013 FREIGHTLINER CASCADA HIGHWAY TRACTOR"         
 [7] "(5) 2013 FREIGHTLINER CASCADIA - 113 HIGHWAY TR..." "(2) 2013 INTERNATIONAL PROSTAR HIGHWAY TRACTOR"    
 [9] "2013 KENWORTH T660 HIGHWAY TRACTOR"                 "2013 KENWORTH W900B HIGHWAY TRACTOR"               
[11] "(2) 2013 KENWORTH T700 HIGHWAY TRACTOR"             "2013 KENWORTH W900 HIGHWAY TRACTOR"                
[13] "2013 KENWORTH T660 HIGHWAY TRACTOR"                 "2013 KENWORTH W900L HIGHWAY TRACTOR"               
[15] "2013 KENWORTH W900L HIGHWAY TRACTOR"                "2013 KENWORTH W900 HIGHWAY TRACTOR"                
[17] "2013 PETERBILT 388 HIGHWAY TRACTOR"                 "(5) 2013 PETERBILT 388 HIGHWAY TRACTOR"            
[19] "2013 PETERBILT 389 HIGHWAY TRACTOR"                 "2013 PETERBILT 388 HIGHWAY TRACTOR"                
[21] "2013 VOLVO VNL670 HIGHWAY TRACTOR"                  "2013 VOLVO VNL630 HIGHWAY TRACTOR"                 
[23] "(5) 2012 FREIGHTLINER CASCADIA HIGHWAY TRACTOR"     "2012 FREIGHTLINER CA125 HIGHWAY TRACTOR"           
[25] "(2) 2012 FREIGHTLINER CASCADIA HIGHWAY TRACTOR"   
remDr$close()

Another approach is to use PhantomJS, a headless browser, so there is no need to use cmd and no extra browser window. All you need to do is download the exe file from here and place it in your R working directory (you can also specify its path if you would rather keep it somewhere else).

library(RSelenium)
library(rvest)

# Launch PhantomJS; note that phantom() is deprecated in newer RSelenium releases.
pJS <- phantom(extras = c('--ssl-protocol=tlsv1'))
remDr <- remoteDriver(browserName = "phantom")
remDr$open()
remDr$navigate("http://www.truckandtrailer.ca/search.cfm?intIndustryID=2&searchtype=advanced&pageaction=showresults&bitNew=0&intCategoryID=30&intMakeID=0&intSelectProvinceID=&x=26&y=6")

# Parse the rendered page source; read_html() replaces the deprecated html().
test.html <- read_html(remDr$getPageSource()[[1]])
results <- test.html %>%
  html_nodes(".desc_title") %>%
  html_text(trim = TRUE)
results
 [1] "2009 FREIGHTLINER FLD 132 CLASSIC XL HIGHWAY TR..." "2014 FREIGHTLINER CASCADIA HIGHWAY TRACTOR"        
 [3] "2014 KENWORTH W900-L HIGHWAY TRACTOR"               "2014 KENWORTH T660 HIGHWAY TRACTOR"                
 [5] "2013 FREIGHTLINER CASCADA HIGHWAY TRACTOR"          "2013 FREIGHTLINER CASCADA HIGHWAY TRACTOR"         
 [7] "(5) 2013 FREIGHTLINER CASCADIA - 113 HIGHWAY TR..." "(2) 2013 INTERNATIONAL PROSTAR HIGHWAY TRACTOR"    
 [9] "2013 KENWORTH T660 HIGHWAY TRACTOR"                 "2013 KENWORTH W900B HIGHWAY TRACTOR"               
[11] "(2) 2013 KENWORTH T700 HIGHWAY TRACTOR"             "2013 KENWORTH W900 HIGHWAY TRACTOR"                
[13] "2013 KENWORTH T660 HIGHWAY TRACTOR"                 "2013 KENWORTH W900L HIGHWAY TRACTOR"               
[15] "2013 KENWORTH W900L HIGHWAY TRACTOR"                "2013 KENWORTH W900 HIGHWAY TRACTOR"                
[17] "2013 PETERBILT 388 HIGHWAY TRACTOR"                 "(5) 2013 PETERBILT 388 HIGHWAY TRACTOR"            
[19] "2013 PETERBILT 389 HIGHWAY TRACTOR"                 "2013 PETERBILT 388 HIGHWAY TRACTOR"                
[21] "2013 VOLVO VNL670 HIGHWAY TRACTOR"                  "2013 VOLVO VNL630 HIGHWAY TRACTOR"    
remDr$close()
pJS$stop()

P.S. Please see the help file for details.