R - Web Scrape of job board


I am trying to build a table of companies and job titles from indeed.com's job board.

I am using the rvest package with a base URL of http://www.indeed.com/jobs?q=proprietary+trader&

install.packages("gtools")
install.packages("rvest")
library(rvest)
library(gtools)



# The Page column of setup.csv supplies the text appended to url_base
mydata <- read.csv("setup.csv", header = TRUE)

url_base <- "http://www.indeed.com/jobs?q=proprietary+trader&"
names <- mydata$Page

results <- data.frame()
for (name in names) {
  url <- paste0(url_base, name)

  # read_html() replaces html(), which is deprecated in newer rvest versions
  title.results <- url %>%
    read_html() %>%
    html_nodes(".jobtitle") %>%
    html_text()

  company.results <- url %>%
    read_html() %>%
    html_nodes(".company") %>%
    html_text()

  results <- smartbind(company.results, title.results)
  results3 <- data.frame(company = company.results, title = title.results)
}

new <- data.frame(Company = company.results, Title = title.results)

I then concatenate the results in a loop. For some reason it does not grab all of the jobs, and it mixes up the companies and jobs.

There is 1 answer

Nick Kennedy On

It might be because you make two separate requests to the page. You should change the middle part of your code to:

# read_html() replaces html(), which is deprecated in newer rvest versions
page <- url %>%
   read_html()

title.results <- page %>%
   html_nodes(".jobtitle") %>%
   html_text()

company.results <- page %>%
   html_nodes(".company") %>%
   html_text()

When I do that, it seems to give me 10 jobs and companies which match. If it still fails for you, can you give an example of a query URL that doesn't work?
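
If titles and companies still drift out of step, one possible cause is a listing with no company element: the two vectors then have different lengths. Below is a minimal sketch of a way around that, not a tested fix for the live site. It selects each listing's wrapper first, then pulls one title and one company from inside it, and it stacks the pages with rbind so earlier pages are not overwritten. The ".result" wrapper class, the scrape_page() helper and the inline HTML are assumptions made up for illustration, so the sketch runs offline.

```r
library(rvest)

# Two stand-in result pages as inline HTML so the sketch runs without a network.
# ASSUMPTION: each listing sits in a ".result" wrapper; check the real markup.
pages <- c(
  '<div class="result"><a class="jobtitle">Proprietary Trader</a><span class="company">Acme Capital</span></div>
   <div class="result"><a class="jobtitle">Junior Trader</a></div>',
  '<div class="result"><a class="jobtitle">Equity Trader</a><span class="company">Beta Trading</span></div>'
)

# Hypothetical helper: parse one page once and return one row per listing.
scrape_page <- function(html_source) {
  page  <- read_html(html_source)
  cards <- html_nodes(page, ".result")
  data.frame(
    # html_node() (singular) returns exactly one match per card,
    # so a missing element becomes NA instead of shifting later rows up.
    title   = cards %>% html_node(".jobtitle") %>% html_text(trim = TRUE),
    company = cards %>% html_node(".company") %>% html_text(trim = TRUE),
    stringsAsFactors = FALSE
  )
}

# Stack the per-page data frames instead of overwriting them in the loop.
results <- do.call(rbind, lapply(pages, scrape_page))
print(results)
```

With real pages you would pass the URLs built with paste0(url_base, name) to scrape_page() instead of the inline strings; read_html() accepts either.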