How to web scrape data from JSON in R using rvest?


I am attempting to webscrape the fixture list from this website

https://www.nrl.com/draw/?competition=111&round=1&season=2024

The output should be

Sea eagles, Rabbitohs

Roosters, Broncos

Knights, Raiders etc

I have written up the following code


library(rvest)
library(jsonlite)

url <- "https://www.nrl.com/draw/?competition=111&round=1&season=2024"

page <- read_html(url)

contentnodes <- page %>%
  html_nodes("div.u-spacing-mt-24.pre-quench") %>%
  html_attr("q-data") %>%
  fromJSON()

but I am getting the following error:

lexical error: invalid char in json text NA

Reading online, some suggest the data is HTML rather than JSON, but I have scraped a different page on the same website with similar code, so I am not sure what has gone wrong here.
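The error text is consistent with html_attr() returning NA: the selector matches a node, but that node has no q-data attribute in the HTML the server sends, so fromJSON() ends up trying to parse the string "NA". A minimal sketch of a check that could run before parsing is below. The HTML fragment is a hypothetical stand-in built with minimal_html(), not the real page, and the message wording is only illustrative.

```r
library(rvest)
library(jsonlite)

# Hypothetical stand-in for the page: the div matches the selector
# but carries no q-data attribute (assumed to mirror the real response).
page <- minimal_html('<div class="u-spacing-mt-24 pre-quench">draw</div>')

raw <- page %>%
  html_elements("div.u-spacing-mt-24.pre-quench") %>%
  html_attr("q-data")

# html_attr() yields NA for a missing attribute, so guard before parsing.
if (length(raw) == 0 || all(is.na(raw))) {
  message("No q-data attribute in the static HTML; the data is probably loaded separately.")
  fixtures <- NULL
} else {
  fixtures <- fromJSON(raw[!is.na(raw)][1])
}
```

If the guard fires on the real page, the browser's network tab is usually the next place to look for the request that actually delivers the data.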

1 answer

HoelR (best answer)
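library(tidyverse)
library(httr2)

# Request the JSON endpoint behind the draw page instead of parsing the HTML
"https://www.nrl.com/draw//data?competition=111&season=2024" %>%
  request() %>% 
  req_perform() %>% 
  resp_body_json(simplifyVector = TRUE) %>% 
  pluck("fixtures") %>% 
  # homeTeam and awayTeam are nested data frames; spread them into prefixed columns
  unnest(c(homeTeam, awayTeam), names_sep = "_") %>% 
  select(contains("nickName"), 
         contains("odds"))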

# A tibble: 8 × 4
  homeTeam_nickName awayTeam_nickName homeTeam_odds awayTeam_odds
  <chr>             <chr>             <chr>         <chr>        
1 Sea Eagles        Rabbitohs         2.17          1.69         
2 Roosters          Broncos           2.51          1.53         
3 Knights           Raiders           1.42          2.87         
4 Warriors          Sharks            1.60          2.34         
5 Storm             Panthers          2.24          1.65         
6 Eels              Bulldogs          1.47          2.70         
7 Titans            Dragons           1.49          2.64         
8 Dolphins          Cowboys           2.67          1.48
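To ask for a specific round, as in the question's URL, the query string can be built with req_url_query() rather than pasted by hand. This is only a sketch: it assumes the data endpoint accepts the same round parameter as the page URL, which the answer above does not confirm, and build_draw_request() and fetch_fixtures() are hypothetical helper names.

```r
library(httr2)
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical helper: builds the request without sending it.
# Assumption: the endpoint honours `round` like the page URL does.
build_draw_request <- function(round, competition = 111, season = 2024) {
  request("https://www.nrl.com/draw//data") %>%
    req_url_query(competition = competition, round = round, season = season)
}

# Hypothetical helper: performs the request and keeps the team nicknames.
fetch_fixtures <- function(round, ...) {
  build_draw_request(round, ...) %>%
    req_perform() %>%
    resp_body_json(simplifyVector = TRUE) %>%
    pluck("fixtures") %>%
    unnest(c(homeTeam, awayTeam), names_sep = "_") %>%
    select(contains("nickName"))
}
```

Calling fetch_fixtures(1) would then be expected to return the round 1 fixtures, provided the endpoint behaves as assumed.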