Get URL header information using httr or RCurl


I have had no success getting URL header information from R.

httr

Using HEAD from the httr package:

library(httr)

ur <- "https://secure.energyaustralia.com.au/EnergyPriceFactSheets/Docs/EPFS/E_B_V_BEDGE_CI_37_13-10-2016.pdf"
HEAD(ur)

I get this error:

Error in curl::curl_fetch_memory(url, handle = handle) : 
  SSL connect error

I reinstalled the curl and httr packages, but I still get the same error.
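Since curl works in the terminal but not from R, one hedged first check is to see which libcurl build and SSL/TLS backend the R curl package (which httr uses under the hood) is linked against, and compare that with the first line of `curl --version` in the terminal. A minimal sketch, assuming only the curl R package is installed:

```r
library(curl)

# Report the libcurl build used by the R curl package (and therefore by httr).
v <- curl_version()
cat("libcurl version:", v$version, "\n")
cat("SSL/TLS backend:", v$ssl_version, "\n")
# Compare these with the first line printed by `curl --version` in a terminal.
```

A mismatch between the two would suggest R and the terminal are not going through the same TLS library; a match would point more toward negotiation settings than installation.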

RCurl

Using RCurl, I can see the header (in verbose mode), but I get a different error:

library(RCurl)
getURI(ur, header = TRUE, verbose = TRUE)

I get:

Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) : 
  embedded nul in string:
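A likely cause of the "embedded nul" error is that getURI() also downloads the response body, here a binary PDF, and tries to return it as a character string. A minimal sketch, assuming the server answers HEAD requests, that passes libcurl's nobody option so only the header text comes back (the helper name rcurl_head is hypothetical):

```r
library(RCurl)

# Hypothetical helper: fetch only the response headers with RCurl.
# nobody = TRUE makes libcurl send a HEAD request and skip the (binary) body;
# header = TRUE writes the header block into the returned text.
rcurl_head <- function(url) {
  txt <- getURI(url, nobody = TRUE, header = TRUE)
  # Split the header block into one line per header and drop blank lines.
  lines <- strsplit(txt, "\r\n", fixed = TRUE)[[1]]
  lines[nzchar(lines)]
}

# Usage (requires network access):
# rcurl_head("https://secure.energyaustralia.com.au/EnergyPriceFactSheets/Docs/EPFS/E_B_V_BEDGE_CI_37_13-10-2016.pdf")
```

This only sidesteps the body-decoding error; it does not change how the TLS connection is negotiated.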

curl

Using curl in a terminal, it works fine:

 curl -I https://secure.energyaustralia.com.au/EnergyPriceFactSheets/Docs/EPFS/E_B_V_BEDGE_CI_37_13-10-2016.pdf

and I get:

HTTP/1.1 200 OK
Content-Length: 237503
Content-Type: application/pdf
Last-Modified: Wed, 14 Dec 2016 05:18:09 GMT
Accept-Ranges: bytes
ETag: "c27d5775c955d21:27a"
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Date: Fri, 30 Dec 2016 15:44:05 GMT

This means curl is installed correctly on my machine, but something goes wrong when I try to do the same thing from R.

Any help is welcome. Thank you.

Edit

It looks like the problem depends on the system configuration. Mine is:

R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

Answer by hrbrmstr (accepted)

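Wow. This was vexing. After both of us tried a bunch of things in a chat session, it turned out the key was not letting the httr → curl → openssl stack auto-negotiate the SSL/TLS connection, and instead requesting a specific protocol version: sslversion = 4 corresponds to libcurl's CURL_SSLVERSION_TLSv1_0, i.e. TLS 1.0. There may be an underlying CA bundle issue on Ubuntu 16.04 causing this, but it can be solved without dealing with that via: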

library(httr)
library(dplyr)

ur <- "https://secure.energyaustralia.com.au/EnergyPriceFactSheets/Docs/EPFS/E_B_V_BEDGE_CI_37_13-10-2016.pdf"

HEAD(ur, config(sslversion=4)) %>%  ## <- this is the magic line
  .$headers %>%
  as_data_frame() %>%
  glimpse()
## Observations: 1
## Variables: 8
## $ content-length <chr> "237503"
## $ content-type   <chr> "application/pdf"
## $ last-modified  <chr> "Wed, 14 Dec 2016 05:18:09 GMT"
## $ accept-ranges  <chr> "bytes"
## $ etag           <chr> "\"c27d5775c955d21:27a\""
## $ server         <chr> "Microsoft-IIS/6.0"
## $ x-powered-by   <chr> "ASP.NET"
## $ date           <chr> "Fri, 30 Dec 2016 19:20:47 GMT"
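The same workaround can be sketched with the curl package alone, without httr or dplyr. This assumes a curl R package version that provides parse_headers_list(); nobody = TRUE turns the request into a HEAD, and sslversion = 4 is the same value passed through httr's config() above. The helper name head_headers is hypothetical:

```r
library(curl)

# Hypothetical helper: HEAD request with an explicit SSL/TLS version option.
head_headers <- function(url, sslversion = 4) {
  # nobody = TRUE: send HEAD and skip the body; sslversion: libcurl option value.
  h <- new_handle(nobody = TRUE, sslversion = sslversion)
  res <- curl_fetch_memory(url, handle = h)
  # res$headers is a raw vector; parse it into a named list (lower-case names).
  parse_headers_list(res$headers)
}

# Usage (requires network access):
# str(head_headers("https://secure.energyaustralia.com.au/EnergyPriceFactSheets/Docs/EPFS/E_B_V_BEDGE_CI_37_13-10-2016.pdf"))
```

Keeping sslversion as an argument makes it easy to try other values if a server rejects the one used here.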