Function decode_short_url from twitteR package not working


I am using decode_short_url from the twitteR package to decode shortened URLs from Twitter posts, but I am not getting the desired results: it always returns the same URL I passed in, for example:

decode_short_url(decode_short_url("http://bit.ly/23226se656"))

## [1] "http://bit.ly/23226se656"

Accepted answer (hrbrmstr):

UPDATE: I wrapped this functionality in a package (longurl) and managed to get it onto CRAN the same day. Now you can just do:

library(longurl)

expand_urls("http://bit.ly/23226se656", check=TRUE, warn=TRUE)
## |++++++++++++++++++++++++++++++++++++++++++++++++++| 100%

## Source: local data frame [1 x 2]
## 
##                   orig_url expanded_url
## 1 http://bit.ly/23226se656           NA
## 
## Warning message:
## In FUN(X[[i]], ...) : client error: (404) Not Found

You can also pass in a vector of URLs and get back a data_frame/data.frame in that same form, with one row per URL.


That particular bit.ly URL returns a 404 error. Here's a version of decode_short_url with an optional check parameter that makes an extra HEAD request to the original URL and issues a warning (via httr's warn_for_status) for any HTTP status of 300 or higher, such as that 404.

You could modify it further to return NA when the "expanded" link returns a 404 (I don't know what you actually need it to do when a link is bad).
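A minimal sketch of that idea, assuming you just want bad links replaced with NA after decoding. The helper name na_if_broken() and its status_of argument are hypothetical (not part of twitteR or httr). It treats any status of 400 or higher as "broken". The status lookup is a parameter so it can be swapped for a stub, and the default uses httr::HEAD() and httr::status_code().

```r
# Hypothetical helper (not part of twitteR or httr): replace URLs whose
# HEAD request comes back with an HTTP error status (>= 400) with NA.
na_if_broken <- function(urls,
                         status_of = function(u) httr::status_code(httr::HEAD(u))) {
  vapply(urls, function(u) {
    status <- status_of(u)
    # keep the URL for 1xx-3xx statuses, use NA for 4xx/5xx
    if (status >= 400) NA_character_ else u
  }, character(1), USE.NAMES = FALSE)
}
```

Usage would be something like na_if_broken(decode_short_url("http://bit.ly/23226se656")), which should give NA as long as that link keeps returning 404. Keeping this as a separate step (rather than inside decode_short_url) means the slow HEAD requests only run when you ask for them.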

NOTE that the added HEAD request will slow the process down significantly, so you may want to do a first pass with check=FALSE into a separate column, find the URLs that weren't "expanded", and then re-check only those with check=TRUE.
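That two-pass approach could look roughly like the sketch below. two_pass_expand() is a hypothetical name. It assumes decode behaves like the decode_short_url in this answer: it takes (url, check = FALSE, ...) and returns a single string, falling back to the original URL when it can't expand it. Rows that come back unchanged (or NA) after the fast pass are the only ones re-run with check=TRUE.

```r
# Hypothetical two-pass workflow: `decode` is any function with the same
# signature as decode_short_url(url, check = FALSE, ...).
two_pass_expand <- function(urls, decode = decode_short_url) {
  df <- data.frame(orig_url = urls, stringsAsFactors = FALSE)

  # first pass: expansion API lookup only, no HEAD requests
  df$expanded_url <- vapply(urls, decode, character(1),
                            check = FALSE, USE.NAMES = FALSE)

  # rows the first pass couldn't expand (NA, or still the original URL)
  todo <- which(is.na(df$expanded_url) | df$expanded_url == df$orig_url)

  # second pass: re-run only those rows with the slower HEAD check,
  # so warnings are raised just for the problem URLs
  df$expanded_url[todo] <- vapply(df$orig_url[todo], decode, character(1),
                                  check = TRUE, USE.NAMES = FALSE)
  df
}
```

Because decode is a parameter, you can pass in a renamed copy of the function (see the note on namespace conflicts) or a stub for testing.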

You might also want to rename this to avoid namespace conflicts with the one from twitteR.

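decode_short_url <- function(url, check=FALSE, ...) {

  # calls are namespaced with httr:: so the package doesn't have to be attached;
  # httr URL-encodes the query values, so a short URL containing ? or & survives
  response <- httr::GET("http://api.longurl.org/v2/expand",
                        query=list(url=url, format="json", useragent="twitteR"),
                        ...)

  parsed <- httr::content(response, as="parsed")

  # fall back to the original URL if the API didn't return an expansion
  if ("long-url" %in% names(parsed)) {
    ret <- parsed[["long-url"]]
  } else {
    ret <- url
  }

  # optional extra HEAD request: warns for any status >= 300 (e.g. a 404)
  if (check) httr::warn_for_status(httr::HEAD(url))

  # return the expanded URL (returning `url` here would discard the result)
  return(ret)

}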

decode_short_url("http://bit.ly/23226se656", check=TRUE)

## [1] "http://bit.ly/23226se656"
## Warning message:
## In decode_short_url("http://bit.ly/23226se656", check = TRUE) :
##   client error: (404) Not Found