Downloading files in R from URLs that do not end with a file extension


Does anyone have a trick up their sleeve for downloading GTFS feeds using R when the URL doesn't end with ".zip"? For instance, this works:

download.file(url = "http://www.transperth.wa.gov.au/TimetablePDFs/GoogleTransit/Production/google_transit.zip", destfile = "temp.zip")

But each of the following calls creates a file of the right size that will not open:

download.file(url = "http://transitfeeds.com/p/ptv/497/latest/download", destfile = "temp.zip")

download.file(url = "http://transitfeeds.com/p/ptv/497/latest/download", destfile = "temp")

I suspect there is something fundamental about URLs that I need to understand, but I don't know where to begin looking, so any pointers would be appreciated.

Cheers,

Anthony


There are 2 answers

kukuk1de (best answer)

Your link is probably a redirect. Try using the httr package, as described in the Stack Overflow question "R download file redirect error":

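library(httr)

url <- "http://transitfeeds.com/p/ptv/497/latest/download"
# GET() follows the redirect; write_disk() saves the response body to a file
# (overwrite = TRUE lets you re-run the call without an "already exists" error)
res <- GET(
  url = url,
  write_disk("gtfs.zip", overwrite = TRUE),
  verbose()
)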

I was able to download the file and open it. If it works, you can drop the verbose() call, which only prints details of the request and response.
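One way to check whether a downloaded file really is a zip archive, rather than, say, an HTML page returned by the server, is to look at its first four bytes: non-empty zip archives normally start with the local file header signature "PK\x03\x04". A minimal base-R sketch; is_zip_file() is a hypothetical helper, not part of httr:

```r
# Hypothetical helper: TRUE if `path` starts with the zip local-file-header
# signature "PK\x03\x04" (non-empty zip archives normally do).
is_zip_file <- function(path) {
  if (!file.exists(path)) return(FALSE)
  # read at most the first four bytes as raw values
  sig <- readBin(path, what = "raw", n = 4L)
  identical(sig, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}

# e.g. after the GET() call above:
# is_zip_file("gtfs.zip")
```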

dhersz

@kukuk1de's answer does the trick.

I'd also note that TransitFeeds links to the official download URL. The link is on the right of the feed page, under "About This GTFS Feed".

Right-click it and select "Copy Link Location" to get the official URL, which ends in .zip and can be passed directly to download.file().
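A related pitfall, in case the original attempts were run on Windows (an assumption; the question doesn't say): download.file() only switches to binary mode by itself when the URL ends in an extension such as .zip, so a URL like .../latest/download may be written in text mode, which rewrites line endings and corrupts the archive. Passing mode = "wb" explicitly avoids relying on the extension. A minimal sketch; download_zip() is a hypothetical wrapper:

```r
# Hypothetical wrapper around utils::download.file().
# mode = "wb" forces a binary transfer; on Windows, download.file() only
# chooses binary mode by itself when the URL ends in an extension like .zip.
download_zip <- function(url, destfile) {
  status <- utils::download.file(url, destfile = destfile,
                                 mode = "wb", quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)
  invisible(destfile)
}

# Usage with the official feed URL:
# download_zip("http://data.ptv.vic.gov.au/downloads/gtfs.zip", "gtfs.zip")
```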

However, this particular URL points to a .zip that contains several folders, each holding a separate GTFS feed, rather than a single .zip in GTFS format.

If it were an actual GTFS .zip file, you could read it directly with either {gtfstools} or {tidytransit}, but its nested structure prevents that. Check it out:

tmp <- tempfile(pattern = "gtfs", fileext = ".zip")

download.file(
    "http://data.ptv.vic.gov.au/downloads/gtfs.zip", 
    destfile = tmp
)

zip::zip_list(tmp)
#>                 filename compressed_size uncompressed_size           timestamp
#> 1                     1/               0                 0 2021-02-22 19:23:20
#> 2                    10/               0                 0 2021-02-22 19:23:20
#> 3  10/google_transit.zip            3231              4011 2021-02-22 19:09:56
#> 4                    11/               0                 0 2021-02-22 19:23:20
#> 5  11/google_transit.zip           29966             32109 2021-02-22 19:10:12
#> 6   1/google_transit.zip         7262254           7625276 2021-02-22 19:01:56
#> 7                     2/               0                 0 2021-02-22 19:23:20
#> 8   2/google_transit.zip         5667379           6269932 2021-02-22 19:03:34
#> 9                     3/               0                 0 2021-02-22 19:23:20
#> 10  3/google_transit.zip         6714271           7782585 2021-02-22 19:05:04
#> 11                    4/               0                 0 2021-02-22 19:23:20
#> 12  4/google_transit.zip        66336783          67508547 2021-02-22 19:23:16
#> 13                    5/               0                 0 2021-02-22 19:23:20
#> 14  5/google_transit.zip        27834469          27962731 2021-02-22 19:06:16
#> 15                    6/               0                 0 2021-02-22 19:23:20
#> 16  6/google_transit.zip        13730731          14172729 2021-02-22 19:09:10
#> 17                    7/               0                 0 2021-02-22 19:23:20
#> 18  7/google_transit.zip           46932             50417 2021-02-22 19:09:24
#> 19                    8/               0                 0 2021-02-22 19:23:20
#> 20  8/google_transit.zip          574316            580906 2021-02-22 19:09:42
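With this many folders it can help to pick out the nested feeds programmatically. A minimal sketch, assuming the "<number>/google_transit.zip" layout shown above holds for the whole archive; find_nested_gtfs() is a hypothetical helper, and filename is a column of the data frame returned by zip::zip_list():

```r
# Hypothetical helper: given the data frame returned by zip::zip_list(),
# keep the nested GTFS archives and order them by folder number.
find_nested_gtfs <- function(listing) {
  # assumes the "<number>/google_transit.zip" layout seen in the listing
  hits <- listing$filename[grepl("^[0-9]+/google_transit\\.zip$", listing$filename)]
  # numeric order, so "2/" sorts before "10/"
  hits[order(as.integer(sub("/.*$", "", hits)))]
}

# Usage: find_nested_gtfs(zip::zip_list(tmp))
```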

Let's say you want to read the GTFS feed inside the 1/ folder. You can extract just that file with zip::unzip():

tmpd <- file.path(tempdir(), "tmp_gtfs")
dir.create(tmpd)

zip::unzip(tmp, files = "1/google_transit.zip", exdir = tmpd)

list.files(tmpd)
#> [1] "1"
list.files(file.path(tmpd, "1"))
#> [1] "google_transit.zip"
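If you want every feed rather than only the one in 1/, the same zip functions can be combined. A minimal sketch, assuming every nested feed is named google_transit.zip as in the listing above; extract_nested_gtfs() is a hypothetical helper:

```r
# Hypothetical helper: extract every nested google_transit.zip from the
# outer archive into `exdir` and return the paths of the extracted files.
extract_nested_gtfs <- function(outer_zip, exdir) {
  listing <- zip::zip_list(outer_zip)
  # assumes each feed is stored as <folder>/google_transit.zip
  inner <- listing$filename[grepl("google_transit\\.zip$", listing$filename)]
  dir.create(exdir, showWarnings = FALSE, recursive = TRUE)
  # extraction keeps the folder structure, e.g. exdir/1/google_transit.zip
  zip::unzip(outer_zip, files = inner, exdir = exdir)
  file.path(exdir, inner)
}

# Usage with the objects from above:
# paths <- extract_nested_gtfs(tmp, tmpd)
# feeds <- lapply(paths, gtfstools::read_gtfs)
```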

Then read it with {gtfstools} or {tidytransit}, depending on what you want to do with the data:

gtfs_path <- file.path(tmpd, "1", "google_transit.zip")

gt_gtfs <- gtfstools::read_gtfs(gtfs_path)
names(gt_gtfs)
#> [1] "agency"         "routes"         "trips"          "stops"         
#> [5] "calendar"       "calendar_dates" "shapes"         "stop_times"

tt_gtfs <- tidytransit::read_gtfs(gtfs_path)
names(tt_gtfs)
#> [1] "agency"         "routes"         "trips"          "stops"         
#> [5] "calendar"       "calendar_dates" "shapes"         "stop_times"