Using R to convert GTFS spatial data from character to numeric

I am following the gtfstools vignette (https://cran.r-project.org/web/packages/gtfstools/vignettes/gtfstools.html) but am stuck on the data format. I am reading a GTFS dataset, which is a zip archive with .txt files inside it.

library(gtfstools)
ART2019Path <- file.path(GTFS_path, "2019-10 Arlington.zip")
ART2019GTFS <- read_gtfs(ART2019Path)

Here is the data: https://realtime.commuterpage.com/rtt/public/utility/gtfs.aspx

The data loads fine, but every column is automatically read as character. I need most of the data to be numeric for my analysis, for example to plot transit geometry:

trip_geom <- get_trip_geometry(ART2019GTFS, file = "shapes")
plot(trip_geom$geometry)

I tried mutating all data, assuming data without numbers would stay as characters, but it didn't work:

ART2019GTFS <- mutate_all(ART2019GTFS, funs(as.numeric))

I am relatively new to R so not sure how to tackle this.

Any help figuring this out would be appreciated.

Answer by IRTFM (best answer)

When I follow that link I get a zip file named google_transit.zip, which has several comma-separated text files in it. When I run this:

ART2019GTFS <- read_gtfs("~/google_transit.zip") 

I get this (one dataframe for each text file):

> str(ART2019GTFS)
List of 8
 $ agency        :Classes ‘data.table’ and 'data.frame':    1 obs. of  6 variables:
  ..$ agency_id      : chr "1"
  ..$ agency_name    : chr "Arlington Transit"
  ..$ agency_url     : chr "http://www.arlingtontransit.com"
  ..$ agency_phone   : chr "703-228-7433"
  ..$ agency_timezone: chr "America/New_York"
  ..$ agency_lang    : chr "en"
  ..- attr(*, ".internal.selfref")=<externalptr> 
 $ calendar      :Classes ‘data.table’ and 'data.frame':    5 obs. of  10 variables:
  ..$ service_id: chr [1:5] "1" "2" "3" "4" ...
  ..$ monday    : int [1:5] 1 0 1 0 0
  ..$ tuesday   : int [1:5] 1 0 1 0 0
  ..$ wednesday : int [1:5] 1 0 1 0 0
  ..$ thursday  : int [1:5] 1 0 1 0 0
  ..$ friday    : int [1:5] 0 1 1 0 0
  ..$ saturday  : int [1:5] 0 0 0 1 0
  ..$ sunday    : int [1:5] 0 0 0 0 1
  ..$ start_date: Date[1:5], format: "2022-03-27" "2022-03-27" "2022-03-27" ...
  ..$ end_date  : Date[1:5], format: "2023-12-31" "2023-12-31" "2023-12-31" ...
  ..- attr(*, ".internal.selfref")=<externalptr> 
 $ calendar_dates:Classes ‘data.table’ and 'data.frame':    3 obs. of  3 variables:
  ..$ service_id    : chr [1:3] "1" "3" "5"
  ..$ date          : Date[1:3], format: "2022-05-30" "2022-05-30" "2022-05-30"
  ..$ exception_type: int [1:3] 2 2 1
  ..- attr(*, ".internal.selfref")=<externalptr> 
 $ routes        :Classes ‘data.table’ and 'data.frame':    21 obs. of  8 variables:
  ..$ route_id        : chr [1:21] "41" "42" "43" "45" ...
  ..$ agency_id       : chr [1:21] "1" "1" "1" "1" ...
  ..$ route_short_name: chr [1:21] "41" "42" "43" "45" ...
  ..$ route_long_name : chr [1:21] "Columbia Pike-Ballston-Court House" "Ballston-Pentagon" "Crystal City-Courthouse" "Columbia Pike-DHS/Sequoia-Rosslyn" ...
  ..$ route_type      : int [1:21] 3 3 3 3 3 3 3 3 3 3 ...
  ..$ route_color     : chr [1:21] "DCC154" "D7171F" "BC1B8D" "0084CA" ...
  ..$ route_text_color: chr [1:21] "FFFFFF" "FFFFFF" "FFFFFF" "FFFFFF" ...
  ..$ route_url       : chr [1:21] "https://www.arlingtontransit.com/routes-schedules/art-41/" "https://www.arlingtontransit.com/routes-schedules/art-42/" "https://www.arlingtontransit.com/routes-schedules/art-43/" "https://www.arlingtontransit.com/routes-schedules/art-45/" ...
  ..- attr(*, ".internal.selfref")=<externalptr> 
 $ shapes        :Classes ‘data.table’ and 'data.frame':    10721 obs. of  4 variables:
  ..$ shape_id         : chr [1:10721] "9" "9" "9" "9" ...
  ..$ shape_pt_lon     : num [1:10721] -77.1 -77.1 -77.1 -77.1 -77.1 ...
  ..$ shape_pt_lat     : num [1:10721] 38.9 38.9 38.9 38.9 38.9 ...
  ..$ shape_pt_sequence: int [1:10721] 1 2 3 4 5 6 7 8 9 10 ...
  ..- attr(*, ".internal.selfref")=<externalptr> 
 $ stop_times    :Classes ‘data.table’ and 'data.frame':    57711 obs. of  7 variables:
  ..$ trip_id       : chr [1:57711] "1" "1" "1" "1" ...
  ..$ arrival_time  : chr [1:57711] "10:25:00" "10:27:25" "10:28:53" "10:30:00" ...
  ..$ departure_time: chr [1:57711] "10:25:00" "10:27:25" "10:28:53" "10:30:00" ...
  ..$ stop_id       : chr [1:57711] "138" "141" "867" "144" ...
  ..$ stop_sequence : int [1:57711] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ stop_headsign : chr [1:57711] "" "" "" "" ...
  ..$ timepoint     : int [1:57711] 1 0 0 1 0 0 0 0 0 0 ...
  ..- attr(*, ".internal.selfref")=<externalptr> 
 $ stops         :Classes ‘data.table’ and 'data.frame':    640 obs. of  6 variables:
  ..$ stop_id  : chr [1:640] "83" "85" "87" "89" ...
  ..$ stop_code: chr [1:640] "51001" "51003" "51005" "51007" ...
  ..$ stop_name: chr [1:640] "Ballston Metro G, Fairfax Dr, EB @ N Stafford, NS" "Fairfax Drive, WB @ N Utah Street, FS" "16th Street N, WB @ N Glebe Road, FS" "16th Street N, WB @ N Buchanan Street, NS" ...
  ..$ stop_lat : num [1:640] 38.9 38.9 38.9 38.9 38.9 ...
  ..$ stop_lon : num [1:640] -77.1 -77.1 -77.1 -77.1 -77.1 ...
  ..$ stop_url : chr [1:640] "https://www.arlingtontransit.com/riding-art/rider-tools/art-realtime/?Stop=A51001#realTimeResultsContainer" "https://www.arlingtontransit.com/riding-art/rider-tools/art-realtime/?Stop=A51003#realTimeResultsContainer" "https://www.arlingtontransit.com/riding-art/rider-tools/art-realtime/?Stop=A51005#realTimeResultsContainer" "https://www.arlingtontransit.com/riding-art/rider-tools/art-realtime/?Stop=A51007#realTimeResultsContainer" ...
  ..- attr(*, ".internal.selfref")=<externalptr> 
 $ trips         :Classes ‘data.table’ and 'data.frame':    2296 obs. of  7 variables:
  ..$ route_id     : chr [1:2296] "52" "52" "52" "52" ...
  ..$ service_id   : chr [1:2296] "3" "3" "3" "3" ...
  ..$ trip_id      : chr [1:2296] "1" "2" "3" "4" ...
  ..$ trip_headsign: chr [1:2296] "Ballston Metro" "Ballston Metro" "Ballston Metro" "Ballston Metro" ...
  ..$ direction_id : int [1:2296] 0 0 0 0 0 1 1 1 1 1 ...
  ..$ block_id     : chr [1:2296] "5202" "5202" "5202" "5202" ...
  ..$ shape_id     : chr [1:2296] "76" "76" "76" "76" ...
  ..- attr(*, ".internal.selfref")=<externalptr> 
 - attr(*, "class")= chr [1:3] "dt_gtfs" "gtfs" "list"
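
The str() output above is long. A more compact check is to ask for the class of each column in each table. Here is a minimal sketch in base R, using a small made-up list of tables (toy_gtfs is a hypothetical stand-in, not the real feed). The same lapply() call should work on ART2019GTFS, since that object is also a list of tables:

```r
# Hypothetical stand-in for a GTFS object: a named list of tables.
toy_gtfs <- list(
  stops = data.frame(
    stop_id  = c("83", "85"),
    stop_lat = c(38.9, 38.9),
    stop_lon = c(-77.1, -77.1),
    stringsAsFactors = FALSE
  ),
  trips = data.frame(
    trip_id      = c("1", "2"),
    direction_id = c(0L, 1L),
    stringsAsFactors = FALSE
  )
)

# For every table, return a named character vector holding the
# first class of each column.
col_classes <- lapply(toy_gtfs, function(tbl) {
  vapply(tbl, function(col) class(col)[1], character(1))
})

print(col_classes)
```

In the real feed shown above, the ID columns come out as character and the coordinate columns as numeric, which is what the geometry functions need.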

And then this apparently succeeds:

> trip_geom <- get_trip_geometry(ART2019GTFS, file = "shapes")
> str(trip_geom)
Classes ‘sf’, ‘data.table’ and 'data.frame':    2296 obs. of  3 variables:
 $ trip_id    : chr  "1" "2" "3" "4" ...
 $ origin_file: chr  "shapes" "shapes" "shapes" "shapes" ...
 $ geometry   :sfc_LINESTRING of length 2296; first list element:  'XY' num [1:131, 1:2] -77.2 -77.2 -77.2 -77.2 -77.2 ...
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA
  ..- attr(*, "names")= chr [1:2] "trip_id" "origin_file"
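
So with this feed no conversion seems to be needed. If a feed ever did load with coordinates stored as text, a blanket as.numeric would turn IDs such as route names into NA. Also, mutate_all() expects a single data frame rather than a list of tables, which is probably why the attempt in the question failed. Below is a minimal sketch of converting only named columns, table by table, in base R. The names toy_gtfs and numeric_cols are hypothetical and the data are made up:

```r
# Hypothetical GTFS-like list whose coordinates were read as text.
toy_gtfs <- list(
  stops = data.frame(
    stop_id  = c("83", "85"),
    stop_lat = c("38.9", "38.8"),
    stop_lon = c("-77.1", "-77.2"),
    stringsAsFactors = FALSE
  ),
  shapes = data.frame(
    shape_id     = c("9", "9"),
    shape_pt_lat = c("38.9", "38.8"),
    shape_pt_lon = c("-77.1", "-77.2"),
    stringsAsFactors = FALSE
  )
)

# Columns to convert, listed per table; everything else is left alone.
numeric_cols <- list(
  stops  = c("stop_lat", "stop_lon"),
  shapes = c("shape_pt_lat", "shape_pt_lon")
)

# Convert each listed column in place, one table at a time.
for (tbl_name in names(numeric_cols)) {
  for (col_name in numeric_cols[[tbl_name]]) {
    toy_gtfs[[tbl_name]][[col_name]] <-
      as.numeric(toy_gtfs[[tbl_name]][[col_name]])
  }
}

str(toy_gtfs$stops)
```

The ID columns stay character, which matters because GTFS IDs are labels rather than quantities. For the data.table objects that read_gtfs returns, data.table's own := or set() may be the more idiomatic way to do the same assignment.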