Background
I was importing and cleaning a dataset (head included below) using read_csv() (from readr), and noticed various columns which probably should be integer columns were left as numeric columns by readr.
Question
What is an elegant way of converting the numeric columns to integer columns if it is possible (i.e. if there would be no rounding/loss in accuracy), and if not, leaving them as numeric?
For the example dataset, it would mean converting the columns "Execution", "TDCJ Number", "Age at Execution", and "Weight".
It doesn't need to be a tidyverse solution, base R or any package(s) are fine too.
Research done:
I tried searching Google, and Stack Overflow, without any luck. Honestly, I'm surprised this question hasn't been asked before! read.csv() converts to integer, but doesn't convert dates automatically, like read_csv().
Data:
# the entire dataset can be downloaded at https://archive.org/download/tx_deathrow_full/tx_deathrow_full.csv if you wish. The head is below
library(tidyverse)
tx_deathrow <- structure(list(Execution = c(553, 552, 551, 550, 549, 548), `Date of Birth` = structure(c(5014,
-6701, 4110, 6302, 3737, -5266), class = "Date"), `Date of Offence` = structure(c(12743,
3433, 12389, 13975, 13039, 11444), class = "Date"), `Highest Education Level` = c(9,
12, 10, 11, 12, 12), `Last Name` = c("Young", "Bible", "Castillo",
"Davila", "Rodriguez III", "Battaglia"), `First Name` = c("Christopher Anthony",
"Danny Paul", "Juan Edward", "Erick Daniel", "Rosendo", "John David"
), `TDCJ
Number` = c(999508, 999455, 999502, 999545, 999534,
999412), `Age at Execution` = c(34, 66, 37, 31, 38, 62), `Date Received` = structure(c(13238,
12250, 13053, 14302, 14013, 11808), class = "Date"), `Execution Date` = structure(c(17729,
17709, 17667, 17646, 17617, 17563), class = "Date"), Race = c("Black",
"White", "Hispanic", "Black", "Hispanic", "White"), County = c("Bexar",
"Harris", "Bexar", "Tarrant", "Lubbock", "Dallas"), `Eye Color` = c("Brown",
"Blue", "Brown", "Brown", "Brown", "Green"), Weight = c(216,
194, 180, 161, 198, 188), Height = c("6' 1\"", "5' 7\"", "5' 11\"",
"5' 11\"", "5' 8\"", "6' 0\""), `Native County` = c("Bexar",
"Brazoria", "Bexar", "Tarrant", "Wichita", "Dallas"), `Native State` = c("Texas",
"Texas", "Texas", "Texas", "Texas", "Texas"), `Last Statement` = c("l want to make sure the Patel family knows I love them like they love me. Make sure the kids in the world know I'm being executed and those kids I've been mentoring keep this fight going. I'm good Warden.",
NA, "To everyone that has been there for me you know who you are. Love y'all. See y'all on the other side.That's it.",
"Yes, I would like to say nephew it burns huh. You know I might have lost the fight but I'm still a soldier. I still love you all. To my supporters and family y'all hold it down. Ten Toes down right. That's all.",
"First I would like to say I have been here since September 2005. I had the honor and privilege to know many prison guards and staff. I want to thank all of them. I would like for everyone to write the people on death row as they are all good men and I am very happy I got to know them. All of their lives are worth knowing about.\n\nSecondly on February 14th the medical examiner and the chief nurse were engaged in numerous false illegal acts. They tried to cover up that thousands were wrongfully convicted by Matt Powell, district attorney. This needs to be brought to justice.\n\nI call upon the FBI to investigate Matt Powell and the Lubbock County Medical Examiner. Lastly, I was born and raised Catholic and it was not lost upon me that this is Holy Week and last Sunday was Palm Sunday. Yesterday was my birthday. Today is the day I join my God and father. The state may have my body but not my soul.\n\nIn order to save my brothers on death row I call upon Pope Francis and all the people of the world.\n\nLastly, I want everyone to boycott every single business in the state of Texas until all the businesses are pressed to stop the death penalty.\n\nWith that Lord I commend my spirit.\n\nWarden I am ready to join my father.",
"No, Well, Hi Mary Jean. See y'all later. Go ahead please.")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
Although some {readr} functions like
parse_guess()ortype_convert()have already been suggested in the comments, they only work on columns of typecharactercorrectly, which is why they can't correct columns that should benumericinstead ofdouble.One trick is to read in all columns as character with the argument
col_types = cols(.default = "c"). Then we can pipe the result intotype_convert()withguess_integersetTRUE.Don't know why {readr} uses a different process to guess the columns internally when reading the csv - doesn't make much sense to me.
Another way of reading in the data is
data.table::fread()which really does a nice job in converting correctly to numeric if appropriate. However, in this case dates are converted toIDatean integer based date format which is non standard. Further, the usual standard output is adata.tablebut we can get back adata.frameby settingdata.table = FALSE. I unfortunately don't see a way to prevent dates from being converted toIDate.Created on 2024-03-16 with reprex v2.0.2