I have a dataset in R where a time variable has been imported as text, because importing it without specifying text turns many observations into NAs. However, I've discovered that the time variable has inconsistent formatting: some rows have numeric values (for example, 0.24962962962962965), while others are in HH:MM:SS format (for example, 07:19:52). My goal is to convert this variable into a consistent HH:MM:SS time format in R.
How can I address this situation and convert the time variable to a consistent HH:MM:SS format for the entire dataset?
I've tried some approaches using mathematical operations and conversion functions, but I'm unsure how to handle both numeric values and time formats in a single column.
I've attached simplified data for this case:
datos_texto <- c("0.24962962962962965", "07:19:52", "0.123456789", "10:45:30", "0.567891234")
I would greatly appreciate any advice or code examples that could help me solve this issue and obtain a time variable in the desired format.
Thank you in advance for your assistance!
Two suggestions here: convert all decimal days to HH:MM:SS.SSS, or convert all timestamps to decimal days.

Convert all to HH:MM:SS
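We can use a function num2time to convert the decimal values to times, assuming the decimal is "decimal days" (so 0.25 is a quarter of the way through the day, or 06:00:00). Applied to the entries without a colon, it leaves every element of datos_texto as a HH:MM:SS string.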
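A minimal sketch of such a num2time, assuming the input is decimal days as described. The name comes from the text, but the body, the `digits` argument and the default of rounding to whole seconds are assumptions for illustration, not necessarily the function originally used. Rounding before splitting into hours, minutes and seconds avoids outputs such as "05:59:60".

```r
# Hypothetical helper (a sketch): decimal days -> "HH:MM:SS",
# with `digits` decimal places on the seconds field
num2time <- function(x, digits = 0) {
  secs <- round(x * 86400, digits)      # decimal days -> seconds since midnight
  hr   <- secs %/% 3600                 # whole hours
  mn   <- (secs - 3600 * hr) %/% 60     # whole minutes left over
  sec  <- secs - 3600 * hr - 60 * mn    # remaining (possibly fractional) seconds
  sec_fmt <- if (digits > 0) paste0("%0", digits + 3, ".", digits, "f") else "%02.0f"
  out <- sprintf(paste0("%02d:%02d:", sec_fmt), as.integer(hr), as.integer(mn), sec)
  out[is.na(x)] <- NA_character_        # keep missing values missing
  out
}

datos_texto <- c("0.24962962962962965", "07:19:52", "0.123456789", "10:45:30", "0.567891234")
nocolon <- !grepl(":", datos_texto)     # TRUE where the value is a decimal-day number
datos_texto[nocolon] <- num2time(as.numeric(datos_texto[nocolon]))
datos_texto
# "05:59:28" "07:19:52" "02:57:47" "10:45:30" "13:37:46"
```

With the default `digits = 0`, 0.123456789 days (about 10666.67 s) is rounded to 02:57:47, so every element matches the existing HH:MM:SS strings; `digits = 3` gives the HH:MM:SS.SSS form (02:57:46.667) at the cost of mixing two widths in one column.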
The result can then be handled uniformly, whether it is kept as a character string or converted into a "timestamp" without a date component; the latter is useful because numerical operations (addition, subtraction, differences, etc.) are clearly defined on it.
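One base-R way to get such a date-free time of day is sketched below. It swaps in base `POSIXct` and `difftime` rather than any particular time-of-day package, and it assumes the values are already HH:MM:SS strings (the vector `x` is hypothetical sample input). Parsing every time on 1970-01-01 in UTC makes the numeric value exactly the seconds since midnight and keeps daylight-saving changes out of the picture.

```r
# HH:MM:SS strings, e.g. as produced by a decimal-day -> time conversion
x <- c("05:59:28", "07:19:52", "02:57:47", "10:45:30", "13:37:46")

# Anchor each time to the epoch date in UTC, so the POSIXct value
# equals the number of seconds since midnight
ts   <- as.POSIXct(paste("1970-01-01", x), format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")
secs <- as.numeric(ts)

# A date-free "time of day" as a difftime, on which arithmetic is well defined
tod <- as.difftime(secs, units = "secs")
tod[4] - tod[2]            # difference between 10:45:30 and 07:19:52

format(ts, "%H:%M:%S")     # back to HH:MM:SS strings
```

The same `secs` divided by 86400 gives decimal days, which is the starting point of the second suggestion.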
Convert all to decimal-days
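Another option is to convert the time-like fields to numeric: parse the strings that contain a colon with a function time2num, convert the remaining entries directly with as.numeric, and collect both in a new numeric vector out.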
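A minimal sketch of this, with a hypothetical time2num that splits on ":" and weights the fields by 3600, 60 and 1. It assumes every colon-containing entry has exactly three fields (HH:MM:SS, optionally with fractional seconds); its output is seconds since midnight, so it is divided by 86400 to match the decimal-day entries.

```r
# Hypothetical helper: "HH:MM:SS" -> seconds since midnight
time2num <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

datos_texto <- c("0.24962962962962965", "07:19:52", "0.123456789", "10:45:30", "0.567891234")
nocolon <- !grepl(":", datos_texto)

out <- numeric(length(datos_texto))
out[nocolon]  <- as.numeric(datos_texto[nocolon])          # already decimal days
out[!nocolon] <- time2num(datos_texto[!nocolon]) / 86400   # seconds -> decimal days
out
```

Building `out` as a separate numeric vector means no value ever has to pass back through a string.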
out is numeric, in decimal days, for all of datos_texto.

Incidentally, one might be tempted to do datos_texto[nocolon] <- as.numeric(datos_texto[nocolon]) (where nocolon flags the entries without a colon). Realize that datos_texto will remain character unless all of it is replaced at once, so assigning the numbers into a subset coerces them straight back to strings and the numeric results of as.numeric are lost. Likewise, it is possible to convert the :-containing strings with time2num in place, but they will also end up as strings. This generally gives the same values, but time2num returns a floating-point numeric, and replacing it into a subset of datos_texto converts it to string representations of those floating-point numbers. These are easily converted back (for example with as.numeric(datos_texto)), but converting number to string to number is inefficient, and R is relatively inefficient with large numbers of strings (search for "R global string pool", read "Object size for characters in R - How does R global string pool work?" and https://adv-r.hadley.nz/names-values.html, and put your learning cap on). This approach also works, but I recommend and prefer using a numeric vector for this.
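A small illustration of the coercion described above, using the sample data from the question: assigning numbers into part of a character vector leaves the vector character, so the values have to be parsed a second time to be used as numbers.

```r
datos_texto <- c("0.24962962962962965", "07:19:52", "0.123456789", "10:45:30", "0.567891234")
nocolon <- !grepl(":", datos_texto)

tmp <- datos_texto
tmp[nocolon] <- as.numeric(tmp[nocolon])  # assigned into a character vector...
class(tmp)                                 # ...so it is still "character"
tmp[nocolon]                               # the numbers, re-rendered as strings

# Using them as numbers requires another parse: string -> number -> string -> number
as.numeric(tmp[nocolon])
```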