R script. I want to do the following.
The dataset is grouped by pid, med and date1.
If pid or med changes then the hours_output == 255, otherwise the hours_output == the time interval in hours.
If the date1 alters, then the last datetime on that day == 255, otherwise the hours_output == the time interval in hours.
This is the dummy dataset.
df <- data.frame(
pid = c(rep(1, 3), rep(2, 3), rep(3, 3), rep(4, 3)),
med = c(rep("drugA", 4), rep("drugB", 4), rep("drugC", 4)),
date1 = c("2023-02-01 09:00:00", "2023-02-01 12:00:00", "2023-02-01 14:00:00",
"2023-02-02 10:00:00", "2023-02-02 18:00:00", "2023-02-03 11:00:00",
"2023-02-04 09:00:00", "2023-02-04 12:00:00", "2023-02-05 10:00:00",
"2023-02-06 08:00:00", "2023-02-06 12:00:00", "2023-02-06 14:00:00")
)
Desired output.
pid med date1 pid_change med_change date1_change hours_output
1 drugA 2023-02-01 09:00:00 0 0 0 3
1 drugA 2023-02-01 12:00:00 0 0 1 2
1 drugA 2023-02-01 14:00:00 0 0 1 255
2 drugA 2023-02-02 10:00:00 1 0 1 255
2 drugB 2023-02-02 18:00:00 0 1 1 255
2 drugB 2023-02-03 11:00:00 0 0 1 255
3 drugB 2023-02-04 09:00:00 1 0 1 255
3 drugB 2023-02-04 12:00:00 0 0 1 255
3 drugC 2023-02-05 10:00:00 0 1 1 255
4 drugC 2023-02-06 08:00:00 1 0 1 255
4 drugC 2023-02-06 12:00:00 0 0 1 255
4 drugC 2023-02-06 14:00:00 0 0 1 255
This is the sample script tried.
# Convert date1 to a POSIXct format
df$date1 <- as.POSIXct(df$date1)
# Add a column to track changes in pid, med, and date1
df <- df %>% mutate(pid_change = ifelse(pid != lag(pid, default = first(pid)), 1, 0),
med_change = ifelse(med != lag(med, default = first(med)), 1, 0),
date1_change = ifelse(date1 != lag(date1, default = first(date1)), 1, 0))
# Calculate the hours_output column based on changes in pid, med, and date1
df <- df %>%
group_by(pid, med) %>%
mutate(
hours_output = ifelse(pid_change | med_change | date1_change, 255, as.numeric(c(diff(date1), 24), units = "hours"))
) %>%
ungroup() %>%
mutate(
hours_output = ifelse(is.na(hours_output), -999, hours_output),
hours_output = abs(hours_output)
)
df
returns the following.
pid med date1 pid_change med_change date1_change hours_output
1 drugA 2023-02-01 09:00:00 0 0 0 3
1 drugA 2023-02-01 12:00:00 0 0 1 255
1 drugA 2023-02-01 14:00:00 0 0 1 255
2 drugA 2023-02-02 10:00:00 1 0 1 255
2 drugB 2023-02-02 18:00:00 0 1 1 255
2 drugB 2023-02-03 11:00:00 0 0 1 255
3 drugB 2023-02-04 09:00:00 1 0 1 255
3 drugB 2023-02-04 12:00:00 0 0 1 255
3 drugC 2023-02-05 10:00:00 0 1 1 255
4 drugC 2023-02-06 08:00:00 1 0 1 255
4 drugC 2023-02-06 12:00:00 0 0 1 255
4 drugC 2023-02-06 14:00:00 0 0 1 255