I have data that look like this, with the dates of Birth and Death for patients.
ID DOB Birth_Year DOD Death_Year
1 2016-10-01 2016 2019-10-15 2019
2 2017-07-01 2019 2022-01-10 2022
3 2017-04-15 2017 2020-08-15 2020
I need to compute the follow-up time for each year from 2015 to 2020. For the birth year, the follow-up time is the time between the DOB and the last day of the birth year. Similarly, for the death year it is the time between the first day of the death year and the date of death. For any other year, the follow-up time is one year if the year falls between the birth year and the death year, and zero otherwise. I am expecting an output data frame with new columns Year_2015 to Year_2020, as below.
ID DOB Birth_Year DOD Death_Year Year_2015 Year_2016 Year_2017 Year_2018 Year_2019 Year_2020
1 2016-10-01 2016 2019-06-30 2019 0 0.25 1 1 0.5 0
2 2017-07-01 2019 2022-01-10 2022 0 0 0.5 1 1 1
3 2017-04-15 2017 2020-08-15 2020 0 0 0.3 1 1 0.8
I tried to use case_when from the dplyr package, and a for loop with if/else statements. I successfully created the columns Year_2015 to Year_2020 first, but I failed to write a loop that iterates through those columns based on multiple conditions; I had trouble referencing varying column names within the loop. Maybe there is a way to use an apply function in R. Any help is appreciated!
for (i in 2015:2020) {
FoUp_Year <- paste0('Year_', i)
df[, Birth_Year_end] <- make_date(year = Birth_Year, month = 12, day = 31)
df[, Death_Year_start] <- make_date(year = Death_Year, month = 1, day = 1)
if (i<df[, Birth_Year] | i>df[, Death_Year]) {
df[,FoUp_Year] <- 0
}
else if(i==df[, Birth_Year] && i<df[, Death_Year]) {
df[,FoUp_Year] <- df[, Birth_Year_end]-df[,DOB]
}
else if(i==df[, Death_Year] && i>df[, Birth_Year]) {
df[,FoUp_Year] <- df[, DOD]-df[,Death_Year_start]
}
else if(i==df[, Birth_Year] && i==df[, Death_Year]) {
df[,FoUp_Year] <- df[, DOD]-df[,DOB]
}
else if(i>df[, Birth_Year] && i<df[, Death_Year]) {
df[,FoUp_Year] <- 1
}
}
The lubridate package will calculate the time difference between dates. The number of months (or days, if you want more precision) between the dates can then be divided by the total number of months in a year (or the number of days, which also takes leap years into account).
Precision at the month level:
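A minimal sketch of the month-level calculation, not necessarily the only way to do it. It assumes the dates shown in the expected-output table, counts both the birth month and the death month as full months of follow-up, and uses a hypothetical helper named month_index(); only year() and month() come from lubridate.

```r
library(lubridate)

# Assumed input: the dates from the expected-output table in the question
df <- data.frame(
  ID  = 1:3,
  DOB = as.Date(c("2016-10-01", "2017-07-01", "2017-04-15")),
  DOD = as.Date(c("2019-06-30", "2022-01-10", "2020-08-15"))
)

# Hypothetical helper: zero-based count of months since year 0
month_index <- function(d) year(d) * 12 + month(d) - 1

birth_m <- month_index(df$DOB)
death_m <- month_index(df$DOD)

for (y in 2015:2020) {
  # Clip each patient's follow-up window to January..December of year y
  first_m <- pmax(birth_m, 12 * y)
  last_m  <- pmin(death_m, 12 * y + 11)
  # Months of overlap (both ends inclusive); negative means no overlap
  n_months <- pmax(last_m - first_m + 1, 0)
  df[[paste0("Year_", y)]] <- n_months / 12
}

df
```

The loop is vectorised over patients, so no if/else on individual rows is needed, and df[[paste0("Year_", y)]] is how a column whose name varies can be referenced inside the loop. With these inputs, ID 1 gets 0, 0.25, 1, 1, 0.5, 0 and ID 2 gets 0, 0, 0.5, 1, 1, 1 for 2015 to 2020.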
Precision at the day level:
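A sketch of the day-level version under the same assumptions (dates taken from the expected-output table). It counts both the DOB and the DOD as follow-up days, which is a choice rather than something the question specifies, and divides by 365 or 366 depending on the year. make_date() is from lubridate; everything else is base R.

```r
library(lubridate)

# Assumed input: the dates from the expected-output table in the question
df <- data.frame(
  ID  = 1:3,
  DOB = as.Date(c("2016-10-01", "2017-07-01", "2017-04-15")),
  DOD = as.Date(c("2019-06-30", "2022-01-10", "2020-08-15"))
)

# Work with day numbers (days since 1970-01-01) to keep the arithmetic simple
dob <- as.numeric(df$DOB)
dod <- as.numeric(df$DOD)

for (y in 2015:2020) {
  year_start <- as.numeric(make_date(y, 1, 1))
  year_end   <- as.numeric(make_date(y, 12, 31))
  days_in_year <- year_end - year_start + 1   # 365, or 366 in a leap year

  # Clip each patient's follow-up window to the calendar year
  from <- pmax(dob, year_start)
  to   <- pmin(dod, year_end)
  # Days of overlap (both ends inclusive); negative means no overlap
  n_days <- pmax(to - from + 1, 0)

  df[[paste0("Year_", y)]] <- n_days / days_in_year
}

df
```

Wrap the last assignment in round(..., 2) if you want values rounded as in the expected output.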
Raw data
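One way to build the data frame from the question, as a sketch. It assumes the dates from the expected-output table (the input table contains an invalid date), and it derives Birth_Year and Death_Year from the dates instead of typing them in, so the year columns cannot disagree with the dates.

```r
# Assumed values: dates copied from the expected-output table in the question
df <- data.frame(
  ID  = 1:3,
  DOB = as.Date(c("2016-10-01", "2017-07-01", "2017-04-15")),
  DOD = as.Date(c("2019-06-30", "2022-01-10", "2020-08-15"))
)

# Derive the year columns from the dates
df$Birth_Year <- as.integer(format(df$DOB, "%Y"))
df$Death_Year <- as.integer(format(df$DOD, "%Y"))

df
```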