Using RStudio, I have a dataset of about 500 individuals (id) and about 60,000 rows. Each row has a date and several variables, and there is a lot of missing data.
I've been able to interpolate weight, so that any row with a missing weight on a date between two recorded weights gets a linearly estimated value.
I now want to extrapolate weight (or rather fill, using the same value) before the first and after the last weight measurement, since weight does not change very quickly. For example, with the data below: for any row within 7 months before the first or after the last measured weight, I want to copy that weight. That would take 88 and put it in the 3 preceding rows, leaving 5 still NA, and take 92.6 and put it in the 2 following rows, leaving 3 still NA.
id date weight_interpolated
<dbl> <date> <dbl>
1 2009-01-30 NA
1 2009-01-30 NA
1 2009-02-04 NA
1 2009-02-05 NA
1 2009-02-05 NA
1 2009-03-18 NA
1 2009-07-09 NA
1 2009-07-09 NA
1 2009-09-18 88
1 2009-09-19 88.0
1 2018-12-19 92.7
1 2019-03-13 92.6
1 2019-03-13 92.6
1 2019-03-19 92.6
1 2019-10-03 NA
1 2019-10-03 NA
1 2019-10-22 NA
1 2019-10-22 NA
1 2020-03-26 NA
I haven't found a way to do this. I have searched for similar questions and tried various suggestions from ChatGPT. I also tried creating a row for every possible date for all 500 individuals (expanding the data frame from 60,000 rows to 5,000,000 rows), with the idea of using the fill function but limiting it to a certain number of rows in each direction (e.g. for 7 months, about 30*7 rows), and then deleting every row that contains only filled data, so that only the rows that existed before the expansion remain. The ChatGPT solutions that I was able to run filled in only one row before and after, not the 3 + 2 rows I would want in the example above.
Does anyone know a way to do this?
I've tried different if constructs and case_when conditions and, as mentioned, creating a row for every date to make it simpler, but I did not know how to proceed from there.
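To make the target behaviour concrete, here is a minimal sketch of one possible direction. It compares dates directly instead of expanding to one row per day. It assumes the dplyr and lubridate packages are available, that date has no missing values, and that gaps between measurements are already interpolated. The helper name fill_edges, its n_months argument and the output column weight_filled are made up for illustration.

```r
library(dplyr)
library(lubridate)

# Example data from above (one individual).
df <- tibble(
  id = 1,
  date = as.Date(c(
    "2009-01-30", "2009-01-30", "2009-02-04", "2009-02-05", "2009-02-05",
    "2009-03-18", "2009-07-09", "2009-07-09", "2009-09-18", "2009-09-19",
    "2018-12-19", "2019-03-13", "2019-03-13", "2019-03-19", "2019-10-03",
    "2019-10-03", "2019-10-22", "2019-10-22", "2020-03-26"
  )),
  weight_interpolated = c(
    NA, NA, NA, NA, NA, NA, NA, NA, 88, 88.0,
    92.7, 92.6, 92.6, 92.6, NA, NA, NA, NA, NA
  )
)

# Hypothetical helper: copy the first / last observed weight onto NA rows
# that lie within n_months before the first or after the last measurement.
fill_edges <- function(date, weight, n_months = 7) {
  observed <- !is.na(weight)
  if (!any(observed)) return(weight)  # no measurement to copy from

  obs_date   <- date[observed]
  obs_weight <- weight[observed]

  first_date <- min(obs_date)
  last_date  <- max(obs_date)
  first_val  <- obs_weight[which.min(as.numeric(obs_date))]
  last_val   <- obs_weight[which.max(as.numeric(obs_date))]

  # %m-% and %m+% are lubridate's month arithmetic operators.
  lower <- first_date %m-% months(n_months)
  upper <- last_date %m+% months(n_months)

  before <- !observed & date <= first_date & date >= lower
  after  <- !observed & date >= last_date & date <= upper

  weight[before] <- first_val
  weight[after]  <- last_val
  weight
}

filled <- df %>%
  group_by(id) %>%
  mutate(weight_filled = fill_edges(date, weight_interpolated)) %>%
  ungroup()

print(filled, n = 19)
```

On the example this should put 88 in the 3 rows dated 2009-03-18 and 2009-07-09 and 92.6 in the 2 rows dated 2019-10-03, and leave the other 5 + 3 rows NA. Because it works per id on the existing rows, the 60,000-row data frame would not need to be expanded.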