I want to use zoo::na.approx (but I'm not married to this function) to fill in a response variable for the missing days in my dataframe. I'm having a tough time figuring out how to add the NAs to the original dataframe so that na.approx can fill them in.
My dataframe looks something like this:
df <- data.frame(trt = c("A", "A", "A", "A", "B", "B", "B", "B"),
                 day = c(1, 3, 7, 9, 1, 5, 8, 9),
                 value = c(7, 12, 5, 7, 5, 6, 11, 8),
                 stringsAsFactors = FALSE)
I want every day to appear in the dataframe, with NA in the value column for each day where I don't have data.
I have used something like this to expand my dataset:
library(dplyr)
days_possible <- expand.grid(
  day = seq(from = min(df$day), to = max(df$day), by = 1),
  trt = c("A", "B"),
  stringsAsFactors = FALSE
)
new_df <- df %>%
  right_join(days_possible, by = c("trt", "day"))
My problem is that I have a bunch of sites, years, and a few treatment columns, so somewhere along the way things get messed up and my days_possible dataframe doesn't come out right.
Is there a function to avoid this mess, expand one column, and have all the other columns expand in a tidy fashion? I'm looking at modelr::data_grid, but by itself I am not sure how to get the final desired result: an ordered dataframe that I can group by treatment and then use approximation to fill in the missing days.
We can use the complete and full_seq functions from the tidyr package. The final as.data.frame() is not required. I just added it to print the output as a data frame.
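As a minimal sketch of that approach on the example data from the question: complete() adds a row for every combination of trt and day, and full_seq(day, period = 1) supplies every day between the smallest and largest observed day. The interpolation step with zoo::na.approx, and the names filled and value_filled, are my own additions for illustration and assume you want linear interpolation within each treatment.

```r
library(dplyr)
library(tidyr)
library(zoo)

# Example data from the question
df <- data.frame(trt = c("A", "A", "A", "A", "B", "B", "B", "B"),
                 day = c(1, 3, 7, 9, 1, 5, 8, 9),
                 value = c(7, 12, 5, 7, 5, 6, 11, 8),
                 stringsAsFactors = FALSE)

filled <- df %>%
  # One row per trt for every day from min(day) to max(day);
  # days without data get NA in value
  complete(trt, day = full_seq(day, period = 1)) %>%
  group_by(trt) %>%
  # Linear interpolation within each treatment (illustrative column name);
  # na.rm = FALSE keeps the vector length unchanged if leading or
  # trailing NAs cannot be interpolated
  mutate(value_filled = na.approx(value, x = day, na.rm = FALSE)) %>%
  ungroup() %>%
  as.data.frame()

filled
```

With several identifying columns (sites, years, treatments), one option is to wrap them in tidyr's nesting(), for example complete(nesting(site, year, trt), day = full_seq(day, period = 1)), so that only combinations that actually occur in the data are expanded. The column names site and year are hypothetical here, since they are not in the example data.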