Inserting missing values into a data frame using dplyr in R and having other columns populate

I feel like I am on the cusp, but can't quite get there!

I have a data frame in which one column contains doubles. I want to expand that column to include every value in between, in steps of 0.001. For example, if this is my input,

values = c(0.086, 0.092, 0.099, 0.080, 0.092, 1.02)
sess = c(1, 1, 1, 2, 2, 2)
phase = c(1, 2, 3, 2, 1, 5)
df = data.frame(values, sess, phase)

I want the output to look like this: every value between the min and max for each session is represented in steps of 0.001, and the session column is populated with the appropriate value at the same time.

output sess phase
 0.086     1 1
 0.087     1 
 0.088     1 
 0.089     1 
 0.090     1 
 0.091     1 
 0.092     1 2
 0.093     1 
 0.094     1
 0.095     1
 0.096     1
 0.097     1
 0.098     1
 0.099     1 3
 0.080     2 2
 0.081     2
 0.082     2
 0.083     2
 0.084     2
 0.085     2
 0.086     2
 0.087     2
 0.088     2
 0.089     2
 0.090     2
 0.091     2
 0.092     2 1
 0.093     2
 0.094     2
 0.095     2
 0.096     2
 0.097     2
 0.098     2
 0.099     2
 0.100     2
 0.101     2
 0.102     2 5
 0.103     2
 0.104     2

I can use 'seq' to create this column, but adding it to the existing data frame is causing me trouble. I feel like a 'join' might be appropriate, but I can't quite get it...

There is 1 answer

Allan Cameron

If you group_by each sess, then you can reframe so that your output column is just the seq between the minimum and maximum of the values in each group.

library(dplyr)

df %>% 
  group_by(sess) %>% 
  reframe(output = seq(min(values), max(values), 0.001)) %>%
  select(2:1)
#> # A tibble: 955 x 2
#>    output  sess
#>     <dbl> <dbl>
#>  1  0.086     1
#>  2  0.087     1
#>  3  0.088     1
#>  4  0.089     1
#>  5  0.09      1
#>  6  0.091     1
#>  7  0.092     1
#>  8  0.093     1
#>  9  0.094     1
#> 10  0.095     1
#> # i 945 more rows
#> # i Use `print(n = ...)` to see more rows
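The result above has no phase column. One way to carry phase along might be to join the expanded sequence back onto the original rows. The sketch below does that on an integer key in thousandths, on the assumption that matching doubles produced by seq() against the original values could miss because of floating-point rounding. The names key, keyed and filled are made up for illustration and are not part of the answer above; reframe needs dplyr 1.1.0 or later.

```r
library(dplyr)

# Input data as given in the question
values <- c(0.086, 0.092, 0.099, 0.080, 0.092, 1.02)
sess   <- c(1, 1, 1, 2, 2, 2)
phase  <- c(1, 2, 3, 2, 1, 5)
df <- data.frame(values, sess, phase)

# Hypothetical helper column: each value as an integer number of
# thousandths, so the join below compares integers, not doubles.
keyed <- df %>%
  mutate(key = as.integer(round(values * 1000))) %>%
  select(sess, key, phase)

filled <- keyed %>%
  group_by(sess) %>%
  # One row per thousandth between the group's min and max
  reframe(key = seq(min(key), max(key))) %>%
  # Bring phase back for the rows that existed in the original data;
  # all other rows get NA
  left_join(keyed, by = c("sess", "key")) %>%
  # Convert the key back to the 0.001 scale
  mutate(output = key / 1000) %>%
  select(output, sess, phase)
```

Rows that were not in the original data have NA in phase. This assumes each (sess, value) pair occurs at most once in the input; duplicates would produce duplicate rows after the join.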