Inserting rows to reflect missing data

49 views Asked by At

I am working on a function that outputs a data frame that currently omits trials where there is missing data. However, I would like the full trial count to be added back into the file and the other data columns be blank for these instances (reflecting the missing data).

Example Data Frames:

Df1withTrialCount <- data.frame(Participant = c('A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A' ), 
Trial = c(1,1,2,2,3,3,4,5,6,7,8,9,10,10,10),
NotRelevantVariable = c(1,2,3,4,5,6,4,3,2,1,1,2,3,4,5))

Df2NeedsTrialsAddedIn <- data.frame(Participant = c('A', 'A', 'A', 'A', 'A'), 
Trial = c(1,3,5,6,10),
EyeGaze = c(.4, .2., .2, .1, .1))

So I would end up with something that had one row each for Trials 1-10 but blanks in Eye Gaze when there was not data (e.g., Trial 2 would have a blank for EyeGaze but Trial 3 would have .2).

Any help or insights would be greatly appreciated.

Take care and thank you for your time, Caroline

3

There are 3 answers

0
Carlos Eduardo Lagosta On BEST ANSWER

With base::merge:

merge(unique(Df1withTrialCount[, c("Participant", "Trial")]),
      Df2NeedsTrialsAddedIn,
      all.x = TRUE)
1
drunkfish69 On
library(tidyverse)

Df1withTrialCount %>% 
  left_join(Df2NeedsTrialsAddedIn, by=c('Participant', 'Trial')) %>% 
  distinct(Trial, .keep_all = TRUE)
2
akrun On

We can use complete

library(tidyr)
complete(Df2NeedsTrialsAddedIn, Participant, 
        Trial = seq_len(max(Df1withTrialCount$Trial)))

-output

# A tibble: 10 x 3
#   Participant Trial EyeGaze
#   <chr>       <dbl>   <dbl>
# 1 A               1     0.4
# 2 A               2    NA  
# 3 A               3     0.2
# 4 A               4    NA  
# 5 A               5     0.2
# 6 A               6     0.1
# 7 A               7    NA  
# 8 A               8    NA  
# 9 A               9    NA  
#10 A              10     0.1

If we need both min and `max from first dataset

complete(Df2NeedsTrialsAddedIn, Participant, 
        Trial = seq(min(Df1withTrialCount$Trial), max(Df1withTrialCount$Trial), by = 1))