Spread() error in anova_test(): how to make keys unique?


I'm trying to run a 2-way repeated measures ANOVA to look at condition and time effect on systolic BP. I'm using the anova_test() function but I'm getting the error:

Error in `spread()`:
! Each row of output must be identified by a unique combination of keys.
ℹ Keys are shared for 6 rows
• 14, 36
• 182, 204
• 98, 120

I'm not sure why these rows are being treated as non-unique:

> df_lbp[c(14,36),]
# A tibble: 2 × 10
  subject_id condition visit  time  syst  timef      conditionf
       <dbl>     <dbl> <dbl> <int> <dbl>  <fct>      <fct>     
1        129         0     1     1  106.  anticipate Control   
2        165         1     1     1  119   anticipate Stress    
> df_lbp[c(182, 204),]
# A tibble: 2 × 10
  subject_id condition visit  time  syst  timef    conditionf
       <dbl>     <dbl> <dbl> <int> <dbl>  <fct>    <fct>     
1        129         0     1     3  103.  recovery Control   
2        165         1     1     3  121.  recovery Stress    
> df_lbp[c(98, 120),]
# A tibble: 2 × 10
  subject_id condition visit  time  syst  timef conditionf
       <dbl>     <dbl> <dbl> <int> <dbl>  <fct> <fct>     
1        129         0     1     2  102.  task  Control   
2        165         1     1     2  128   task  Stress 

I'm curious what R is using as keys here, and I'd appreciate any help in getting this to work. My code and data are below.


a1 <- anova_test( data = df_lbp, dv = syst,
                  wid = subject_id, 
                  within = c(timef, conditionf) )

get_anova_table(a1)

dput(df_lbp)


There is 1 answer

I_O On BEST ANSWER

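Your subject with subject_id 161 has duplicate entries: two rows for each timef level in the Stress condition. The combination of subject_id (the wid), timef and conditionf (the within factors) therefore does not identify a unique row, which is what the spread() error is complaining about. You can find such duplicates by counting rows per combination: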

library(dplyr)

df_lbp |>
  count(subject_id, timef, conditionf) |>
  filter(n > 1)

output:

# A tibble: 3 x 4
  subject_id timef      conditionf     n
       <dbl> <fct>      <fct>      <int>
1        161 anticipate Stress         2
2        161 task       Stress         2
3        161 recovery   Stress         2
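
To see the same check on something small, here is a minimal sketch with a made-up tibble. The name `toy` and its values are hypothetical and are not the asker's data. Subject 2 deliberately has two rows for the same timef/conditionf combination, so it is the only one flagged.

```r
library(dplyr)

# Hypothetical toy data (not df_lbp): subject 2 has two rows for (task, Stress)
toy <- tibble(
  subject_id = c(1, 1, 2, 2, 2),
  timef      = factor(c("task", "recovery", "task", "task", "recovery")),
  conditionf = factor(rep("Stress", 5)),
  syst       = c(120, 115, 128, 126, 119)
)

# Count rows per key combination; any combination with n > 1 is a duplicate
dups <- toy |>
  count(subject_id, timef, conditionf) |>
  filter(n > 1)

dups
```

Any row returned here is a subject/time/condition cell with more than one measurement, which a repeated measures layout cannot represent as a single value.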

... without these duplicates, anova_test runs OK:

df_lbp |>
  filter(subject_id != 161) |>
  rstatix::anova_test(dv = syst,
                      wid = subject_id, 
                      within = c(timef, conditionf)
                      )

output:

ANOVA Table (type III tests)

$ANOVA
            Effect DFn DFd      F        p p<.05   ges
1            timef   2  46 38.931 1.28e-10     * 0.076
2       conditionf   1  23 28.937 1.83e-05     * 0.284
3 timef:conditionf   2  46 37.078 2.57e-10     * 0.087
## etc.

Edit: as r2evans pointed out, instead of finding the duplicates first and dropping that subject, you can keep only distinct combinations of the key variables. Note that the first (topmost) row of each set of duplicates is the one that is kept:

df_lbp |>
  distinct(subject_id, timef, conditionf,
           .keep_all = TRUE
           ) |>
  rstatix::anova_test(dv = syst,
                      wid = subject_id, 
                      within = c(timef, conditionf)
                      )
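
If the repeated rows are genuine repeat measurements rather than data-entry duplicates, another possibility is to average them within each cell instead of keeping the first row. Whether that is appropriate depends on why the duplicates exist, so treat the following as a sketch under that assumption. It uses a hypothetical tibble `toy`, not the asker's data.

```r
library(dplyr)

# Hypothetical toy data (not df_lbp): subject 2 has two rows for (task, Stress)
toy <- tibble(
  subject_id = c(1, 1, 2, 2, 2),
  timef      = factor(c("task", "recovery", "task", "task", "recovery")),
  conditionf = factor(rep("Stress", 5)),
  syst       = c(120, 115, 128, 126, 119)
)

# Collapse to one row per subject/time/condition cell by averaging syst
averaged <- toy |>
  group_by(subject_id, timef, conditionf) |>
  summarise(syst = mean(syst), .groups = "drop")

averaged
```

The result has exactly one row per key combination, so it could be passed on to rstatix::anova_test() in the same way as the distinct() version above.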