How to get a data frame of deviations from the group mean, based on a factor column


Example:

So let's say I have this data frame.

x = data.frame(factor = as.factor(c('a','a','b','b','c','c')),value1 = c(1,3,2,4,5,3), value2 = c(7,9,3,4,9,3))


    factor value1 value2
1      a      1      7
2      a      3      9
3      b      2      3
4      b      4      4
5      c      5      9
6      c      3      3

I know how to get the mean per factor level; I use this method:

aggregate(x[, c(2, 3)], list(x$factor), mean, na.rm = TRUE)

This gives me the following output:

  Group.1 value1 value2
1       a      2    8.0
2       b      3    3.5
3       c      4    6.0

How do I now subtract from each value in the original data frame the mean of its factor level? The actual dataset I am using is big, so I need a clean way to do it. I have managed it, but only with complicated for loops.

So the output that I want would be:

  factor value1 value2
1      a     -1   -1.0
2      a      1    1.0
3      b     -1   -0.5
4      b      1    0.5
5      c      1    3.0
6      c     -1   -3.0

Any help would be great. Thanks.


There are 3 answers

Answer by ekoam (accepted answer)

A dplyr solution

library(dplyr)
x %>% group_by(factor) %>% mutate(across(c(value1, value2), ~. - mean(.)))

Output

# A tibble: 6 x 3
# Groups:   factor [3]
  factor value1 value2
  <fct>   <dbl>  <dbl>
1 a          -1   -1  
2 a           1    1  
3 b          -1   -0.5
4 b           1    0.5
5 c           1    3  
6 c          -1   -3  
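A possible variation on the accepted answer, assuming dplyr 1.0.0 or later (needed for `across()` and `where()`): centre every numeric column instead of listing them by name, pass `na.rm = TRUE` to mirror the asker's `aggregate()` call, and `ungroup()` at the end so later steps are not silently grouped. The object name `dev` is just illustrative.

```r
library(dplyr)

x <- data.frame(factor = as.factor(c('a', 'a', 'b', 'b', 'c', 'c')),
                value1 = c(1, 3, 2, 4, 5, 3),
                value2 = c(7, 9, 3, 4, 9, 3))

# Subtract the within-group mean from every numeric column,
# then drop the grouping so the result is a plain tibble
dev <- x %>%
  group_by(factor) %>%
  mutate(across(where(is.numeric), ~ .x - mean(.x, na.rm = TRUE))) %>%
  ungroup()

dev
```

`across()` skips the grouping column automatically, so `factor` is left untouched either way.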
Answer by Duck

You can try this dplyr approach:

library(dplyr)
#Data
x = data.frame(factor = as.factor(c('a','a','b','b','c','c')),value1 = c(1,3,2,4,5,3), value2 = c(7,9,3,4,9,3))
#Code
x <- x %>% group_by(factor) %>%
  mutate(Mv1=mean(value1),
         Mv2=mean(value2),
         value1=value1-Mv1,
         value2=value2-Mv2) %>% select(-c(Mv1,Mv2))

Output:

# A tibble: 6 x 3
# Groups:   factor [3]
  factor value1 value2
  <fct>   <dbl>  <dbl>
1 a          -1   -1  
2 a           1    1  
3 b          -1   -0.5
4 b           1    0.5
5 c           1    3  
6 c          -1   -3  
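Duck's answer stores each row's group mean in helper columns and then subtracts it. Base R has that step built in: `ave(v, g)` returns a vector the same length as `v` in which every element is replaced by the mean of its group (`FUN` defaults to `mean`). A minimal sketch without any packages, assuming the value columns are named `value1` and `value2` as in the question:

```r
x <- data.frame(factor = as.factor(c('a', 'a', 'b', 'b', 'c', 'c')),
                value1 = c(1, 3, 2, 4, 5, 3),
                value2 = c(7, 9, 3, 4, 9, 3))

cols <- c("value1", "value2")

# ave(v, g) gives each row its group's mean, so v - ave(v, g)
# is the deviation from the group mean, in the original row order
x[cols] <- lapply(x[cols], function(v) v - ave(v, x$factor))

x
```

If the real data contain `NA`s, a variant such as `ave(v, x$factor, FUN = function(z) mean(z, na.rm = TRUE))` would match the `na.rm = TRUE` used in the question.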
Answer by jogo

Here is a solution with data.table

library("data.table")
setDT(x)
cols <- paste0("value", 1:2)
x[, lapply(.SD, function(v) v - mean(v)), .SDcols = cols, by = factor]
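Since the question mentions a large dataset, a related data.table idiom may be useful: assigning with `(cols) :=` updates the value columns by reference, group by group, instead of returning a new table, and the original row order is kept. A sketch assuming the same data and that the value columns are already doubles (as `c(1, 3, ...)` produces):

```r
library(data.table)

x <- data.frame(factor = as.factor(c('a', 'a', 'b', 'b', 'c', 'c')),
                value1 = c(1, 3, 2, 4, 5, 3),
                value2 = c(7, 9, 3, 4, 9, 3))
setDT(x)
cols <- paste0("value", 1:2)

# Overwrite value1/value2 in place with their deviation from the
# mean of their factor level; no copy of x is returned
x[, (cols) := lapply(.SD, function(v) v - mean(v)), by = factor, .SDcols = cols]

print(x)
```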

or

library("data.table")
setDT(x)
x[, sweep(.SD, 2, STATS=colMeans(.SD)), by=factor, .SDcols=2:3]
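The means the asker already computed with `aggregate()` can also be reused directly in base R: `match()` gives, for each row of `x`, the position of its factor level in the aggregate table, so the means can be lined up with the rows and subtracted. A sketch; `m`, `idx` and `cols` are illustrative names:

```r
x <- data.frame(factor = as.factor(c('a', 'a', 'b', 'b', 'c', 'c')),
                value1 = c(1, 3, 2, 4, 5, 3),
                value2 = c(7, 9, 3, 4, 9, 3))
cols <- c("value1", "value2")

# Group means, as in the question (column Group.1 holds the factor levels)
m <- aggregate(x[cols], list(x$factor), mean, na.rm = TRUE)

# For each row of x, the row of m that holds its group's means
idx <- match(x$factor, m$Group.1)

# Both sides are 6 x 2 data frames, so they subtract element-wise
x[cols] <- x[cols] - m[idx, cols]

x
```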