Using Aggregate and Cbind- NA issue?

Question

Using Aggregate and Cbind- NA issue?

958 views Asked by ichordia91 At 12 November 2014 at 11:54

I am using cbind to find the mean of 3 different columns. However I get different answers for the means when I do:

  DFNEW <- aggregate(cbind(X1, X2, X3)~Y, DF, FUN=mean)

vs

  DFNEW <- aggregate(cbind(X1, X2)~Y, DF, FUN=mean)

The means of X1 and X2 are different when I run command 1 and when I run command 2. X1, X2, and X3 all have different numbers of NA arguments- is that the reason? Part of this may also be that I'm not entirely sure what cbind is doing in this case.

Original Q&A

There are 3 answers

rnso On 12 November 2014 at 12:56

One can also use split command:

> sapply(split(DF, DF$Y), function(x) sapply(x[1:3], mean, na.rm=T))
       A    B        C
X1 3.625 3.00 4.142857
X2 6.375 6.75 7.166667
X3 2.875 1.50 1.333333

(Using @akrun's data)

KFB On 12 November 2014 at 12:38

The beauty of open source is diversity and array of tools available. Quite a fun. Here is a data.table alternative for your reference. Use @akrun's data.

library(data.table)
setDT(DF)[, lapply(.SD, mean, na.rm=T), by=Y]
#    Y       X1       X2       X3
# 1: C 4.142857 7.166667 1.333333
# 2: A 3.625000 6.375000 2.875000
# 3: B 3.000000 6.750000 1.500000

setDT(DF)[, lapply(.SD, mean, na.rm=T), by=Y, .SDcols=1:2]
#   Y       X1       X2
# 1: C 4.142857 7.166667
# 2: A 3.625000 6.375000
# 3: B 3.000000 6.750000

**akrun** · Accepted Answer · 2014-11-12T12:05:32+00:00

I guess the reason why you are getting different result is because by default, na.action=na.omit for the formula interface. So, the rows with NAs are omitted and not used for the calculation of mean. When we use different combinations of columns, different rows could be deleted based on the occurrence of NA. By specifying na.action=NULL, the rows will not get deleted and we can remove the NA values while calculating the mean by using the argument na.rm=TRUE in the mean function.

  aggregate(cbind(X1, X2)~Y, DF, FUN=mean, na.rm=TRUE, na.action=NULL)
   #Y       X1       X2
   #1 A 3.625000 6.375000
   #2 B 3.000000 6.750000
   #3 C 4.142857 7.166667

  aggregate(cbind(X1, X2, X3)~Y, DF, FUN=mean, na.rm=TRUE, na.action=NULL)
  #  Y       X1       X2       X3
  #1 A 3.625000 6.375000 2.875000
  #2 B 3.000000 6.750000 1.500000
  #3 C 4.142857 7.166667 1.333333

The results we get from the above will be the same below i.e. without using the formula interface

aggregate(DF[,c("X1", "X2")], list(Y=DF[,"Y"]), FUN= mean, na.rm=TRUE)
#  Y       X1       X2
# 1 A 3.625000 6.375000
# 2 B 3.000000 6.750000
# 3 C 4.142857 7.166667


aggregate(DF[,c("X1", "X2", "X3")], list(Y=DF[,"Y"]), FUN= mean, na.rm=TRUE)
#  Y       X1       X2       X3
#1 A 3.625000 6.375000 2.875000
#2 B 3.000000 6.750000 1.500000
#3 C 4.142857 7.166667 1.333333

If you want some alternatives, you could use dplyr

 library(dplyr)
  DF %>% 
     group_by(Y) %>% 
     summarise_each(funs(mean=mean(., na.rm=TRUE)))
  # Source: local data frame [3 x 4]

  #  Y       X1       X2       X3
  #1 A 3.625000 6.375000 2.875000
  #2 B 3.000000 6.750000 1.500000
  #3 C 4.142857 7.166667 1.333333

   DF %>% 
      group_by(Y) %>%
      summarise_each(funs(mean=mean(., na.rm=TRUE)), X1, X2)
   #Source: local data frame [3 x 3]

   #  Y       X1       X2
   #1 A 3.625000 6.375000
   #2 B 3.000000 6.750000
   #3 C 4.142857 7.166667

data

set.seed(42)
DF <- data.frame(X1=sample(c(NA, 1:5), 25, replace=TRUE), 
    X2= sample(c(NA, 1:10), 25, replace=TRUE), X3= sample(c(NA,0:5), 25, 
    replace=TRUE), Y=sample(LETTERS[1:3], 25, replace=TRUE))

TechQA.

Using Aggregate and Cbind- NA issue?

There are 3 answers

data

Related Questions in R

Related Questions in AGGREGATE

Related Questions in NA

Related Questions in CBIND

Popular Questions

Popular Tags

Trending Questions