I am using cbind to find the mean of 3 different columns. However, I get different answers for the means when I do:
DFNEW <- aggregate(cbind(X1, X2, X3)~Y, DF, FUN=mean)
vs
DFNEW <- aggregate(cbind(X1, X2)~Y, DF, FUN=mean)
The means of X1 and X2 differ between command 1 and command 2. X1, X2, and X3 all have different numbers of NA values; is that the reason? Part of this may also be that I'm not entirely sure what cbind is doing in this case.
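For illustration, the discrepancy can be reproduced with a small made-up data frame. The values below are hypothetical (they are not the original DF); only the fact that the NAs fall in different rows of X1, X2, and X3 matters.

```r
# Hypothetical toy data: each X column has its NA in a different row.
DF <- data.frame(
  Y  = c("a", "a", "a", "a", "b", "b", "b", "b"),
  X1 = c(1, 2, 3, NA, 5, 6, 7, 8),
  X2 = c(10, 20, NA, 40, 50, 60, 70, 80),
  X3 = c(NA, 200, 300, 400, 500, NA, 700, 800)
)

# Command 1: three response columns bound together with cbind().
three <- aggregate(cbind(X1, X2, X3) ~ Y, DF, FUN = mean)

# Command 2: only two response columns.
two <- aggregate(cbind(X1, X2) ~ Y, DF, FUN = mean)

# The X1 and X2 means are not the same in the two results.
print(three)
print(two)
```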
I guess the reason you are getting different results is that, by default, na.action=na.omit for the formula interface. So, any row with an NA in one of the columns named in the formula is omitted and not used in the calculation of the mean. When we use different combinations of columns, different rows can be deleted, depending on where the NAs occur. By specifying na.action=NULL, the rows will not get deleted, and we can remove the NA values while calculating the mean by passing the argument na.rm=TRUE to the mean function.

The results from this approach are the same as those obtained without using the formula interface.

If you want some alternatives, you could use
dplyr
data
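To make the na.action explanation concrete, here is a minimal base R sketch. It assumes a hypothetical toy data frame (the values are invented for illustration and are not the original DF). It computes the group means once through the formula interface with na.action=NULL and na.rm=TRUE, and once through the data frame method of aggregate, which involves no formula.

```r
# Hypothetical toy data: each X column has its NA in a different row.
DF <- data.frame(
  Y  = c("a", "a", "a", "a", "b", "b", "b", "b"),
  X1 = c(1, 2, 3, NA, 5, 6, 7, 8),
  X2 = c(10, 20, NA, 40, 50, 60, 70, 80),
  X3 = c(NA, 200, 300, 400, 500, NA, 700, 800)
)

# Formula interface: na.action = NULL keeps every row, and na.rm = TRUE
# is passed on to mean(), so NAs are dropped separately for each column.
by_formula <- aggregate(cbind(X1, X2) ~ Y, DF,
                        FUN = mean, na.rm = TRUE, na.action = NULL)

# Same computation without the formula interface: select the columns
# directly and supply the grouping variable through 'by'.
by_columns <- aggregate(DF[c("X1", "X2")], by = list(Y = DF$Y),
                        FUN = mean, na.rm = TRUE)

# Adding X3 no longer changes the X1 and X2 means.
with_x3 <- aggregate(cbind(X1, X2, X3) ~ Y, DF,
                     FUN = mean, na.rm = TRUE, na.action = NULL)

print(by_formula)
print(by_columns)
print(with_x3)
```

In this sketch, cbind(X1, X2) only binds the selected columns into a matrix on the left-hand side of the formula, so that mean is applied to each column within each level of Y.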