I am using cbind to find the means of 3 different columns. However, I get different answers for the means when I do:
DFNEW <- aggregate(cbind(X1, X2, X3)~Y, DF, FUN=mean)
vs
DFNEW <- aggregate(cbind(X1, X2)~Y, DF, FUN=mean)
The means of X1 and X2 are different when I run command 1 and when I run command 2. X1, X2, and X3 all have different numbers of NA values. Is that the reason? Part of this may also be that I'm not entirely sure what cbind is doing in this case.
I guess the reason you are getting different results is that, by default, na.action=na.omit for the formula interface. So the rows with NAs are omitted and not used in the calculation of the mean. When we use different combinations of columns, different rows can be deleted depending on where the NAs occur. By specifying na.action=NULL, the rows will not get deleted, and we can remove the NA values while calculating the mean by using the argument na.rm=TRUE in the mean function.
The results will be the same without using the formula interface.
If you want some alternatives, you could use dplyr.
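The NA handling described above can be checked on a small made-up data frame. This is only a sketch: the DF below and its values are hypothetical (the original data was not posted), and only the column names X1, X2, X3 and Y come from the question.

```r
# Hypothetical toy data (not from the original post): NAs sit in different
# rows of X1, X2 and X3, as in the question.
DF <- data.frame(
  Y  = c("a", "a", "a", "a", "b", "b", "b", "b"),
  X1 = c(1, 2, NA, 4, 5, 6, 7, 8),
  X2 = c(10, NA, 30, 40, 50, 60, 70, 80),
  X3 = c(100, 200, 300, NA, NA, 600, 700, 800)
)

# Default na.action = na.omit: a row is dropped if ANY variable in the
# formula is NA, so the rows used depend on which columns are in cbind().
three_cols <- aggregate(cbind(X1, X2, X3) ~ Y, DF, FUN = mean)
two_cols   <- aggregate(cbind(X1, X2) ~ Y, DF, FUN = mean)
three_cols$X1  # 1.0 7.0 (only rows complete in X1, X2 and X3)
two_cols$X1    # 2.5 6.5 (rows complete in X1 and X2)

# Keep all rows and drop NAs per column inside mean() instead.
keep_rows <- aggregate(cbind(X1, X2, X3) ~ Y, DF, FUN = mean,
                       na.rm = TRUE, na.action = NULL)

# Same means without the formula interface.
no_formula <- aggregate(DF[c("X1", "X2", "X3")], by = list(Y = DF$Y),
                        FUN = mean, na.rm = TRUE)
```

With na.action=NULL, each column's mean uses every non-NA value in that column, so the X1 and X2 means no longer change when X3 is added to or removed from cbind.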