Why are my dplyr group_by & summarize not working properly? (name-collision with plyr)

Question

Why are my dplyr group_by & summarize not working properly? (name-collision with plyr)

156.1k views Asked by Amer At 14 November 2014 at 06:00

I have a data frame that looks like this:

#df
ID  DRUG FED  AUC0t  Tmax   Cmax
1    1     0   100     5      20
2    1     1   200     6      25
3    0     1   NA      2      30 
4    0     0   150     6      65

Ans so on. I want to summarize some statistics on AUC, Tmax and Cmax by drug DRUG and FED STATUS FED. I use dplyr. For example: for the AUC:

CI90lo <- function(x) quantile(x, probs=0.05, na.rm=TRUE)
CI90hi <- function(x) quantile(x, probs=0.95, na.rm=TRUE)  

summary <- df %>%
             group_by(DRUG,FED) %>%
             summarize(mean=mean(AUC0t, na.rm=TRUE), 
                                 low = CI90lo(AUC0t), 
                                 high= CI90hi(AUC0t),
                                 min=min(AUC0t, na.rm=TRUE),
                                 max=max(AUC0t,na.rm=TRUE), 
                                 sd= sd(AUC0t, na.rm=TRUE))

However, the output is not grouped by DRUG and FED. It gives only one line containing the statistics of all by not faceted on DRUG and FED.

Any idea why? and how can I make it do the right thing?

Original Q&A

There are 5 answers

KFB On 14 November 2014 at 06:49

Or you could consider using data.table

library(data.table)
setDT(df)  # set the data frame as data table
df[, list(mean = mean(AUC0t, na.rm=TRUE),
          low = CI90lo(AUC0t), 
          high = CI90hi(AUC0t), 
          min = as.double(min(AUC0t, na.rm=TRUE)),
          max = as.double(max(AUC0t, na.rm=TRUE)), 
          sd = sd(AUC0t, na.rm=TRUE)),
   by=list(DRUG, FED)]

#    DRUG FED mean low high min  max sd
# 1:    1   0  100 100  100 100  100 NA
# 2:    1   1  200 200  200 200  200 NA
# 3:    0   1  NaN  NA   NA Inf -Inf NA
# 4:    0   0  150 150  150 150  150 NA
# Warning messages:
#   1: In min(AUC0t, na.rm = TRUE) :
#   no non-missing arguments to min; returning Inf
# 2: In max(AUC0t, na.rm = TRUE) :
#   no non-missing arguments to max; returning -Inf

mmann1123 On 02 February 2018 at 18:35

A variant of aosmith's answer that might help some folks out. Direct R to call dplyr's functions directly. Good trick when one package interferes with another.

df %>%
      dplyr::group_by(DRUG,FED) %>%
      dplyr::summarize(mean=mean(AUC0t, na.rm=TRUE), 
                low = CI90lo(AUC0t), 
                 high= CI90hi(AUC0t),
                 min=min(AUC0t, na.rm=TRUE),
                 max=max(AUC0t,na.rm=TRUE), 
                 sd= sd(AUC0t, na.rm=TRUE))

Tushar Lad On 21 August 2019 at 07:47

Try sqldf is best way and easy to learn for grouping the data. Below is example to your need.all kinds of data sample grouping sqldf library is very helpful.

install.packages("sqldf")
library(sqldf)
dat1 <- sqldf("select x,y,
            y/sum(y) as Z
            from dat
            group by x")

Elainfl On 20 April 2021 at 14:11

In addition to dplyr, users often use ggplot and with it ggpubr functions. It is in fact, another common used package that has a few incompatibilities with dplyr. In the same way, as shown above you can use dplyr::package, but if it keeps not working, as it happened to me, just detaching the library it will be enough,

detach("package:ggpubr", unload = TRUE)

df %>%
  dplyr::group_by(DRUG,FED) %>%
  dplyr::summarize(mean=mean(AUC0t, na.rm=TRUE), 
            low = CI90lo(AUC0t), 
             high= CI90hi(AUC0t),
             min=min(AUC0t, na.rm=TRUE),
             max=max(AUC0t,na.rm=TRUE), 
             sd= sd(AUC0t, na.rm=TRUE))

**aosmith** · Accepted Answer · 2014-11-14T15:15:39+00:00

I believe you've loaded plyr after dplyr, which is why you are getting an overall summary instead of a grouped summary.

This is what happens with plyr loaded last.

library(dplyr)
library(plyr)
df %>%
      group_by(DRUG,FED) %>%
      summarize(mean=mean(AUC0t, na.rm=TRUE), 
                low = CI90lo(AUC0t), 
                 high= CI90hi(AUC0t),
                 min=min(AUC0t, na.rm=TRUE),
                 max=max(AUC0t,na.rm=TRUE), 
                 sd= sd(AUC0t, na.rm=TRUE))

  mean low high min max sd
1  150 105  195 100 200 50

Now remove plyr and try again and you get the grouped summary.

detach(package:plyr)
df %>%
      group_by(DRUG,FED) %>%
      summarize(mean=mean(AUC0t, na.rm=TRUE), 
                low = CI90lo(AUC0t), 
                 high= CI90hi(AUC0t),
                 min=min(AUC0t, na.rm=TRUE),
                 max=max(AUC0t,na.rm=TRUE), 
                 sd= sd(AUC0t, na.rm=TRUE))

Source: local data frame [4 x 8]
Groups: DRUG

  DRUG FED mean low high min max  sd
1    0   0  150 150  150 150 150 NaN
2    0   1  NaN  NA   NA  NA  NA NaN
3    1   0  100 100  100 100 100 NaN
4    1   1  200 200  200 200 200 NaN

TechQA.

Why are my dplyr group_by & summarize not working properly? (name-collision with plyr)

There are 5 answers

Related Questions in R

Related Questions in PLYR

Related Questions in DPLYR

Related Questions in SHADOWING

Related Questions in NAME-COLLISION

Popular Questions

Trending Questions