Row-wise summation for grouped_df data type


I am a beginner and not very familiar with the advanced features of R. I cannot understand why Reduce() doesn't work on a grouped_df. I am building on my discussion at Rowwise summation for Tibble datatype, where I posted Reduce() as one of the solutions when the class of the data is:

"tbl_df"     "tbl"        "data.frame"

Here's the sample data:

df <- data.frame(client = rep(c("Client A", "Client B", "Client C"), 3),
                 year = rep(c(2014, 2013, 2012), each = 3),
                 rev1 = rep(c(10, 20, 30), 3),
                 rev2 = rep(c(10, 20, 30), 3))

where class(df) is "tbl_df" "tbl" "data.frame".

I then convert df to an object of class grouped_df with:

df1 <- df %>% 
        group_by(client, year,rev1) %>%
        summarise(rev3 = sum(rev1,rev2)) %>%
        select(client, year, rev3, rev1)

where class(df1) is "grouped_df" "tbl_df" "tbl" "data.frame", as expected.

Now, when I use Reduce() to do row-wise summation on df1, it throws an error.

df1%>% dplyr::mutate(sum=Reduce("+",.[3:4]))
Error: incompatible size (9), expecting 1 (the group size) or 1

However, when I first convert df1 to an ungrouped data frame, it works.

df1%>% dplyr::as_data_frame() %>%  dplyr::mutate(sum=Reduce("+",.[3:4]))

The head() of the above output is:

# A tibble: 6 × 5
    client  year  rev3  rev1   sum
    <fctr> <dbl> <dbl> <dbl> <dbl>
1 Client A  2012    20    10    30
2 Client A  2013    20    10    30
3 Client A  2014    20    10    30
4 Client B  2012    40    20    60
5 Client B  2013    40    20    60
6 Client B  2014    40    20    60
...

Can someone please explain why Reduce() doesn't work on grouped data but works on ungrouped data? Maybe I am missing something here.


There are 2 answers

leerssej On

Reduce() and replace() work on vectors.
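To illustrate the vector point, here is a minimal base R sketch. The data frame name `revenue` is only illustrative and not from the question; it holds the question's two revenue columns. Reduce("+", ...) folds + over a list of equal-length column vectors, so the result has one element per row of the whole data frame.

```r
# Illustrative data: the two revenue columns from the question
revenue <- data.frame(rev1 = rep(c(10, 20, 30), 3),
                      rev2 = rep(c(10, 20, 30), 3))

# A data frame is a list of column vectors; Reduce() folds `+` across them
row_totals <- Reduce("+", revenue)

# Same values as adding the columns directly or using rowSums()
stopifnot(identical(row_totals, revenue$rev1 + revenue$rev2))
stopifnot(all(row_totals == rowSums(revenue)))

# One value per row of the whole data frame, regardless of any grouping
length(row_totals)
```

The length of that result is what matters for the grouped case: it always matches the full data frame, not an individual group.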

The grouped data frame df1 is much more than a collection of vectors: it also carries grouping information. Below is what the two objects look like if you expand them in the environment pane.

[Image: df and df1 under the hood]

If we add an ungroup(), we get a plain collection of vectors back, and Reduce() works:

df2 <- df %>% 
    group_by(client, year,rev1) %>%
    summarise(rev3 = sum(rev1,rev2)) %>%
    select(client, year, rev3, rev1) %>% 
    ungroup %>% 
    mutate(sum=Reduce("+",.[3:4]))
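As far as I can tell, the size mismatch in the error comes from `.` standing for the whole piped data frame rather than the current group: Reduce() returns nine values while mutate() expects one value per single-row group. The sketch below assumes dplyr 1.0 or later (for the `.groups` argument); the name `grouped_sum` is illustrative. It shows that referring to the columns by name keeps the computation inside each group, so no ungroup() is needed.

```r
library(dplyr)

df <- data.frame(client = rep(c("Client A", "Client B", "Client C"), 3),
                 year = rep(c(2014, 2013, 2012), each = 3),
                 rev1 = rep(c(10, 20, 30), 3),
                 rev2 = rep(c(10, 20, 30), 3))

# Same pipeline as in the question; the result stays grouped by client and year
df1 <- df %>%
  group_by(client, year, rev1) %>%
  summarise(rev3 = sum(rev1, rev2), .groups = "drop_last") %>%
  select(client, year, rev3, rev1)

# Folding over the full columns gives one value per row of df1,
# which is longer than any single group
full_length <- length(Reduce("+", list(df1$rev3, df1$rev1)))

# Bare column names inside mutate() refer to the current group's rows,
# so this works on the grouped data directly
grouped_sum <- df1 %>% mutate(sum = rev3 + rev1)
```

Here `full_length` is the number of rows in df1, whereas each (client, year) group contains a single row.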

In any case, might this simpler dplyr code work instead?

mutate(df, rev3 = rev1 + rev2, sum = 2*rev1 + rev2)
conrad-mac On

You're not using the replace() function in any of your code blocks above. You're using the Reduce() function.

As an aside, df() is the density function of the F distribution in the stats package; it's bad practice to give objects the same name as an existing function.
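A small sketch of why this is a readability problem rather than a hard error (the values and the name `f_density` are illustrative): when resolving a call, R skips objects that are not functions, so stats::df() remains reachable even after a data frame named df exists, but the two meanings of df are easy to confuse.

```r
# A data frame named df, as in the question (illustrative values)
df <- data.frame(rev1 = c(10, 20, 30))

# In call position R looks for a function, so this still finds stats::df()
f_density <- df(1, df1 = 5, df2 = 2)

# The explicit namespace form makes the intent unambiguous
stopifnot(identical(f_density, stats::df(1, df1 = 5, df2 = 2)))
```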