Row-wise summation for grouped_df data type

Question

Asked by watchtower on 07 January 2017 at 07:38

I am a beginner and not too familiar with the advanced features of R. I don't understand why Reduce() doesn't work on a grouped_df. This builds on my earlier question, "Rowwise summation for Tibble datatype", where I posted Reduce() as one of the solutions when the class of the data is:

"tbl_df"     "tbl"        "data.frame"

Here's the sample data:

df <- data.frame(client = rep(c("Client A", "Client B", "Client C"), 3),
                 year = rep(c(2014, 2013, 2012), each = 3),
                 rev1 = rep(c(10, 20, 30), 3),
                 rev2 = rep(c(10, 20, 30), 3))

where class(df) is "tbl_df" "tbl" "data.frame" once df has been converted to a tibble (data.frame() on its own returns only "data.frame").

I then convert df to class grouped_df with:

library(dplyr)  # provides %>%, group_by(), summarise(), select(), mutate()

df1 <- df %>%
  group_by(client, year, rev1) %>%
  summarise(rev3 = sum(rev1, rev2)) %>%
  select(client, year, rev3, rev1)

where class(df1) is "grouped_df" "tbl_df" "tbl" "data.frame", as expected.

Now, when I use Reduce() to do a row-wise summation on df1, it throws an error:

df1%>% dplyr::mutate(sum=Reduce("+",.[3:4]))
Error: incompatible size (9), expecting 1 (the group size) or 1

However, when I first convert df1 to an ungrouped tibble with as_data_frame(), it works well:

df1%>% dplyr::as_data_frame() %>%  dplyr::mutate(sum=Reduce("+",.[3:4]))

The head() of the above output is:

# A tibble: 6 × 5
    client  year  rev3  rev1   sum
    <fctr> <dbl> <dbl> <dbl> <dbl>
1 Client A  2012    20    10    30
2 Client A  2013    20    10    30
3 Client A  2014    20    10    30
4 Client B  2012    40    20    60
5 Client B  2013    40    20    60
6 Client B  2014    40    20    60
...

Can someone please explain why Reduce() doesn't work for grouped data but works for non-grouped data? Maybe I am missing something here.

There are 2 answers

**leerssej** · Answer 1 · 2017-01-07T08:12:48+00:00

Reduce() and replace() work on vectors.

The df1 grouped data frame is more than a collection of vectors: it also carries grouping metadata, which you can see by expanding the object in the environment pane.

If we add an ungroup() we can get a collection of vectors back.

df2 <- df %>% 
    group_by(client, year,rev1) %>%
    summarise(rev3 = sum(rev1,rev2)) %>%
    select(client, year, rev3, rev1) %>% 
    ungroup %>% 
    mutate(sum=Reduce("+",.[3:4]))
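To make the size mismatch in the error message concrete, here is a minimal sketch, assuming dplyr is installed. The names `dat`, `grouped`, and `whole_table_sum` are illustrative and not from the original post. Inside the pipe, `.` stands for the whole of df1, so Reduce() returns one value per row of the table, whereas mutate() on a grouped data frame evaluates once per group and expects a result of the group's size.

```r
library(dplyr)

# Sample data from the question (the name `dat` is illustrative)
dat <- data.frame(client = rep(c("Client A", "Client B", "Client C"), 3),
                  year   = rep(c(2014, 2013, 2012), each = 3),
                  rev1   = rep(c(10, 20, 30), 3),
                  rev2   = rep(c(10, 20, 30), 3))

# Same pipeline as in the question; the result is still grouped
grouped <- dat %>%
  group_by(client, year, rev1) %>%
  summarise(rev3 = sum(rev1, rev2)) %>%
  select(client, year, rev3, rev1)

# What Reduce() sees through `.`: columns 3 and 4 of the whole table
whole_table_sum <- Reduce("+", grouped[3:4])
length(whole_table_sum)   # one value per row of the table

# What a grouped mutate() expects: a result as long as each group
group_size(grouped)       # number of rows in each group
```

With this data every group holds a single row, so a result with one value per table row cannot be fitted into any group. That appears to be what "incompatible size (9), expecting 1 (the group size)" is reporting.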

In any case, would this simpler dplyr code work instead?

mutate(df, rev3 = rev1 + rev2, sum = 2*rev1 + rev2)
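As a further sketch along the same lines (again assuming dplyr, with the illustrative names `dat`, `grouped`, `by_name`, and `by_reduce`), referring to the columns by name instead of through `.` lets mutate() run on the grouped data directly, because bare column names are evaluated within each group:

```r
library(dplyr)

# Sample data from the question (the name `dat` is illustrative)
dat <- data.frame(client = rep(c("Client A", "Client B", "Client C"), 3),
                  year   = rep(c(2014, 2013, 2012), each = 3),
                  rev1   = rep(c(10, 20, 30), 3),
                  rev2   = rep(c(10, 20, 30), 3))

grouped <- dat %>%
  group_by(client, year, rev1) %>%
  summarise(rev3 = sum(rev1, rev2)) %>%
  select(client, year, rev3, rev1)

# Column names are resolved per group, so no ungroup() is needed
by_name <- grouped %>%
  mutate(sum = rev3 + rev1)

# The ungroup() + Reduce() variant from the answer, for comparison
by_reduce <- grouped %>%
  ungroup() %>%
  mutate(sum = Reduce("+", .[3:4]))

# Both approaches should give the same row-wise totals
all.equal(by_name$sum, by_reduce$sum)
```

The by-name form only suits a small, fixed set of columns. For many columns selected by position, ungrouping first and then using Reduce() or rowSums() is probably the more practical route.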

**conrad-mac** · Answer 2 · 2017-01-07T07:59:19+00:00

You're not using the replace() function in any of your code blocks above. You're using the Reduce() function.

As an aside, df() is the density function of the F distribution in the stats package. It's bad practice to give objects the same name as an existing function.