data.table: Keep original column name when applying a function inside a 'by=variable' statement


When I apply a function to one or more columns of a data.table and, in the same call, transform the column I group by with another function, the grouping column in the result is always named after the applied function instead of the original column.

Code example:

library(data.table)

dt <- data.table(value=rnorm(100), class=sample(1:3, 100, replace=TRUE))

dt[, .(class_mean=mean(value)), by=factor(class)]

Output:

   factor   class_mean
1:      2  0.007297291
2:      3 -0.122847460
3:      1  0.103293676

What I would expect instead is to keep the original column name in the result, like this:

   class   class_mean
1:      2  0.007297291
2:      3 -0.122847460
3:      1  0.103293676

As far as I can tell, this happens regardless of which function is applied to the grouping column(s). When grouping by a column whose name is stored in another variable, I usually use by=get(variable_that_stores_the_column_name), which likewise names the resulting column get.

How can I modify my data.table grouping call to get the result I want without tediously renaming the columns of the result afterwards?
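A minimal sketch of two standard data.table idioms, using the same kind of dt as above (the set.seed() call is an addition so the sketch is reproducible): naming the expression inside by=.() sets the result's column name, and passing a character vector of column names to by keeps the original name when no function needs to be applied to the grouping column.

```r
library(data.table)

set.seed(1)  # added only to make this sketch reproducible
dt <- data.table(value = rnorm(100), class = sample(1:3, 100, replace = TRUE))

# Name the grouping expression inside .(): the result column is called
# "class" and holds factor(class) instead of being called "factor"
res1 <- dt[, .(class_mean = mean(value)), by = .(class = factor(class))]

# If no function has to be applied, a character vector of column names
# keeps the original name (the column stays integer here, not factor)
var_name <- "class"
res2 <- dt[, .(class_mean = mean(value)), by = var_name]
```

The first form covers the original example; the second replaces by=get(var_name) when the grouping column is used unchanged.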

EDIT:

Thanks for the responses and answers in the comments. This works in most cases. However, if I want to refer to the grouping column by a name stored in another variable (and keep that column's name in the result), the same problem arises:

var_name <- "class"
dt[, .(class_mean=mean(value)), by=.(var_name = factor(get(var_name)))]

names the resulting column var_name instead of class, and

var_name <- "class"
dt[, .(class_mean=mean(value)), by=.(get(var_name) = factor(get(var_name)))]

leads to an error:

Error: unexpected '=' in "dt[, .(class_mean=mean(value)), by=.(get(var_name) ="
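The error comes from R's parser rather than from data.table: inside a call such as .(...), the left-hand side of = must be a name or a string, and get(var_name) is a function call. One fallback that does not rely on any newer data.table feature is to group under a placeholder name and then rename that column by reference with setnames(); the name grp below is an arbitrary, hypothetical placeholder.

```r
library(data.table)

set.seed(1)  # added only to make this sketch reproducible
dt <- data.table(value = rnorm(100), class = sample(1:3, 100, replace = TRUE))
var_name <- "class"

# Group under a temporary column name ("grp" is just a placeholder) ...
res <- dt[, .(class_mean = mean(value)), by = .(grp = factor(get(var_name)))]

# ... then rename it in place to the name stored in var_name
setnames(res, "grp", var_name)
```

This is still a renaming step, but setnames() modifies the result in place instead of copying it.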


Accepted answer by Wimpel:

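For your edited question, use the env argument of [.data.table, which replaces the placeholder var_name with the column name stored in my_name, both in the grouping expression and in the name it is assigned to: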

my_name <- "class"
dt[, .(class_mean=mean(value)), by=.(var_name = factor(var_name)), env = list(var_name = my_name)]

This results in the desired output:

    class  class_mean
   <fctr>       <num>
1:      2 -0.07004949
2:      1 -0.10250014
3:      3 -0.09003567
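The same env approach can be wrapped in a function so the grouping column is passed as a string. This is a minimal sketch: group_mean and by_col are hypothetical names, and it assumes a data.table version that supports the env argument. Character values given in env are substituted as symbols (column names), so every occurrence of by_col in by, including the name on the left of =, becomes class.

```r
library(data.table)

# Hypothetical helper: mean of `value` per level of the column whose name
# is given as a string; env swaps the symbol by_col for that column name
group_mean <- function(dt, by_col) {
  dt[, .(class_mean = mean(value)),
     by = .(by_col = factor(by_col)),
     env = list(by_col = by_col)]
}

set.seed(1)  # added only to make this sketch reproducible
dt <- data.table(value = rnorm(100), class = sample(1:3, 100, replace = TRUE))
res <- group_mean(dt, "class")
```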