How can I rename columns when using combine with several functions on a Julia DataFrame?

What's wrong with the following syntax?

combine(gpd, :SepalWidth .=> [mean, sum] => [:mymean, :mysum] )

Here gpd is a GroupedDataFrame, and I want the output columns to be named :mymean and :mysum.

Answer by Bogumił Kamiński:

You are missing a dot in broadcasting: the second => also has to be broadcast as .=>. The following should work:

combine(gpd, :SepalWidth .=> [mean, sum] .=> [:mymean, :mysum])
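To try the fix end to end without loading the iris data, here is a minimal sketch that uses a tiny hand-made data frame instead. The Species/SepalWidth values are made up for illustration, and it assumes DataFrames.jl is installed.

```julia
using DataFrames, Statistics

# Hypothetical stand-in for the iris data used in the question
df = DataFrame(Species = ["a", "a", "b", "b"],
               SepalWidth = [3.0, 4.0, 2.0, 6.0])
gpd = groupby(df, :Species)

# Both => operators are broadcast, so each function gets its own output name
res = combine(gpd, :SepalWidth .=> [mean, sum] .=> [:mymean, :mysum])
println(res)
```

The result has one row per group and the columns Species, mymean and mysum.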

EDIT

A crucial part of learning to debug complex expressions in the DataFrames.jl mini-language is understanding that you can always evaluate the expression on its own, outside of combine, to see how broadcasting handles it.

In this case you have:

julia> :SepalWidth .=> [mean, sum] .=> [:mymean, :mysum]
2-element Vector{Pair{Symbol}}:
 :SepalWidth => (Statistics.mean => :mymean)
 :SepalWidth => (sum => :mysum)

As you can see, the result is a vector of two valid transformation operations.
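The same standalone check can also be scripted with assertions instead of reading REPL output. This sketch needs only the Statistics standard library, since no DataFrame is involved; the variable names spec and wrong are illustrative. It also evaluates the expression from the question: with only one dot, the whole specification collapses into a single pair that maps :SepalWidth to a vector of functions, which combine cannot use as one transformation.

```julia
using Statistics  # provides mean; sum is in Base

# Corrected specification: both => are broadcast, giving two operations
spec = :SepalWidth .=> [mean, sum] .=> [:mymean, :mysum]

# The question's version: [mean, sum] => [:mymean, :mysum] is a single Pair,
# which broadcasting treats as a scalar, so the result is one Pair, not a vector
wrong = :SepalWidth .=> [mean, sum] => [:mymean, :mysum]

foreach(println, spec)
println(wrong)
```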

Now let us have a look at:

julia> [:SepalWidth, :SepalLength] .=> [mean] => [:mymean1, :mymean2]
2-element Vector{Pair{Symbol, Pair{Vector{typeof(mean)}, Vector{Symbol}}}}:
  :SepalWidth => ([Statistics.mean] => [:mymean1, :mymean2])
 :SepalLength => ([Statistics.mean] => [:mymean1, :mymean2])

This is clearly incorrect, since each operation tries to store the result of mean in two columns. If you instead write, for example:

julia> [:SepalWidth, :SepalLength] .=> mean .=> [:mymean1, :mymean2]
2-element Vector{Pair{Symbol, Pair{typeof(mean), Symbol}}}:
  :SepalWidth => (Statistics.mean => :mymean1)
 :SepalLength => (Statistics.mean => :mymean2)

all is correct again.
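Plugged into combine, the corrected expression produces one renamed mean column per input column. A minimal sketch, again with made-up toy data in place of iris and assuming DataFrames.jl is installed:

```julia
using DataFrames, Statistics

# Hypothetical toy data with two numeric columns
df = DataFrame(Species = ["a", "a", "b", "b"],
               SepalWidth = [3.0, 4.0, 2.0, 6.0],
               SepalLength = [5.0, 7.0, 6.0, 8.0])
gdf = groupby(df, :Species)

# mean is a scalar under broadcasting, so it is paired with every
# (source column, target name) combination position by position
res = combine(gdf, [:SepalWidth, :SepalLength] .=> mean .=> [:mymean1, :mymean2])
println(res)
```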

Interestingly, in some (rare) cases you can omit a dot in broadcasting. For instance:

julia> [:SepalWidth, :SepalLength] .=> mean .=> identity
2-element Vector{Pair{Symbol, Pair{typeof(mean), typeof(identity)}}}:
  :SepalWidth => (Statistics.mean => identity)
 :SepalLength => (Statistics.mean => identity)

julia> [:SepalWidth, :SepalLength] .=> mean => identity
2-element Vector{Pair{Symbol, Pair{typeof(mean), typeof(identity)}}}:
  :SepalWidth => (Statistics.mean => identity)
 :SepalLength => (Statistics.mean => identity)

give exactly the same result (here identity means that the input column name is reused as the output column name). Using => or .=> in the second position makes no difference in this case because the second and third parts of the whole expression are single functions rather than vectors, so there is nothing to broadcast over.
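A quick way to confirm that both forms behave the same inside combine is to run them side by side. This is a sketch on hypothetical toy data, assuming DataFrames.jl is installed; it relies on combine passing the source column name to a target-naming function such as identity, as described above.

```julia
using DataFrames, Statistics

# Hypothetical toy data standing in for the iris columns
df = DataFrame(Species = ["a", "a", "b", "b"],
               SepalWidth = [3.0, 4.0, 2.0, 6.0],
               SepalLength = [5.0, 7.0, 6.0, 8.0])
gdf = groupby(df, :Species)

# identity returns the source column name unchanged,
# so each output column keeps the name of its input column
dotted   = combine(gdf, [:SepalWidth, :SepalLength] .=> mean .=> identity)
undotted = combine(gdf, [:SepalWidth, :SepalLength] .=> mean => identity)
println(dotted)
println(dotted == undotted)
```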