Writing function methods for passing GroupedDataFrame in Julia


I have written a function like the following one:

gini(v::Array{<:Real,1}) = (2 * sum([x*i for (i,x) in enumerate(sort(v))]) / sum(sort(v)) - (length(v)+1))/(length(v))

This function works well when called on a Vector or applied to a DataFrame column with combine. For example:

gini(collect(1:1:10))
# 0.3

or

using DataFrames # DataFrames v1.3.2

df = DataFrame(v = collect(1:1:10),
               group = repeat([1, 2], 5))

combine(df, :v => gini)
#1×1 DataFrame
# Row │ v_gini  
#     │ Float64
#─────┼─────────
#   1 │     0.3

However, unlike other functions that take vectors as an argument (e.g. Statistics.mean), it throws a MethodError when passing a GroupedDataFrame.

combine(groupby(df, :group), :v => gini)
# nested task error: MethodError: no method matching gini(::SubArray{Int64, 1, Vector{Int64}, Tuple{SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}}, false})
#   Closest candidates are:
#     gini(::Vector{<:Real})

How can I write functions like the one above that work when passing a GroupedDataFrame?

Answer by Bogumił Kamiński (best answer)

You need to change the method signature to:

gini(v::AbstractVector{<:Real})

The point is that combine on a GroupedDataFrame passes each group's column as a view of the vector, which has type SubArray rather than Vector. Your function therefore needs to accept any AbstractVector, not just Vector.
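
As a minimal sketch, the whole function with the widened signature could look like the following. It assumes the same formula as in the question; the local names s, n and weighted are only illustrative, and sorting once instead of twice is a small convenience, not a required change.

```julia
# Accept any one-dimensional array of reals, including views (SubArray).
function gini(v::AbstractVector{<:Real})
    s = sort(v)                                       # sort once and reuse the result
    n = length(s)
    weighted = sum(x * i for (i, x) in enumerate(s))  # rank-weighted sum
    return (2 * weighted / sum(s) - (n + 1)) / n
end

x = collect(1:10)
sub = view(x, 1:5)       # a SubArray, like the per-group views combine passes

sub isa Vector           # false, which is why the original method did not match
sub isa AbstractVector   # true

gini(x)                  # 0.3, a plain Vector still works
gini(sub)                # the view is now accepted as well
```

With this method defined, combine(groupby(df, :group), :v => gini) from the question should dispatch without the MethodError, since each group's view is an AbstractVector.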