I have dataframe as follow:
julia> println(df_roa_kvkt)
15×3 DataFrame
Row │ year sector roa
│ Int64 String Float64
─────┼─────────────────────────────────────────
1 │ 2017 Construction -1.15
2 │ 2017 Services -1.78
3 │ 2017 Agriculture -1.82
4 │ 2018 Construction -1.05
5 │ 2018 Services -1.56
6 │ 2018 Agriculture -1.55
7 │ 2019 Construction -1.32
8 │ 2019 Services -1.53
9 │ 2019 Agriculture -1.38
10 │ 2020 Construction -1.11
11 │ 2020 Services -1.93
12 │ 2020 Agriculture -1.21
13 │ 2021 Construction -0.03
14 │ 2021 Services -0.06
15 │ 2021 Agriculture -0.36
using DataFrames
df_roa_kvkt = DataFrame(year = [2017, 2017, 2017, 2018, 2018, 2018, 2019, 2019, 2019, 2020, 2020, 2020, 2021, 2021, 2021],
sector = ["Construction", "Services", "Agriculture", "Construction", "Services", "Agriculture", "Construction", "Services", "Agriculture", "Construction", "Services", "Agriculture", "Construction", "Services", "Agriculture"],
roa = [-1.15, -1.78, -1.82, -1.05, -1.56, -1.55, -1.32, -1.53, -1.38, -1.11, -1.93, -1.21, -0.03, -0.06, -0.36])
I want to calcalate the percentage change by each year for each sector. I have used the following codes:
function pct_change(input::AbstractVector{<:Number})
[i == 1 ? missing : (input[i]-input[i-1])/input[i-1] for i in eachindex(input)]
end
sort!(df_roa_kvkt, [:year, :sector])
combine(groupby(df_roa_kvkt, [:year, :sector]), :roa => (x -> pct_change(x)) => :proportion)
But the results are all missing in proportion columns. The result I hope to see is that 4 columns with year, sector, roa and proportion columns.
The issue is because you're grouping by sector and year, which means every row is the first one in it's group, which means every row of
proportionismissing.The solution is to group just by sector. This drops the year column, so if we want to keep it, then (I believe the only way is to) use
transform( ortransform!) instead:Output: