This question is specific to dataframe.js.
Here is the test data I am using:
let data = [
{
year : 2020,
v : 0.1,
cnt_1 : 1,
cnt_2 : 20
},
{
year : 2020,
v : 0.1,
cnt_1 : 3,
cnt_2 : 20
},
{
year : 2020,
v : 0.1,
cnt_1 : 5,
cnt_2 : 4
},
{
year : 2020,
v : 0.1,
cnt_1 : 7,
cnt_2 : 20
},
{
year : 2020,
v : 0.2,
cnt_1 : 9,
cnt_2 : 20
},
{
year : 2020,
v : 0.2,
cnt_1 : 11,
cnt_2 : 20
},
{
year : 2021,
v : 0.2,
cnt_1 : 13,
cnt_2 : 20
},
{
year : 2020,
v : 0.1,
cnt_1 : 15,
cnt_2 : 20
},
{
year : 2021,
v : 0.1,
cnt_1 : 17,
cnt_2 : 20
}
];
And the result I expect looks like this:
| year | v | cnt_1_sum | cnt_2_sum |
------------------------------------
| 2020 | 0.1 | 31 | 84 |
| 2020 | 0.2 | 20 | 40 |
| 2021 | 0.2 | 13 | 20 |
| 2021 | 0.1 | 17 | 20 |
I can do this for a single column, as shown below, but I have no idea how to do it for multiple columns (in this case, cnt_1 and cnt_2).
let df = new DataFrame(data);
let grouped = df.groupBy('year', 'v');
let cnt1_sum = grouped.aggregate(grpObj => grpObj.stat.sum('cnt_1')).rename('aggregation', 'cnt_1_sum');
cnt1_sum.show();
// which prints:
| year | v | cnt_1_sum |
------------------------------------
| 2020 | 0.1 | 31 |
| 2020 | 0.2 | 20 |
| 2021 | 0.2 | 13 |
| 2021 | 0.1 | 17 |
The only way I know is to compute each sum separately and join the resulting dataframes on year and v. But that is inefficient when there are many columns to aggregate (with 8 columns, would I have to join 8 dataframes?).
So here is the question. Is there any way to
- apply a stat function to multiple columns?
- add a column from existing data? (the withColumn API does not work with a plain array)
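For comparison, the multi-column sum can also be computed in plain JavaScript before the rows ever reach dataframe.js. The following is only a minimal sketch under that assumption: `sumByGroup` is a hypothetical helper, not part of the dataframe.js API, and the `_sum` suffix simply mirrors the column names in the expected table above. The `sample` array is a shortened stand-in for the test data.

```javascript
// Hypothetical helper (NOT part of dataframe.js): groups plain row objects by
// the given key columns and sums the given value columns in a single pass.
function sumByGroup(rows, keyCols, sumCols) {
  const groups = new Map();
  for (const row of rows) {
    // Build a composite key such as "2020|0.1" from the grouping columns.
    const key = keyCols.map(col => row[col]).join('|');
    let acc = groups.get(key);
    if (!acc) {
      // First row of this group: copy the key columns and start sums at 0.
      acc = {};
      for (const col of keyCols) acc[col] = row[col];
      for (const col of sumCols) acc[`${col}_sum`] = 0;
      groups.set(key, acc);
    }
    for (const col of sumCols) acc[`${col}_sum`] += row[col];
  }
  // A Map iterates in insertion order, so groups come out in first-seen order.
  return [...groups.values()];
}

// Shortened stand-in for the test data in the question.
const sample = [
  { year: 2020, v: 0.1, cnt_1: 1, cnt_2: 20 },
  { year: 2020, v: 0.1, cnt_1: 3, cnt_2: 4 },
  { year: 2021, v: 0.2, cnt_1: 13, cnt_2: 20 },
];

const summed = sumByGroup(sample, ['year', 'v'], ['cnt_1', 'cnt_2']);
console.log(summed);
```

The returned array of objects has the same shape as `data`, so it should be possible to pass it to `new DataFrame(summed)` in the same way the question constructs `df`. This sidesteps the join entirely, at the cost of doing the aggregation outside the library.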
I was able to do it by slightly changing Igor's code: