I would like to create a Map
column which counts the number of occurrences.
For instance:
+---+----+
| b| a|
+---+----+
| 1| b|
| 2|null|
| 1| a|
| 1| a|
+---+----+
would result in
+---+--------------------+
| b| res|
+---+--------------------+
| 1|[a -> 2.0, b -> 1.0]|
| 2| []|
+---+--------------------+
For the moment, in Spark 2.4.6, I was able to make it using udaf.
While bumping to Spark3 I was wondering if I could get rid of this udaf (I tried using the new method aggregate
without success)
Is there an efficient way to do it? (For the efficiency part, I am able to test easily)
Here a Spark 3 solution:
gives:
Here the solution using
Aggregator
:gives: