I would like to create a Map column that counts, for each value of b, the number of occurrences of each value of a.
For instance:
+---+----+
|  b|   a|
+---+----+
|  1|   b|
|  2|null|
|  1|   a|
|  1|   a|
+---+----+
would result in:
+---+--------------------+
|  b|                 res|
+---+--------------------+
|  1|[a -> 2.0, b -> 1.0]|
|  2|                  []|
+---+--------------------+
Currently, in Spark 2.4.6, I do this with a UDAF.
While upgrading to Spark 3, I was wondering whether I could get rid of this UDAF (I tried the new aggregate function, without success).
Is there an efficient way to do it? (I can easily benchmark the efficiency part myself.)
 
                        
Here is a Spark 3 solution:
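One way to do this with built-in functions only is sketched below. It is an illustration under assumptions, not necessarily the exact code of this answer: count each (b, a) pair first, then pack the pairs into a map per b with `map_from_entries`. The object name, the local `SparkSession` setup and the intermediate column name `cnt` are made up for the example. Null values of a are dropped before building the map, because map keys cannot be null; `collect_list` skips the nulls produced by `when` without an `otherwise`, so a group containing only nulls ends up with an empty map.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Hypothetical object name, for illustration only.
object CountOccurrencesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("count-occurrences-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample data from the question; None stands for the null in column a.
    val df = Seq[(Int, Option[String])](
      (1, Some("b")), (2, None), (1, Some("a")), (1, Some("a"))
    ).toDF("b", "a")

    // Step 1: one row per (b, a) pair with its number of occurrences,
    // cast to double to match the value type shown in the question.
    val counts = df
      .groupBy($"b", $"a")
      .agg(count(lit(1)).cast("double").as("cnt"))

    // Step 2: collect the non-null (a, cnt) pairs per b and turn them into a map.
    // when(...) without otherwise yields null for null keys, and collect_list
    // ignores nulls, so those rows contribute nothing.
    val res = counts
      .groupBy($"b")
      .agg(
        map_from_entries(
          collect_list(when($"a".isNotNull, struct($"a", $"cnt")))
        ).as("res")
      )

    res.orderBy($"b").show(false)
    spark.stop()
  }
}
```

This relies on two shuffles (one per `groupBy`), so it is worth benchmarking against the UDAF on realistic data before switching.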
gives:
Here is the solution using Aggregator:
gives: