Group-by-then-filter logic for a DataFrame in Scala using Spark 1.x


I need to implement the following logic in Scala, running on Spark 1.x.

I need to select columns and values from an input DataFrame to build an output DataFrame. The input DataFrame contains the columns clientid, planid, id, sourceid, asofdate, and amount. For each combination of clientid, planid, id, and sourceid:

- If the combination has only one row, return that row unchanged in the output DataFrame.
- If the combination has more than one row and the asofdate and amount are the same, keep only one row and discard the duplicates.
- If the asofdate is the same but the amounts differ, sum the amounts and produce a single row.
- If both the asofdate and the amount differ, keep the row with the latest asofdate and discard the rest.

The asofdate format is MM/DD/YYYY.

I tried a UDF that returns a Row, but when I ran it to test, it threw an error saying that Row is not supported as a UDF return type. Can someone please help?
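For context, this is roughly the pipeline I am trying to express. It is only a minimal sketch, replacing the UDF with plain groupBy/agg, and it assumes Spark 1.5+ for unix_timestamp and Spark 1.6 for the multi-column join; df stands for the input DataFrame:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Sketch only: `df` is the input DataFrame with columns
// clientid, planid, id, sourceid, asofdate (string, MM/DD/YYYY), amount.
def collapse(df: DataFrame): DataFrame = {
  val keyCols = Seq("clientid", "planid", "id", "sourceid")

  // Exact duplicates (same key, asofdate, amount) collapse to one row.
  val deduped = df.dropDuplicates(keyCols ++ Seq("asofdate", "amount"))

  // Same key and asofdate but different amounts: sum into a single row.
  val summed = deduped
    .groupBy((keyCols :+ "asofdate").map(col): _*)
    .agg(sum("amount").as("amount"))

  // Parse MM/DD/YYYY so dates compare chronologically, not lexically.
  val stamped = summed.withColumn("ts", unix_timestamp(col("asofdate"), "MM/dd/yyyy"))

  // Keep only the latest asofdate per key; single-row groups pass through untouched.
  val latest = stamped.groupBy(keyCols.map(col): _*).agg(max("ts").as("ts"))
  stamped.join(latest, keyCols :+ "ts").drop("ts")
}
```

My understanding of the UDF error is that Spark cannot infer a schema for a Row return type, which is why I am looking for a way to express this with DataFrame operations instead.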
