Group-by-then-filter logic for a DataFrame in Scala using Spark 1.x


I need to implement the following logic in Scala, running on Spark 1.x.

I need to select columns and values from an input DataFrame to build an output DataFrame. The input DataFrame contains the columns clientid, planid, id, sourceid, asofdate, and amount. For each combination of clientid, planid, id, and sourceid:

- If the combination has only one row, return that row unchanged in the output DataFrame.
- If the combination has more than one row and the asofdate and amount are the same, keep only one row and discard the duplicates.
- If the asofdate is the same but the amounts differ, sum the amounts and produce a single row.
- If both the asofdate and the amount differ, keep the row with the latest asofdate and discard the rest.

The asofdate format is MM/DD/YYYY.

I tried a UDF that returns a Row, but when I ran it to test, it threw an error saying that Row is not supported as a UDF return type. Can someone please help?
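For context, this is roughly the pipeline I am trying to express. It is only a minimal sketch, replacing the UDF with plain groupBy/agg, and it assumes Spark 1.5+ for unix_timestamp and Spark 1.6 for the multi-column join; df stands for the input DataFrame:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Sketch only: `df` is the input DataFrame with columns
// clientid, planid, id, sourceid, asofdate (string, MM/DD/YYYY), amount.
def collapse(df: DataFrame): DataFrame = {
  val keyCols = Seq("clientid", "planid", "id", "sourceid")

  // Exact duplicates (same key, asofdate, amount) collapse to one row.
  val deduped = df.dropDuplicates(keyCols ++ Seq("asofdate", "amount"))

  // Same key and asofdate but different amounts: sum into a single row.
  val summed = deduped
    .groupBy((keyCols :+ "asofdate").map(col): _*)
    .agg(sum("amount").as("amount"))

  // Parse MM/DD/YYYY so dates compare chronologically, not lexically.
  val stamped = summed.withColumn("ts", unix_timestamp(col("asofdate"), "MM/dd/yyyy"))

  // Keep only the latest asofdate per key; single-row groups pass through untouched.
  val latest = stamped.groupBy(keyCols.map(col): _*).agg(max("ts").as("ts"))
  stamped.join(latest, keyCols :+ "ts").drop("ts")
}
```

My understanding of the UDF error is that Spark cannot infer a schema for a Row return type, which is why I am looking for a way to express this with DataFrame operations instead.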
