I have a JavaPairRDD and need to group by the key and then sort it using a value inside the object MyObject.
Lets say MyObject is:
class MyObject {
Integer order;
String name;
}
Sample data:
1, {order:1, name:'Joseph'}
1, {order:2, name:'Tom'}
1, {order:3, name:'Luke'}
2, {order:1, name:'Alfred'}
2, {order:3, name:'Ana'}
2, {order:2, name:'Jessica'}
3, {order:3, name:'Will'}
3, {order:2, name:'Mariah'}
3, {order:1, name:'Monika'}
Expected result:
Partition 1:
1, {order:1, name:'Joseph'}
1, {order:2, name:'Tom'}
1, {order:3, name:'Luke'}
Partition 2
2, {order:1, name:'Alfred'}
2, {order:2, name:'Jessica'}
2, {order:3, name:'Ana'}
Partition 3:
3, {order:1, name:'Monika'}
3, {order:2, name:'Mariah'}
3, {order:3, name:'Will'}
I'm using the key to partition the RDD and then using MyObject.order to sort the data inside the partition.
My goal is to get only the k-first elements in each sorted partition and then reduce them to a value calculated by other MyObject attribute (AKA "the first N best of the group").
How can I do this?
You can use
mapPartitions
:Two highlights: