Spark grouping and then sorting (Java code)

Question

Spark grouping and then sorting (Java code)

784 views Asked by Magno C At 18 September 2017 at 01:15

I have a JavaPairRDD and need to group by the key and then sort it using a value inside the object MyObject.

Lets say MyObject is:

class MyObject {
    Integer order;
    String name;
}

Sample data:

1, {order:1, name:'Joseph'}
1, {order:2, name:'Tom'}
1, {order:3, name:'Luke'}
2, {order:1, name:'Alfred'}
2, {order:3, name:'Ana'}
2, {order:2, name:'Jessica'}
3, {order:3, name:'Will'}
3, {order:2, name:'Mariah'}
3, {order:1, name:'Monika'}

Expected result:

Partition 1:

1, {order:1, name:'Joseph'}
1, {order:2, name:'Tom'}
1, {order:3, name:'Luke'}

Partition 2

2, {order:1, name:'Alfred'}
2, {order:2, name:'Jessica'}
2, {order:3, name:'Ana'}

Partition 3:

3, {order:1, name:'Monika'}
3, {order:2, name:'Mariah'}
3, {order:3, name:'Will'}

I'm using the key to partition the RDD and then using MyObject.order to sort the data inside the partition.

My goal is to get only the k-first elements in each sorted partition and then reduce them to a value calculated by other MyObject attribute (AKA "the first N best of the group").

How can I do this?

Original Q&A

There are 1 answers

**T. Gawęda** · Accepted Answer · 2017-09-18T11:08:39+00:00

You can use mapPartitions:

JavaPairRDD<Long, MyObject> sortedRDD = rdd.groupBy(/* the first number */)
    .mapPartitionsToPair(x -> {
        List<Tuple2<Long, MyObject>> values = toArrayList(x);
        Collections.sort(values, (x, y) -> x._2.order - y._2.order);

        return values.iterator();
     }, true);

Two highlights:

toArrayList takes an Iterator and returns ArrayList. You must implement it by yourself
important is to have true as the second argument of mapPartitionsToPair, because it will preserve partitioning

TechQA.

Spark grouping and then sorting (Java code)

There are 1 answers

Related Questions in JAVA

Related Questions in APACHE-SPARK

Related Questions in JAVA-PAIR-RDD

Popular Questions

Popular Tags

Trending Questions