How to collect Spark JavaPairRDD data as a list


I am working on an Apache Spark 2.2.0 task in Java. I currently apply a mapToPair() function to my JavaRDD<String>, which gives me a JavaPairRDD<Integer, Table> as a result. Consider Table to be any object type.

What I am trying to do now is collect all of the data into a final list that will be returned to the driver program. I don't want to perform any transformation, aggregation, or calculation on the data, which is why I thought of using the collect() function.

What I have so far is the following:

JavaPairRDD<Integer, Table> pairs = gData.mapToPair(...);
JavaRDD<Tuple2<Integer, Table>> t = JavaRDD.fromRDD(pairs.rdd(), null);
List<Tuple2<Integer, Table>> el = t.collect();
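
For what it's worth, my understanding of the JavaPairRDD API is that the detour through JavaRDD.fromRDD() should not even be necessary, since collect() on the pair RDD already returns the tuples. That is an assumption on my part, and I have not been able to verify it because of the error below:

// assumption: collect() on a JavaPairRDD returns List<Tuple2<K, V>> directly
List<Tuple2<Integer, Table>> el = pairs.collect();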

But for some reason that I can't understand, the code above produces the following error:

Exception in thread "main" java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapred.FileInputFormat
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:312)
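
I wonder whether this is a Guava version clash between my dependencies and what Hadoop's FileInputFormat expects; that is only a guess on my part. If it helps, I assume a check like the one below would at least show which jar the Stopwatch class is actually loaded from:

// assumption: printing the code source reveals which Guava jar is on the classpath
System.out.println(com.google.common.base.Stopwatch.class
        .getProtectionDomain().getCodeSource().getLocation());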

I might be headed in totally the wrong direction, but can you suggest a way to collect this Tuple2 data and possibly iterate over it?
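
To make the goal concrete, here is a sketch of the kind of driver-side usage I am after (Table stands for my own class, and the loop body is only illustrative):

import java.util.List;
import scala.Tuple2;

// collect the pairs to the driver and iterate over the resulting tuples
List<Tuple2<Integer, Table>> results = pairs.collect();
for (Tuple2<Integer, Table> tuple : results) {
    Integer key = tuple._1();   // the Integer key of the pair
    Table value = tuple._2();   // the Table value of the pair
    // process key and value here
}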

Thank you.


Update: No matter what my code does, even if I just run the simple word count example, the error at FileInputFormat.getSplits(FileInputFormat.java:312) still comes up! Any help?
