I am using Spark 1.6 with Java 7.
I have a pair RDD:
JavaPairRDD<String, String> filesRDD = sc.wholeTextFiles(args[0]);
I want to convert it into a DataFrame with a schema.
It seems that I first have to convert the pair RDD into an RDD of Row objects.
So how do I create a JavaRDD<Row> from a JavaPairRDD?
For Java 7, you need to define a map function that turns each Tuple2<String, String> into a Row.
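A minimal sketch of such a map function, written as an anonymous class because Java 7 has no lambdas. The holder class `PairToRow` and the constant name `TO_ROW` are made up for illustration; `Function`, `RowFactory.create` and `Tuple2` are the usual Spark and Scala types.

```java
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import scala.Tuple2;

// Hypothetical holder class; only the Function itself matters.
public class PairToRow {
    // Maps one (fileName, content) pair to a two-column Row.
    public static final Function<Tuple2<String, String>, Row> TO_ROW =
        new Function<Tuple2<String, String>, Row>() {
            @Override
            public Row call(Tuple2<String, String> tuple) throws Exception {
                return RowFactory.create(tuple._1(), tuple._2());
            }
        };
}
```

Keeping the function in a static field means it has no reference to an enclosing instance, which avoids accidental serialization problems when Spark ships it to the executors.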
Now you can pass this function to filesRDD.map(...) to get a
JavaRDD<Row>.
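As a sketch of how the pieces might fit together in Spark 1.6: map the pair RDD to rows, describe the two columns with a `StructType`, and hand both to `SQLContext.createDataFrame`. The class name `FilesToDataFrame`, the method name, and the column names `fileName` and `content` are assumptions chosen for this example; the row-mapping function is passed in as a parameter.

```java
import java.util.Arrays;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;
import scala.Tuple2;

// Hypothetical helper class for illustration.
public class FilesToDataFrame {
    public static DataFrame toDataFrame(
            SQLContext sqlContext,
            JavaPairRDD<String, String> filesRDD,
            Function<Tuple2<String, String>, Row> toRow) {
        // Step 1: pair RDD -> RDD of Row objects.
        JavaRDD<Row> rowRDD = filesRDD.map(toRow);

        // Step 2: schema with two nullable string columns (names are assumed).
        StructType schema = DataTypes.createStructType(Arrays.asList(
            DataTypes.createStructField("fileName", DataTypes.StringType, true),
            DataTypes.createStructField("content", DataTypes.StringType, true)));

        // Step 3: rows + schema -> DataFrame.
        return sqlContext.createDataFrame(rowRDD, schema);
    }
}
```

The order of the fields in the schema has to match the order of the values passed to `RowFactory.create` in the mapping function.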
With Java 8, the same mapping can be written as a lambda expression.
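A possible Java 8 version, assuming the same `JavaPairRDD<String, String>` as in the question. The wrapper class `PairToRowJava8` and the method name `toRows` are only there to make the snippet compile on its own.

```java
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

// Hypothetical wrapper class for illustration.
public class PairToRowJava8 {
    public static JavaRDD<Row> toRows(JavaPairRDD<String, String> filesRDD) {
        // The lambda replaces the anonymous Function class needed on Java 7.
        return filesRDD.map(tuple -> RowFactory.create(tuple._1(), tuple._2()));
    }
}
```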
There is also another way to get a DataFrame from a JavaPairRDD.
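The text does not say which alternative is meant. One candidate, sketched here under the assumption that the experimental Dataset API of Spark 1.6 is acceptable, is to skip the Row step and encode the tuples directly. The class and method names are made up for this example.

```java
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SQLContext;
import scala.Tuple2;

// Hypothetical helper class for illustration.
public class PairToDataFrame {
    public static DataFrame toDataFrame(
            SQLContext sqlContext, JavaPairRDD<String, String> filesRDD) {
        // Build a typed Dataset from the underlying Scala RDD of tuples.
        Dataset<Tuple2<String, String>> ds = sqlContext.createDataset(
            filesRDD.rdd(),
            Encoders.tuple(Encoders.STRING(), Encoders.STRING()));
        // Convert the typed Dataset to an untyped DataFrame.
        return ds.toDF();
    }
}
```

With this approach the schema is derived from the tuple encoder rather than declared by hand, so the column names are generated ones. If you need specific column names, the explicit Row plus schema route gives you more control.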