Converting a pairRDD to a Dataset in Spark using Java


How can I create a Spark Dataset from a pairRDD using Java? Could you please help?

Answer by Oli (best answer)

Basically, to go from a Dataset to a pairRDD in Java, you first convert the Dataset to an RDD using javaRDD(), and then convert that RDD to a pairRDD using mapToPair.

Here is an example:

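// Imports this snippet needs
import static org.apache.spark.sql.functions.col;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import scala.Tuple2;

// Create a Dataset of rows with columns x and y = x * x (spark is an existing SparkSession)
Dataset<Row> ds = spark
    .range(5)
    .select(col("id").alias("x"),
            col("id").multiply(col("id")).alias("y"));
// Convert to a JavaRDD<Row>, then map each row to a (key, value) pair
JavaPairRDD<Long, Long> pairRDD = ds
    .javaRDD() // to RDD in Java
    .mapToPair(row -> new Tuple2<>(row.getLong(0), row.getLong(1)));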
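
For the opposite direction, which is what the question asks about (pairRDD to Dataset), one possible approach is to pass the pair RDD's underlying RDD to createDataset together with a tuple encoder. The following is only a minimal sketch under some assumptions: the class name PairRddToDataset, the helper toDataset, the local master, and the sample pairs are all made up for illustration, and keys and values are assumed to be Long.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import scala.Tuple2;

public class PairRddToDataset {

    // Hypothetical helper: wraps a JavaPairRDD<Long, Long> in a typed Dataset of tuples.
    static Dataset<Tuple2<Long, Long>> toDataset(SparkSession spark,
                                                 JavaPairRDD<Long, Long> pairRDD) {
        return spark.createDataset(
            pairRDD.rdd(), // the underlying RDD<Tuple2<Long, Long>>
            Encoders.tuple(Encoders.LONG(), Encoders.LONG()));
    }

    public static void main(String[] args) {
        // Assumption: a local session, used here only to make the sketch runnable.
        SparkSession spark = SparkSession.builder()
            .master("local[*]")
            .appName("pair-rdd-to-dataset")
            .getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        // Made-up sample data: (x, x * x) pairs.
        List<Tuple2<Long, Long>> pairs = Arrays.asList(
            new Tuple2<Long, Long>(1L, 1L),
            new Tuple2<Long, Long>(2L, 4L));
        JavaPairRDD<Long, Long> pairRDD = jsc.parallelizePairs(pairs);

        Dataset<Tuple2<Long, Long>> ds = toDataset(spark, pairRDD);

        // Optional: rename the tuple columns (_1, _2) to get a Dataset<Row>.
        Dataset<Row> named = ds.toDF("x", "y");
        named.show();

        spark.stop();
    }
}
```

For other key or value types, the two Encoders.LONG() calls would need to be swapped for matching encoders (for example Encoders.STRING(), or Encoders.bean(...) for a Java bean class).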