Spark pairRDD not working


I am getting these errors:

value subtractByKey is not a member of org.apache.spark.rdd.RDD[(String, LabeledPoint)]

value join is not a member of org.apache.spark.rdd.RDD[(String, LabeledPoint)]

Why is this happening? org.apache.spark.rdd.RDD[(String, LabeledPoint)] is a pair RDD, and I have already imported org.apache.spark.rdd._

1 Answer

Answered by David Griffin:

In the spark-shell, this works exactly as expected, without having to import anything:

scala> case class LabeledPoint(x: Int, y: Int, label: String)
defined class LabeledPoint

scala> val rdd1 = sc.parallelize(List("this","is","a","test")).map(label => (label, LabeledPoint(0,0,label)))
rdd1: org.apache.spark.rdd.RDD[(String, LabeledPoint)] = MapPartitionsRDD[1] at map at <console>:23

scala> val rdd2 = sc.parallelize(List("this","is","a","test")).map(label => (label, 1))
rdd2: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[3] at map at <console>:21

scala> rdd1.join(rdd2)
res0: org.apache.spark.rdd.RDD[(String, (LabeledPoint, Int))] = MapPartitionsRDD[6] at join at <console>:28
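The subtractByKey from the question resolves through the same PairRDDFunctions implicit as join (since Spark 1.3 that implicit is in scope automatically; older versions needed import org.apache.spark.SparkContext._), so it can be checked in the same session. A sketch continuing the transcript above — the res output lines are illustrative, not copied from a real run:

scala> rdd1.subtractByKey(rdd2)   // drops every pair whose key also appears in rdd2
res1: org.apache.spark.rdd.RDD[(String, LabeledPoint)] = SubtractedRDD[9] at subtractByKey at <console>:28

scala> res1.count   // rdd1 and rdd2 share all four keys, so nothing remains
res2: Long = 0

If either call still fails to compile in your code, the usual cause is an old Spark version without the automatic implicit, or a shadowing definition of RDD pulled in by a wildcard import.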