Spark .mapValues setup with multiple values


I am trying to set up mapValues to compute summary statistics for each key. I have created the following RDD:

res10: Array[(Int, (Double, Double, Double))] = Array((1,(9.1383276E7,1.868480924818E12,4488.0)), (22,(107667.11999999922,2582934.208799982,4488.0)), (2,(2.15141303E8,1.0585204549689E13,4488.0)), (3,(4488.0,4488.0,4488.0)), (44,(0.0,0.0,4488.0)), (18,(1348501.0,4.06652001E8,4488.0)), (9,(4488.0,4488.0,4488.0)))

I am trying to implement the following code, but something is wrong with my syntax:

val dataStatsVals = dataStatsRDD.mapValues(x => {
  x._3, x._1, x._1/x._3, math.pow(((x._2/x._3 - x._1/x._3)), 2)
})

I've been scouring the web for a good .mapValues example that does something like this, but I can't find one.

Edit: The input tuple is (Sum, Sum of Squares, Count).

The output should be (Count, Sum, Average, Variance).
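For reference, write the input tuple as (S, Q, n) = (sum, sum of squares, count). Assuming the population variance is wanted, it is the mean of the squares minus the square of the mean; the sample variance rescales that by n/(n-1). This differs from `math.pow((x._2/x._3 - x._1/x._3), 2)`, which squares the difference between the two means. A worked check on the values 1, 2, 3 (an illustrative sample, not taken from the RDD above):

```latex
\bar{x} = \frac{S}{n}, \qquad
\sigma^2 = \frac{Q}{n} - \bar{x}^2, \qquad
s^2 = \frac{n}{n-1}\,\sigma^2

% Values 1, 2, 3: S = 6, Q = 14, n = 3
\bar{x} = \frac{6}{3} = 2, \qquad
\sigma^2 = \frac{14}{3} - 2^2 = \frac{2}{3}, \qquad
\left(\frac{Q}{n} - \frac{S}{n}\right)^2 = \left(\frac{14}{3} - 2\right)^2 = \frac{64}{9}
```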


There are 2 answers

Justin Pihony (best answer)

Without a sample of the expected output, it looks like you want to put all of that in a tuple. If so, you are just missing the outer parentheses:

val dataStatsVals = dataStatsRDD.mapValues(x => {
  (x._3, x._1, x._1/x._3, math.pow((x._2/x._3 - x._1/x._3), 2))
})

This will give you a Tuple4. The elements would be:

1 => 3rd element of the original tuple (the count)
2 => 1st element of the original tuple (the sum)
3 => 1st element divided by the 3rd element (the average)
4 => result of the math.pow call
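If positional access such as `._3` gets hard to follow, one option is to return a case class instead of a Tuple4 so that each field has a name. This is a minimal sketch: `Stats`, `StatsNamed` and `toStats` are hypothetical names, and the variance field assumes the population variance (sum of squares / count minus the squared mean) rather than the `math.pow` expression above.

```scala
// A named result type instead of an anonymous Tuple4.
// Stats, StatsNamed and toStats are hypothetical names, not Spark API.
final case class Stats(count: Double, sum: Double, mean: Double, variance: Double)

object StatsNamed {
  // Input layout follows the question: (sum, sumOfSquares, count).
  def toStats(v: (Double, Double, Double)): Stats = {
    val (sum, sumSq, count) = v
    val mean = sum / count
    // Assumption: population variance, E[x^2] - (E[x])^2.
    Stats(count, sum, mean, sumSq / count - mean * mean)
  }

  def main(args: Array[String]): Unit = {
    val s = toStats((6.0, 14.0, 3.0))
    println(s"count=${s.count} mean=${s.mean} variance=${s.variance}")
  }
}
```

In Spark this would be applied as `dataStatsRDD.mapValues(StatsNamed.toStats)`, giving an `RDD[(Int, Stats)]`. Case classes extend `Serializable` by default, so the results can be shuffled or collected like the original tuples.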
Carlos Vilchez

mapValues only passes the value (the right-hand part of each key/value pair) to your function, so you can use pattern matching to name its fields and make the code more readable:

val dataStatsVals = dataStatsRDD.mapValues {
  case (d1: Double, d2: Double, d3: Double) => (d3, d1, d1/d3, math.pow(d2/d3 - d1/d3, 2))
}
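The `{ case ... }` block is an ordinary Scala pattern-matching anonymous function, so the same logic can be exercised on a local collection before running it on a cluster. The sketch below uses descriptive names instead of `d1`/`d2`/`d3` and drops the type annotations in the pattern, since the compiler infers them from the value type. `PatternCheck` is a hypothetical name, the second sample pair is made up for illustration, and the variance assumes the population formula.

```scala
// Pattern-matching function literal with the same shape as the answer,
// checked on plain Scala collections (no SparkContext needed).
object PatternCheck {
  // Destructures (sum, sumOfSquares, count) -> (count, sum, mean, variance).
  val summarize: ((Double, Double, Double)) => (Double, Double, Double, Double) = {
    case (sum, sumSq, count) =>
      val mean = sum / count
      // Assumption: population variance, E[x^2] - (E[x])^2.
      (count, sum, mean, sumSq / count - mean * mean)
  }

  def main(args: Array[String]): Unit = {
    // Key 3 comes from the question's output; key 7 is a made-up sample.
    val local = Seq((3, (4488.0, 4488.0, 4488.0)), (7, (6.0, 14.0, 3.0)))
    local.map { case (k, v) => (k, summarize(v)) }.foreach(println)
  }
}
```

On Spark the same function value can be passed directly: `dataStatsRDD.mapValues(PatternCheck.summarize)`.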