Looking for some assistance with how to do something in Scala using Spark.
I have:
type DistanceMap = HashMap[(VertexId,String), Int]
This forms part of my data, in the form of an RDD of:
org.apache.spark.rdd.RDD[(DistanceMap, String)]
In short, my dataset looks like this:
({(101,S)=3},piece_of_data_1)
({(101,S)=3},piece_of_data_2)
({(101,S)=1, (109,S)=2},piece_of_data_3)
What I want to do is flatMap my distance map (which I can do), but at the same time, for each flat-mapped DistanceMap entry,
I want to retain the associated string. So my resulting data would look like this:
({(101,S)=3},piece_of_data_1)
({(101,S)=3},piece_of_data_2)
({(101,S)=1},piece_of_data_3)
({(109,S)=2},piece_of_data_3)
As mentioned, I can flatMap the first part using:
x.flatMap(x => x._1).collect.foreach(println)
but I am stuck on how to retain the string from the second part of my original data.
This might work for you:
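A minimal sketch, assuming `DistanceMap` uses Scala's immutable `HashMap` and substituting a local `Long` alias for GraphX's `VertexId` so it runs without Spark. The hypothetical `expand` function destructures each record, turns every map entry into its own single-entry map, and copies the string onto each one; the same function should be passable to `RDD.flatMap` unchanged.

```scala
import scala.collection.immutable.HashMap

object ExpandDistances {
  // Assumption: local stand-in for GraphX's VertexId (a Long alias) so this runs without Spark.
  type VertexId = Long
  type DistanceMap = HashMap[(VertexId, String), Int]

  // Split one (DistanceMap, String) record into one record per map entry,
  // copying the associated string onto each resulting record.
  def expand(record: (DistanceMap, String)): Seq[(DistanceMap, String)] = {
    val (distances, data) = record
    distances.toSeq.map { case (key, dist) => (HashMap(key -> dist), data) }
  }

  def main(args: Array[String]): Unit = {
    val rows: Seq[(DistanceMap, String)] = Seq(
      (HashMap((101L, "S") -> 3), "piece_of_data_1"),
      (HashMap((101L, "S") -> 3), "piece_of_data_2"),
      (HashMap((101L, "S") -> 1, (109L, "S") -> 2), "piece_of_data_3")
    )
    // With Spark, the same function should work as rdd.flatMap(expand).
    // Note: entry order within a HashMap is not guaranteed.
    rows.flatMap(expand).foreach(println)
  }
}
```

The key point is that the string is captured in the closure over each record, so it is available when emitting every entry produced from that record's map.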
The idea is to convert from
(Seq(a,b,c),Value)
to
Seq((a,Value), (b,Value), (c,Value)).
This works the same way on plain Scala collections as on an RDD, so here is a standalone, simplified Scala example you can paste into the Scala REPL:
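A minimal sketch of that idea on plain Scala collections, using the hypothetical names `PairWithValue` and `pairAll`. Each `(Seq, value)` pair is expanded into one `(item, value)` tuple per element, and `flatMap` concatenates the results.

```scala
object PairWithValue {
  // Turn (Seq(a, b, c), value) into Seq((a, value), (b, value), (c, value)).
  def pairAll[A, V](input: (Seq[A], V)): Seq[(A, V)] = {
    val (items, value) = input
    items.map(item => (item, value))
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(
      (Seq("a", "b", "c"), "Value1"),
      (Seq("d"), "Value2")
    )
    // flatMap concatenates the per-record Seqs into one flat Seq.
    data.flatMap(record => pairAll(record)).foreach(println)
  }
}
```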
This results in: