Spark saving RDD[(Int, Array[Double])] to text file got strange result


I am trying to save the userFeatures of a MatrixFactorizationModel to a text file. According to the docs, userFeatures is an RDD[(Int, Array[Double])], so I just called

model.userFeatures.saveAsTextFile("feature")

However, the results I got are something like:

(1,[D@4b7707f1)
(5,[D@513e9aca)
(9,[D@7d09bcab)
(13,[D@31058458)
(17,[D@2a5df2a7)
(21,[D@5372efd7)
(25,[D@59d1c59a)
(29,[D@53ee5e25)
(33,[D@498f5a34)
(37,[D@4f9967eb)
(41,[D@5560afb)
(45,[D@2dc7f659)
(49,[D@b46fcc)
(53,[D@38098dd1)
(57,[D@77090fb5)
(61,[D@64769e18)

What I am expecting is something like:

(1, [1.1, 2.3, 0.4, ...])
(2, [0.1, 0.3, 0.4, ...])
...

So what's wrong?


There is 1 answer

Justin Pihony On BEST ANSWER

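saveAsTextFile writes each element by calling its toString method. A Scala Array is a plain JVM array, so its toString is the default one inherited from Object: a type tag ([D means an array of doubles), then @, then the object's hash code in hex. The contents are never printed. You have two options if you stick with saveAsTextFile: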

.mapValues(arr => arr.mkString("[", ", ", "]")).saveAsTextFile("feature")

Or you can use map to wrap the data in a custom object with a custom toString. In this case converting the array to a List is enough, because List's toString prints its elements (for example, List(1.1, 2.3, 0.4)):

.mapValues(_.toList).saveAsTextFile("feature")
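
To see the difference between the three string forms without starting a Spark job, here is a minimal plain-Scala sketch. The object name FeatureFormat and the helper formatRow are hypothetical names made up for this illustration; they are not part of Spark or MLlib. The sample values are made up as well.

```scala
// Plain-Scala sketch (no Spark needed) of the string forms discussed above.
// FeatureFormat and formatRow are hypothetical names used only for illustration.
object FeatureFormat {
  // Renders one (id, features) pair as "(id, [a, b, c])".
  def formatRow(row: (Int, Array[Double])): String = {
    val (id, features) = row
    s"($id, ${features.mkString("[", ", ", "]")})"
  }

  def main(args: Array[String]): Unit = {
    val row = (1, Array(1.1, 2.3, 0.4))

    // Default JVM toString of the array: "[D" + "@" + hex hash code.
    // The hash part differs from run to run.
    println(row)

    // Option 1: build the string yourself with mkString.
    println(formatRow(row)) // (1, [1.1, 2.3, 0.4])

    // Option 2: convert to a List, whose toString prints the elements.
    println((row._1, row._2.toList)) // (1,List(1.1, 2.3, 0.4))
  }
}
```

Assuming the helper is available on the executors, something like model.userFeatures.map(FeatureFormat.formatRow).saveAsTextFile("feature") should then write one readable line per user. Since the output is plain text, whoever reads the file back has to parse that format again, so pick a layout that is easy to split.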