Apache Spark RDD Transpose


I am doing ETL on securities market data in order to generate a Self Organizing Map. I'd like to transpose the row data:

AAME, 20030101, 1.63, 1.63, 1.63, 1.63, 0
AAON, 20030101, 5.4635, 5.4635, 5.4635, 5.4635, 0
AAPL, 20030101, 7.165, 7.165, 7.165, 7.165, 0
ABAX, 20030101, 3.96, 3.96, 3.96, 3.96, 0
...
ZUMZ, 20131104, 29.55, 29.79, 29.18, 29.46, 218100

into column data:

AAME 1.63, 1.65, ...
AAON 5.4635, 5.3

Should I use reduceByKey with a list-append/extend style merge, or should I try a BlockMatrix?
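One way to frame the reduceByKey idea: map each row to a (symbol, [close]) pair and merge the per-symbol lists. Note that element order within a merged list is not guaranteed in Spark unless you sort by date afterwards (or carry the date along as a sort key). The snippet below is a minimal pure-Python sketch of that grouping, not Spark code; the sample rows and column choice (close price) are illustrative assumptions in the question's row format.

```python
from collections import defaultdict

# Hypothetical sample rows in the question's format:
# symbol, date, open, high, low, close, volume
rows = [
    ("AAON", 20030101, 5.4635, 5.4635, 5.4635, 5.4635, 0),
    ("AAME", 20030102, 1.65, 1.65, 1.65, 1.65, 0),
    ("AAME", 20030101, 1.63, 1.63, 1.63, 1.63, 0),
]

# Group closing prices by symbol, sorted by date first so the
# per-symbol series comes out in chronological order -- the same
# shape a Spark map-to-pair followed by reduceByKey(list concat)
# would produce.
series = defaultdict(list)
for symbol, date, o, h, l, c, v in sorted(rows, key=lambda r: (r[0], r[1])):
    series[symbol].append(c)

print(dict(series))  # {'AAME': [1.63, 1.65], 'AAON': [5.4635]}
```

In Spark this would correspond to something like `rdd.map(r => (r.symbol, List(r.close))).reduceByKey(_ ++ _)`; for large series, `aggregateByKey` or `groupByKey` may be more appropriate than repeated list concatenation.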

https://spark.apache.org/docs/latest/mllib-data-types.html#blockmatrix

E.g.:

import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, CoordinateMatrix}

// coordMat: a CoordinateMatrix built from (row, col, value) entries.
val matA: BlockMatrix = coordMat.toBlockMatrix().cache()

// Validate whether the BlockMatrix is set up properly.
// Throws an exception when it is not valid; nothing happens if it is.
matA.validate()

// Compute A^T A.
val ata = matA.transpose.multiply(matA)
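If you take the matrix route, the underlying operation is simple: a CoordinateMatrix is a set of (row, col, value) entries, and transposing it just swaps the row and column indices. In Spark you would build the matrix from MatrixEntry objects (e.g. row = symbol index, col = date index, value = price) and call transpose, converting to a BlockMatrix when you need multiplication. A minimal pure-Python sketch of the index swap, with hypothetical indices:

```python
# (row, col, value) entries: row = symbol index, col = date index.
# These indices and values are illustrative, not from real data.
entries = [(0, 0, 1.63), (0, 1, 1.65), (1, 0, 5.4635)]

# Transposing a coordinate-format matrix = swapping row and column.
transposed = [(c, r, v) for (r, c, v) in entries]

print(transposed)  # [(0, 0, 1.63), (1, 0, 1.65), (0, 1, 5.4635)]
```

Either approach works; the grouping approach keeps the data as per-symbol sequences, while the matrix approach is preferable if you need matrix operations (like the A^T A product above) afterwards.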
