I am doing ETL on securities market data in order to generate a Self-Organizing Map. I'd like to transpose the row data:
AAME, 20030101, 1.63, 1.63, 1.63, 1.63, 0
AAON, 20030101, 5.4635, 5.4635, 5.4635, 5.4635, 0
AAPL, 20030101, 7.165, 7.165, 7.165, 7.165, 0
ABAX, 20030101, 3.96, 3.96, 3.96, 3.96, 0
...
ZUMZ, 20131104, 29.55, 29.79, 29.18, 29.46, 218100
into column data:
AAME 1.63, 1.65, ...
AAON 5.4635, 5.3
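Roughly, what I have in mind with reduceByKey is the sketch below. The input path and the column positions are placeholders (I'm assuming the close price is the sixth field), so treat it as an illustration rather than working code:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("TransposeQuotes"))

// Placeholder path; each line is "SYMBOL, yyyymmdd, open, high, low, close, volume".
val lines = sc.textFile("hdfs:///data/eod_quotes.csv")

// Key each record by symbol, keeping (date, close) so the series can be ordered later.
val bySymbol = lines.map { line =>
  val f = line.split(",").map(_.trim)
  (f(0), Vector((f(1).toLong, f(5).toDouble)))
}

// The "append"/"extend" idea: concatenate the per-symbol vectors,
// then sort each series by date and keep only the close prices.
val series = bySymbol
  .reduceByKey(_ ++ _)
  .mapValues(_.sortBy(_._1).map(_._2))

series.take(2).foreach { case (sym, closes) => println(s"$sym ${closes.mkString(", ")}") }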
Should I try to use reduceByKey (with something like extend or append to build each symbol's list), or should I try a BlockMatrix?
https://spark.apache.org/docs/latest/mllib-data-types.html#blockmatrix
E.g. -
import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, CoordinateMatrix}

// coordMat: an existing CoordinateMatrix built from MatrixEntry records.
val matA: BlockMatrix = coordMat.toBlockMatrix().cache()
// Validate whether the BlockMatrix is set up properly. Throws an Exception when it is not valid.
// Nothing happens if it is valid.
matA.validate()
// Calculate A^T A.
val ata = matA.transpose.multiply(matA)
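To feed that example I would still have to build coordMat from the raw rows first. This is roughly how I picture it (the index assignment via zipWithIndex is my own guess; it continues from the lines RDD in the sketch above):

import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

// Reuses the `lines` RDD from the sketch above: one (symbol, date, close) per record.
val parsed = lines.map { line =>
  val f = line.split(",").map(_.trim)
  (f(0), f(1).toLong, f(5).toDouble)
}

// Give every symbol a row index and every trading date a column index.
val symbolIndex = sc.broadcast(parsed.map(_._1).distinct().zipWithIndex().collectAsMap())
val dateIndex   = sc.broadcast(parsed.map(_._2).distinct().sortBy(identity).zipWithIndex().collectAsMap())

// One MatrixEntry per observation: row = symbol, column = date, value = close.
val entries = parsed.map { case (sym, date, close) =>
  MatrixEntry(symbolIndex.value(sym), dateIndex.value(date), close)
}

// From here the docs example above applies (toBlockMatrix, validate, transpose, ...).
val coordMat = new CoordinateMatrix(entries)

Indexed this way, each row of the matrix is already one symbol's close-price series, so I'm not sure the transpose step is even needed for my case.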