Iterate through mixed-type Scala Lists


Using Spark 2.1.1, I have an N-row CSV as 'fileInput':

colname datatype    elems   start   end
colA    float       10      0       1
colB    int         10      0       9

I have successfully made an array of sql.Row objects ...

val df = spark.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .load(fileInput)
val rowCnt:Int = df.count.toInt
val aryToUse  = df.take(rowCnt)
Array[org.apache.spark.sql.Row] = Array([colA,float,10,0,1], [colB,int,10,0,9])

Against those Rows and using my random-value-generator scripts, I have successfully populated an empty ListBuffer[Any] ...

res170: scala.collection.mutable.ListBuffer[Any] = ListBuffer(List(0.24455154, 0.108798146, 0.111522496, 0.44311434, 0.13506883, 0.0655781, 0.8273762, 0.49718297, 0.5322746, 0.8416396), List(1, 9, 3, 4, 2, 3, 8, 7, 4, 6))
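
Roughly, the population step looks like this (randomFloats / randomInts below are simplified stand-ins for my actual generator scripts):

import scala.collection.mutable.ListBuffer
import scala.util.Random

// Stand-in generators; my real scripts do the actual value customization
def randomFloats(n: Int, start: Double, end: Double): List[Float] =
  List.fill(n)((start + Random.nextDouble() * (end - start)).toFloat)

def randomInts(n: Int, start: Int, end: Int): List[Int] =
  List.fill(n)(start + Random.nextInt(end - start + 1))

val buf = ListBuffer.empty[Any]
aryToUse.foreach { row =>
  // all columns are strings because the CSV was read without schema inference
  val elems = row.getString(2).toInt
  val start = row.getString(3).toDouble
  val end   = row.getString(4).toDouble
  row.getString(1) match {
    case "float" => buf += randomFloats(elems, start, end)
    case "int"   => buf += randomInts(elems, start.toInt, end.toInt)
  }
}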

Now I have a mixed-type ListBuffer[Any] containing differently typed lists. How do I iterate through and zip these? [Any] seems to defy mapping/zipping. I need to take the N lists generated from the inputFile's definitions and save them to a CSV file. The final output should be:

ColA, ColB
0.24455154, 1
0.108798146, 9
0.111522496, 3
... etc

The inputFile can then be used to create any number of 'colname's, of any 'datatype' (I have scripts for that), with each type appearing 1..n times and any number of rows (defined by 'elems'). My random-generating scripts customize the values per 'start' and 'end', but those columns are not relevant to this question.


There are 2 answers

Haroun Mohammedi

I think the RDD.zipWithUniqueId() or RDD.zipWithIndex() methods can do what you want.

Please refer to the official documentation for more information. Hope this helps.
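
For example, assuming the two generated value lists are available as RDDs (the names floatValues and intValues here are hypothetical), an index-based join might look like this:

// Sketch only: floatValues / intValues stand for the generated column values
val colA = spark.sparkContext.parallelize(floatValues)   // RDD[Float]
val colB = spark.sparkContext.parallelize(intValues)     // RDD[Int]

// Key each RDD by element position, join on that index, restore order, drop the index
val zipped = colA.zipWithIndex.map(_.swap)
  .join(colB.zipWithIndex.map(_.swap))
  .sortByKey()
  .values                                                 // RDD[(Float, Int)]

zipped.map { case (a, b) => s"$a,$b" }.saveAsTextFile("outputDir")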

Tzach Zohar

Given a List[List[Any]], you can "zip" all these lists together using transpose, if you don't mind the result being a list-of-lists instead of a list of Tuples:

val result: Seq[List[Any]] = list.transpose

If you then want to write this into a CSV, you can start by mapping each "row" into a comma-separated String:

val rows: Seq[String] = result.map(_.mkString(","))
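
Putting it together, a minimal sketch that prepends a header line and writes the result with plain java.io (the column names are assumed to be at hand from the metadata):

import java.io.PrintWriter

val header = "colA,colB"          // assumed: collected from the 'colname' column of the metadata
val lines  = header +: rows

val writer = new PrintWriter("output.csv")
try lines.foreach(writer.println) finally writer.close()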

(Note: I'm ignoring the Apache Spark part, which seems irrelevant to this question; the "metadata" is loaded via Spark, but it's then collected into an Array, so Spark plays no further role.)