I want to convert the PipelinedRDD shown below (`dec_RDD`, which holds one number per element) into an RDD where each element is a tuple of 164 numbers, in the form shown further down.
Here's the schema:
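```python
from pyspark.sql.types import StructType, StructField, FloatType

# 164 nullable float columns named S0, S1, ..., S163
schemaString = "S"
fields = [StructField(schemaString + str(i), FloatType(), True) for i in range(164)]
schema = StructType(fields)
```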
The goal is to finally convert the result into a DataFrame with 164 columns using this schema.
What I am struggling with is grouping every 164 consecutive numbers together into one row.
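The grouping boils down to index arithmetic: the element at 0-based position `i` belongs to row `i // 164`, column `i % 164`. A minimal local (non-Spark) sketch of that rule follows; `WIDTH`, `row_and_column` and `chunk_rows` are hypothetical names, handy mainly as a reference for checking a distributed version on a small sample.

```python
WIDTH = 164  # assumed: one row per 164 values, matching the schema's column count


def row_and_column(i, width=WIDTH):
    """Return (row, column) for the element at 0-based position i."""
    return divmod(i, width)


def chunk_rows(values, width=WIDTH):
    """Local reference only: group a flat sequence into tuples of `width` items.
    A trailing partial chunk is kept as a shorter tuple."""
    values = list(values)
    return [tuple(values[start:start + width]) for start in range(0, len(values), width)]
```

For example, `chunk_rows(range(5), width=2)` gives `[(0, 1), (2, 3), (4,)]`, which also shows what happens when the count is not a multiple of the width.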
```python
>>> dec_RDD.take(4)
[(120,), (-119,), (-125,), (-119,)]
```
`new_dec_RDD` should look like this:
```
[(120, -119, -125, -119, ..., 164 elements in this row),
 (... 164 elements in the next row ...)]
```
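One possible approach, sketched under the assumptions that every element is a 1-tuple as in the `take(4)` output and that the total count is a multiple of 164: number each element with `zipWithIndex()`, key it by row (`index // 164`), group by that key, and restore the column order inside each row. The names `ROW_WIDTH`, `key_by_row`, `to_row` and `group_into_rows` are hypothetical. Values are cast to `float` because, as far as I know, `createDataFrame` with schema verification rejects Python `int` values for a `FloatType` column.

```python
ROW_WIDTH = 164  # assumed to match the 164 fields in the schema


def key_by_row(pair, width=ROW_WIDTH):
    """((value,), global_index) from zipWithIndex() -> (row_id, (column, value))."""
    (value,), index = pair
    return index // width, (index % width, float(value))


def to_row(grouped):
    """(row_id, iterable of (column, value)) -> tuple ordered by column."""
    _, cells = grouped
    # groupByKey() does not guarantee order, so sort by column position
    return tuple(value for _, value in sorted(cells))


def group_into_rows(rdd, width=ROW_WIDTH):
    """Group a single-column RDD of 1-tuples into tuples of `width` values,
    keeping the original element order. Uses only RDD methods."""
    return (
        rdd.zipWithIndex()                      # ((value,), index)
        .map(lambda p: key_by_row(p, width))    # (row_id, (column, value))
        .groupByKey()                           # (row_id, iterable of cells)
        .sortByKey()                            # rows back in original order
        .map(to_row)                            # (v0, v1, ..., v163)
    )
```

With a `SparkSession` named `spark`, something like `spark.createDataFrame(group_into_rows(dec_RDD), schema)` should then yield the 164-column DataFrame. If the count is not a multiple of 164, the last tuple comes out shorter than the schema and the conversion fails, so that case needs padding or filtering first. Note that `groupByKey()` shuffles the data; that is usually acceptable here because each group holds only 164 small values.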