There are similar questions on Stack Overflow, but none of them covers complex (ArrayType) data types.

I have a spark dataframe that looks like the following:

With the following schema:

 |-- sequence: struct (nullable = true)
 |    |-- sequence: array (nullable = true)
 |    |    |-- element: long (containsNull = true)

I want to convert it to the following schema so that it works with the PrefixSpan algorithm:

 |-- sequence: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: long (containsNull = true)

I have tried converting it into a Python list of lists of lists and moving it back into a Spark dataframe, which actually solves the problem, but unfortunately I need a solution that uses only Spark dataframes and RDDs.

I have tried this:

from pyspark.sql.types import ArrayType

changedTypedf = df.withColumn("sequence",df["sequence"].cast(ArrayType()))

But I get the following error:

TypeError: __init__() missing 1 required positional argument: 'elementType'

I expect an output where the dataframe is re-created with this new schema and can then be run through PrefixSpan, but unfortunately my attempts to edit the types after the schema declaration have been unsuccessful.
