There are similar questions on Stack Overflow, but none of them cover complex (ArrayType) data types.

I have a Spark DataFrame that looks like the following:

https://www.dropbox.com/s/qpdokird4rqe5ci/Screen%20Shot%202019-05-06%20at%201.34.29%20pm.jpeg

With the following schema:

root
 |-- sequence: struct (nullable = true)
 |    |-- sequence: array (nullable = true)
 |    |    |-- element: long (containsNull = true)

And I want to convert it so that it has the following schema, so that it works with the PrefixSpan algorithm:

root
 |-- sequence: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: long (containsNull = true)

I have tried converting it into a Python list of lists of lists and moving that back into a Spark DataFrame, and this does solve the problem, but I need a solution that uses only Spark DataFrames and RDDs.

I have tried this:

from pyspark.sql.types import ArrayType

changedTypedf = df.withColumn("sequence", df["sequence"].cast(ArrayType()))

But I get the following error:

TypeError: __init__() missing 1 required positional argument: 'elementType'

I expect the DataFrame to be re-created with this new schema so it can then be run through PrefixSpan, but my attempts to edit the types after the schema is declared have been unsuccessful.
