There are other examples on Stack Overflow that are similar, but none of them cover complex (ArrayType) data types.
I have a Spark dataframe that looks like the following:
With the following schema:
root
 |-- sequence: struct (nullable = true)
 |    |-- sequence: array (nullable = true)
 |    |    |-- element: long (containsNull = true)
And I want to convert it to the following schema so that it works with the PrefixSpan algorithm:
root
 |-- sequence: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: long (containsNull = true)
I have tried converting it into a Python list of lists of lists and then moving it back into a Spark dataframe, and this actually solves the problem. Unfortunately, I need to solve this using only Spark dataframes and RDDs.
I have tried this:
from pyspark.sql.types import ArrayType
changedTypedf = df.withColumn("sequence", df["sequence"].cast(ArrayType()))
But I get the following error:
TypeError: __init__() missing 1 required positional argument: 'elementType'
I expect the dataframe to be re-created with this new schema so it can then be run through PrefixSpan, but so far my attempts to edit the types after the schema is declared have been unsuccessful.