PySpark: use DocumentAssembler on array<string>

Question

Asked by Rory on 22 May 2023 at 10:55

I am trying to use DocumentAssembler on an array of strings. The documentation says: "The DocumentAssembler can read either a String column or an Array[String]". But when I run a simple example:

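from sparknlp.base import DocumentAssembler

# `spark` is an existing SparkSession with Spark NLP available.
# Three levels of brackets: the "text" column holds a list, so its type is array<string>.
data = spark.createDataFrame([[["Spark NLP is an open-source text processing library."]]]).toDF("text")
documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
result = documentAssembler.transform(data)

result.select("document").show(truncate=False)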

I get the following error:

AnalysisException: [CANNOT_UP_CAST_DATATYPE] Cannot up cast input from "ARRAY<STRING>" to "STRING".
The type path of the target object is:
- root class: "java.lang.String"
You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object

Am I misunderstanding something?

**Islam Elbanna** · Accepted Answer · 2023-05-22T12:52:40+00:00

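I think you just added an extra pair of [] around the input. With three levels of brackets, the value in the "text" column is itself a list, so Spark infers array<string> instead of string, which matches the type named in the error.

This works: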

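from sparknlp.base import DocumentAssembler

# `spark` is an existing SparkSession with Spark NLP available.
# Two levels of brackets: one row, one column, and the cell is a plain string.
data = spark.createDataFrame([["Spark NLP is an open-source text processing library."]]).toDF("text")
documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
result = documentAssembler.transform(data)

result.select("document").show(truncate=False)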

+----------------------------------------------------------------------------------------------+
|document                                                                                      |
+----------------------------------------------------------------------------------------------+
|[{document, 0, 51, Spark NLP is an open-source text processing library., {sentence -> 0}, []}]|
+----------------------------------------------------------------------------------------------+
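To see why one extra pair of brackets changes the column type, here is a minimal pure-Python sketch of how the nesting is read: the outer list holds rows, the next list holds the columns of a row, and whatever sits inside that is the cell value. The helpers `cell_type` and `column_types` are hypothetical names made up for this illustration. They only mimic the type names Spark prints and are not part of PySpark or Spark NLP, and they do not reproduce Spark's real schema inference.

```python
def cell_type(value):
    """Return a Spark-like type name for a plain Python cell value (hypothetical helper)."""
    if isinstance(value, str):
        return "string"
    if isinstance(value, (list, tuple)):
        inner = {cell_type(v) for v in value}
        # An empty or mixed list has no single element type; mark it as unknown.
        element = inner.pop() if len(inner) == 1 else "?"
        return f"array<{element}>"
    return type(value).__name__


def column_types(rows):
    """Describe each column of the first row: outer list = rows, next list = columns."""
    return [cell_type(cell) for cell in rows[0]]


sentence = "Spark NLP is an open-source text processing library."

# Three levels of brackets: rows -> columns -> a list inside the cell.
question_rows = [[[sentence]]]
# Two levels of brackets: rows -> columns -> a plain string in the cell.
answer_rows = [[sentence]]

print(column_types(question_rows))  # ['array<string>']
print(column_types(answer_rows))    # ['string']
```

The first result corresponds to the ARRAY<STRING> reported in the error message, and the second to the string column that DocumentAssembler accepted in the working example.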
