I was going through the JohnSnowLabs SpellChecker here.
I found an implementation of Norvig's algorithm there, but the example section has just the following two lines:
import com.johnsnowlabs.nlp.annotator.NorvigSweetingModel
NorvigSweetingModel.pretrained()
How can I apply this pretrained model to my dataframe (df) below to spell-correct the "names" column?
+----------------+---+------------+
|           names|age|       color|
+----------------+---+------------+
|      [abc, cde]| 19|    red, abc|
|[eefg, efa, efb]|192|efg, efz efz|
+----------------+---+------------+
I have tried to do it as follows:
val schk = NorvigSweetingModel.pretrained().setInputCols("names").setOutputCol("Corrected")
val cdf = schk.transform(df)
But the above code gave me the following error:
java.lang.IllegalArgumentException: requirement failed: Wrong or missing inputCols annotators in SPELL_a1f11bacb851. Received inputCols: names. Make sure such columns have following annotator types: token
at scala.Predef$.require(Predef.scala:224)
at com.johnsnowlabs.nlp.AnnotatorModel.transform(AnnotatorModel.scala:51)
... 49 elided
Spark NLP annotators are designed to be used in Spark NLP's own pipelines, and the input columns of the different transformers have to carry special metadata. The exception already tells you that the input to NorvigSweetingModel should be tokenized ("Make sure such columns have following annotator types: token"). If I am not mistaken, at a minimum you'll have to assemble documents and tokenize them here.
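A minimal sketch of such a pipeline in Scala (e.g. pasted into spark-shell with spark-nlp on the classpath). It assumes the 1.7.x-era class names DocumentAssembler and Tokenizer/NorvigSweetingModel as exposed through com.johnsnowlabs.nlp.annotator; the intermediate column names document, tokens and corrected are my own choices. The point is that the spell checker receives TOKEN annotations produced by the Tokenizer, which is exactly what the exception complains about.

```scala
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotator.{NorvigSweetingModel, Tokenizer}
import org.apache.spark.ml.Pipeline

// Stage 1: wrap the raw text column in DOCUMENT annotations
val documentAssembler = new DocumentAssembler()
  .setInputCol("names")
  .setOutputCol("document")

// Stage 2: split each document into TOKEN annotations
val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("tokens")

// Stage 3: the spell checker consumes TOKEN annotations (not a raw column)
// Note: pretrained() downloads the model, so network access is assumed
val spellChecker = NorvigSweetingModel.pretrained()
  .setInputCols("tokens")
  .setOutputCol("corrected")

val nlpPipeline = new Pipeline()
  .setStages(Array(documentAssembler, tokenizer, spellChecker))
```

Running it is then nlpPipeline.fit(df).transform(df); the fit call is needed because a Spark ML Pipeline is an Estimator, even when every stage is already a model.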
A Pipeline built from these stages (DocumentAssembler, then Tokenizer, then NorvigSweetingModel) can be applied to your data with a small adjustment: the input column has to be string, not array<string>*. If you want an output that can be exported, you should extend your Pipeline with a Finisher.

* According to the docs, DocumentAssembler should also accept an array<string> column, but it doesn't look like that works in practice in 1.7.3.
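Putting the string conversion and the Finisher together, an end-to-end sketch could look like the following. Assumptions: the dataframe is rebuilt from the output shown in the question, the array is joined with a single space via concat_ws (my choice, so the Tokenizer can split the words again), and the Finisher output column keeps its default name, which as far as I know is finished_corrected. Treat it as a starting point rather than a tested recipe for 1.7.3.

```scala
import com.johnsnowlabs.nlp.{DocumentAssembler, Finisher}
import com.johnsnowlabs.nlp.annotator.{NorvigSweetingModel, Tokenizer}
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.concat_ws

// In spark-shell this simply returns the existing session
val spark = SparkSession.builder().master("local[*]").appName("norvig-example").getOrCreate()
import spark.implicits._

// Same shape as the question's dataframe: "names" is array<string>
val df = Seq(
  (Seq("abc", "cde"), 19, "red, abc"),
  (Seq("eefg", "efa", "efb"), 192, "efg, efz efz")
).toDF("names", "age", "color")

// DocumentAssembler expects a string column, so join the array elements with spaces
val flat = df.withColumn("names", concat_ws(" ", $"names"))

val pipeline = new Pipeline().setStages(Array(
  new DocumentAssembler().setInputCol("names").setOutputCol("document"),
  new Tokenizer().setInputCols("document").setOutputCol("tokens"),
  NorvigSweetingModel.pretrained().setInputCols("tokens").setOutputCol("corrected"),
  // Finisher converts the annotation structs back into plain string output
  new Finisher().setInputCols("corrected")
))

val result = pipeline.fit(flat).transform(flat)
result.show(false)
```

The age and color columns pass through untouched; only the names column goes through the annotators.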