Is it necessary to convert categorical attributes to numerical attributes to use LabeledPoint function in Pyspark?

Question

Asked by jdatastic17 on 06 January 2017 at 18:50 (99 views)

I am new to PySpark. I have a dataset that contains categorical features, and I want to use PySpark regression models to predict continuous values. I am stuck on the data pre-processing that MLlib models require.

There is 1 answer

**user7337271** · Accepted Answer · 2017-01-06T21:06:25+00:00

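Yes, it is necessary. LabeledPoint takes a numeric label and a numeric feature vector, so categorical values have to be converted to numbers first. For linear models they also have to be encoded, because a bare index (e.g. red=0, blue=1, green=2) would be treated as an ordered quantity. Both steps are implemented in pyspark.ml (not mllib) by: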

- `pyspark.ml.feature.StringIndexer`: indexing (maps each category string to a numeric index).
- `pyspark.ml.feature.OneHotEncoder`: encoding (turns each index into a one-hot vector).
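As a rough illustration of what these two stages do to the data, here is a plain-Python sketch. It is not the Spark API: the helper names `string_index` and `one_hot` and the toy rows are hypothetical. It assumes StringIndexer's default frequency-descending ordering (ties broken alphabetically here, an assumption made for determinism) and OneHotEncoder's default of dropping the last category, and it builds (label, features) pairs of the shape `LabeledPoint(label, features)` takes.

```python
from collections import Counter


def string_index(values):
    """Map each distinct string to a float index, most frequent first.

    Mimics StringIndexer's default 'frequencyDesc' ordering; ties are
    broken alphabetically here (an assumption, for deterministic output).
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: float(i) for i, value in enumerate(ordered)}


def one_hot(index, num_categories, drop_last=True):
    """Encode a numeric index as a dense 0/1 list.

    With drop_last=True (OneHotEncoder's default), the last category is
    represented by an all-zero vector, so the length is num_categories - 1.
    """
    size = num_categories - 1 if drop_last else num_categories
    vec = [0.0] * size
    if int(index) < size:
        vec[int(index)] = 1.0
    return vec


# Hypothetical toy data: (categorical feature, numeric feature, continuous label)
rows = [
    ("red", 1.0, 10.0),
    ("blue", 2.0, 12.5),
    ("red", 3.0, 11.0),
    ("green", 4.0, 15.0),
]

# Step 1: indexing. "red" is most frequent, so it gets index 0.0.
mapping = string_index([color for color, _, _ in rows])

# Step 2: encoding, then append the numeric feature to get (label, features).
points = [
    (label, one_hot(mapping[color], len(mapping)) + [x])
    for color, x, label in rows
]

for p in points:
    print(p)  # first line: (10.0, [1.0, 0.0, 1.0])
```

In Spark itself you would fit StringIndexer and OneHotEncoder (usually together with `pyspark.ml.feature.VectorAssembler` to combine columns) in a `pyspark.ml.Pipeline` instead of hand-rolling this; the pyspark.ml regressors consume a DataFrame with a vector features column rather than an RDD of LabeledPoint. OneHotEncoder's parameters changed in Spark 3.0, so check the documentation for your version.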

TechQA.