How to use L1 penalty in pyspark.ml.regression.LinearRegressionModel for features selection?

Question

862 views Asked by Carrod At 20 December 2016 at 05:53

I am using Spark 1.6.0 and want to use the L1 penalty in pyspark.ml.regression.LinearRegressionModel for feature selection.

However, I cannot get detailed coefficients from the fitted model:

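from pyspark.ml.classification import LogisticRegression

# elasticNetParam=1.0 selects a pure L1 penalty
lr = LogisticRegression(elasticNetParam=1.0, regParam=0.01, maxIter=100, fitIntercept=False, standardization=False)
model = lr.fit(df_one_hot_train)
print(model.coefficients.toArray().astype(float).tolist())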

I only get a sparse list like:

[0,0,0,0,0,..,-0.0871650387514,..,]

In contrast, when I use the sklearn.linear_model.LogisticRegression model, coef_ gives me a detailed list with no zero values, like:

[0.03098372361467529,-0.13709075166114365,-0.15069548597557908,-0.017968044053830862]

Given Spark's better performance, I could finish my work faster. I just want to use the L1 penalty for feature selection.

I think I need more detailed coefficient values for my feature selection work, as sklearn provides. How can I solve my problem?

There is 1 answer

**Burt** · Answer 1 · 2017-11-24T14:13:03+00:00

Below is a working code snippet for Spark 2.1.

The key to extracting the values is:

stages(4).asInstanceOf[LinearRegressionModel]

Spark 1.6 may have something similar.

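import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, VectorAssembler}
import org.apache.spark.ml.regression.{LinearRegression, LinearRegressionModel}

// Index the string column, then one-hot encode the categorical columns
val holIndIndexer = new StringIndexer().setInputCol("holInd").setOutputCol("holIndIndexer")
val holIndEncoder = new OneHotEncoder().setInputCol("holIndIndexer").setOutputCol("holIndVec")
val time_intervaLEncoder = new OneHotEncoder().setInputCol("time_interval").setOutputCol("time_intervaLVec")

// Assemble the encoded columns and the numeric column into one feature vector
val assemblerL1 = (new VectorAssembler()
           .setInputCols(Array("time_intervaLVec", "holIndVec", "length"))
           .setOutputCol("features"))

val lrL1 = new LinearRegression().setFeaturesCol("features").setLabelCol("travel_time")

val pipelineL1 = new Pipeline().setStages(Array(holIndIndexer, holIndEncoder, time_intervaLEncoder, assemblerL1, lrL1))
val modelL1 = pipelineL1.fit(dfTimeMlFull)

// The regression is the fifth pipeline stage (index 4); cast it to read its coefficients
val l1Coeff = modelL1.stages(4).asInstanceOf[LinearRegressionModel].coefficients
println(l1Coeff)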
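
With elasticNetParam=1.0 the penalty is pure L1, which tends to drive the coefficients of uninformative features to exactly zero. The zeros in the sparse list from the question are therefore likely the feature-selection result itself rather than missing detail. Below is a minimal plain-Python sketch of turning such a coefficient list into a set of selected features. The helper `select_features`, the feature names, and the sample values are hypothetical and are not part of Spark's API.

```python
def select_features(coefficients, feature_names, tol=0.0):
    """Return (name, coefficient) pairs whose magnitude exceeds tol.

    Hypothetical helper for illustration; not a Spark or sklearn function.
    """
    if len(coefficients) != len(feature_names):
        raise ValueError("coefficients and feature_names must have the same length")
    return [
        (name, coef)
        for name, coef in zip(feature_names, coefficients)
        if abs(coef) > tol
    ]


# Made-up values shaped like the sparse output in the question.
coefficients = [0.0, 0.0, -0.0871650387514, 0.0, 0.0312]
feature_names = ["f0", "f1", "f2", "f3", "f4"]

# Features whose coefficient survived the L1 penalty.
selected = select_features(coefficients, feature_names)
print(selected)
```

For a fitted PySpark model, the first argument would be the list produced by model.coefficients.toArray().tolist(), as in the question. The feature names have to be tracked separately, in the order in which the columns were assembled into the feature vector.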
