I am using Spark 1.6.0 and want to use the L1 penalty in pyspark.ml.classification.LogisticRegression for feature selection.
But I cannot get the detailed coefficients when I run the following:
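from pyspark.ml.classification import LogisticRegression
# elasticNetParam=1.0 selects a pure L1 penalty
lr = LogisticRegression(elasticNetParam=1.0, regParam=0.01, maxIter=100,
                        fitIntercept=False, standardization=False)
model = lr.fit(df_one_hot_train)
print(model.coefficients.toArray().astype(float).tolist())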
I only get a sparse list, mostly zeros, like:
[0,0,0,0,0,..,-0.0871650387514,..,]
When I use sklearn.linear_model.LogisticRegression, however, the coef_ attribute gives a detailed list with no zero values, like:
[0.03098372361467529,-0.13709075166114365,-0.15069548597557908,-0.017968044053830862]
Given Spark's better performance, I could finish my work faster with it. I just want to use the L1 penalty for feature selection.
I think I need the more detailed coefficient values for my feature selection work, as sklearn provides. How can I solve this?
Below is a working code snippet in Spark 2.1.
The key to extracting the values is:
Spark 1.6 may have something similar.
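Once the coefficients are available as a plain Python list (for example from model.coefficients.toArray().tolist(), as in the question), one possible way to use them for feature selection is to keep only the entries that are not zero. The sketch below is plain Python, not Spark code. The names select_features, feature_names, and tol, as well as the sample values, are hypothetical and only for illustration.

```python
# Hypothetical coefficient list, shaped like the sparse output in the question.
coefficients = [0.0, 0.0, -0.0871650387514, 0.0, 0.0312, 0.0]

# Hypothetical feature names, one per coefficient, in the same order.
feature_names = ["f0", "f1", "f2", "f3", "f4", "f5"]


def select_features(coefs, names, tol=0.0):
    """Return (name, coefficient) pairs whose absolute value exceeds tol."""
    return [(name, coef) for name, coef in zip(names, coefs) if abs(coef) > tol]


selected = select_features(coefficients, feature_names)
for name, coef in selected:
    print(name, coef)
```

With an L1 penalty, the zero entries are typically the features the model dropped, so the non-zero entries are the candidates to keep. Raising tol would also discard coefficients that are very small but not exactly zero.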