I have trained and am debugging a PySpark ML RandomForestClassificationModel, created by calling pyspark.ml.classification.RandomForestClassifier.fit(). I want to interpret the feature importances returned by the RandomForestClassificationModel.featureImportances property, which is a SparseVector.

As you can see in the notebook below, I had to transform my features in several stages to build the final Features_vec column that feeds the algorithm. What I want is a list of feature importances keyed by feature type and source column. How can I map the SparseVector of importances back to feature names, or to some other interpretable format?

The code is in a Jupyter Notebook here. Skip to the end.

This shouldn't be specific to PySpark, so if you know a Scala solution, please chime in.
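For reference, one direction I'm exploring (not sure it's the right one): VectorAssembler writes per-slot ML attribute metadata onto the output column's schema (`df.schema[col].metadata["ml_attr"]["attrs"]`), which maps vector indices to the original column names. Below is a sketch of pairing that metadata with the importance values; the metadata dict and importance values here are hypothetical placeholders shaped like what I believe Spark produces, not output from my actual pipeline.

```python
def feature_names_from_metadata(metadata):
    """Build {index: name} from a vector column's 'ml_attr' metadata,
    i.e. df.schema["Features_vec"].metadata, as written by VectorAssembler."""
    attrs = metadata.get("ml_attr", {}).get("attrs", {})
    names = {}
    for group in attrs.values():  # groups like "numeric", "binary", "nominal"
        for attr in group:
            names[attr["idx"]] = attr["name"]
    return names

def named_importances(importances, metadata):
    """Pair importance weights (index -> value, e.g. the nonzero entries of
    model.featureImportances) with feature names, sorted descending."""
    names = feature_names_from_metadata(metadata)
    return sorted(
        ((names.get(i, "feature_%d" % i), w) for i, w in importances.items()),
        key=lambda kv: kv[1],
        reverse=True,
    )

# Hypothetical metadata in the shape VectorAssembler produces:
meta = {"ml_attr": {"attrs": {
    "numeric": [{"idx": 0, "name": "age"}, {"idx": 2, "name": "income"}],
    "binary": [{"idx": 1, "name": "gender_indexed"}],
}}}
# Hypothetical nonzero entries of the featureImportances SparseVector:
imp = {0: 0.2, 1: 0.05, 2: 0.6}
print(named_importances(imp, meta))
# → [('income', 0.6), ('age', 0.2), ('gender_indexed', 0.05)]
```

I haven't verified this covers every attribute group Spark emits, so corrections welcome.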
