Using Pandas 'categorical' dtype with sklearn

Question


Asked by toes on 15 June 2015 at 18:02 (4.2k views)

Is there any support in sklearn for using pandas' Categorical datatype directly when fitting models? From what I've seen, sklearn does not support this datatype, which is unfortunate because the Categorical datatype both encodes categorical data and contains the mapping scheme of the data. In addition, categorical encoding is purely a data handling/processing problem, so it seems more natural for it to be handled by pandas.
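To make the point about "encodes the data and contains the mapping" concrete, here is a minimal sketch (the color values are made up for illustration) showing that a pandas Categorical column stores one small integer code per row in `.cat.codes` plus the code-to-label lookup in `.cat.categories`:

```python
import pandas as pd

# Toy column; the values are hypothetical.
colors = pd.Series(["red", "green", "red", "blue"], dtype="category")

# Categories are inferred and sorted lexically; each row stores an integer code.
print(list(colors.cat.categories))  # ['blue', 'green', 'red']
print(colors.cat.codes.tolist())    # [2, 1, 2, 0]

# The mapping scheme travels with the data: code -> label.
mapping = dict(enumerate(colors.cat.categories))
print(mapping)                      # {0: 'blue', 1: 'green', 2: 'red'}

# Decoding the codes through that mapping recovers the original labels.
decoded = colors.cat.codes.map(mapping)
assert decoded.tolist() == colors.tolist()
```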

Note

I realize there are several methods to encode categorical variables in pandas and sklearn; that's not what I'm asking about.

Original Q&A


**Andreas Mueller** · Accepted Answer · 2015-06-16T17:51:01+00:00

Cross-posting from the issue tracker:

I think these are at least two separate questions:

1. Can / will sklearn support pandas DataFrames with categorical features as input?
2. Can / will sklearn support operating on categorical variables via pandas categorical datatypes?

Question 1 would amount, more or less, to converting all categorical variables into one-hot encoded features, aka dummy columns. That is really easy for the user to do. We could do that "under the hood" in scikit-learn, but it would complicate the code and I don't see a great benefit.
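As a rough illustration of the "really easy for the user" route, here is a minimal sketch using `pd.get_dummies` to expand a categorical column into dummy columns before fitting an sklearn model. The data, column names, and the choice of `LogisticRegression` are hypothetical, picked only for the example:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one categorical feature and one numeric feature.
df = pd.DataFrame({
    "color": pd.Categorical(["red", "green", "red", "blue", "green", "blue"]),
    "size": [1.0, 2.0, 1.5, 3.0, 2.5, 3.5],
})
y = [0, 1, 0, 1, 1, 1]

# One 0/1 column per category; non-encoded columns come first.
X = pd.get_dummies(df, columns=["color"], dtype=float)
print(list(X.columns))  # ['size', 'color_blue', 'color_green', 'color_red']

model = LogisticRegression().fit(X, y)

# New rows reuse the training categories, so get_dummies emits the same
# columns even though only 'blue' appears here.
new = pd.DataFrame({
    "color": pd.Categorical(["blue"], categories=df["color"].cat.categories),
    "size": [2.0],
})
X_new = pd.get_dummies(new, columns=["color"], dtype=float)
print(model.predict(X_new))
```

The second half shows where the Categorical dtype earns its keep in this workflow: because the category set is part of the dtype, dummy columns for unseen-but-declared categories are still produced, keeping train and prediction matrices aligned.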

Question 2 is basically impossible. Having a categorical datatype would be nice for the trees, but I think pandas has no stable C-level interface, so we can't really tap into that. Even if there were, it would still require a substantial rewrite of the tree code. I don't think it would be helpful for non-tree estimators.
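Since sklearn trees only see numeric arrays, one workaround sometimes used (not something the answer proposes) is to pass the Categorical's integer codes as a single numeric column. A minimal sketch with hypothetical data follows; note the caveat in the comments, which is exactly why native categorical support in the tree code would be nicer:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data with one categorical feature.
color = pd.Series(["red", "green", "red", "blue", "green", "blue"], dtype="category")
y = [0, 1, 0, 1, 1, 1]

# Feed the integer codes as one numeric column.
# Caveat: the codes impose an arbitrary order (blue=0 < green=1 < red=2), and
# the tree splits on thresholds over that order rather than on category subsets.
X = color.cat.codes.to_frame(name="color_code")
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Encode new data with the same categories so the codes line up with training.
new = pd.Series(["green"], dtype=pd.CategoricalDtype(color.cat.categories))
pred = tree.predict(new.cat.codes.to_frame(name="color_code"))
print(pred)
```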
