Is there any support in sklearn for using pandas' Categorical datatype directly when fitting models? From what I've seen, sklearn does not support this datatype, which is unfortunate because a Categorical both encodes the categorical data and carries the mapping scheme for that data. In addition, categorical encoding is purely a data handling/processing problem, so it seems more natural for pandas to handle it.
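To illustrate what "encodes the data and contains the mapping" means, here is a minimal sketch using the `.cat` accessor on a small made-up Series (the `colors` values are purely illustrative):

```python
import pandas as pd

# Hypothetical example data, not from any real dataset.
colors = pd.Series(["red", "green", "red", "blue"], dtype="category")

# Integer codes: the encoded form of each value.
codes = colors.cat.codes.tolist()

# The mapping scheme: position i in this list is the label for code i.
categories = colors.cat.categories.tolist()

# The original labels can be recovered from codes + categories alone.
decoded = [categories[c] for c in codes]

print(codes)
print(categories)
print(decoded)
```

Since the categories are inferred here rather than given explicitly, pandas sorts them, so the codes follow alphabetical order rather than order of appearance.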
Note
I realize there are several methods to encode categorical variables in Pandas and sklearn - that's not what I'm asking about.
Cross-posting from the issue-tracker:
I think there are at least two separate questions:
1. Can / will sklearn support pandas DataFrames with categorical features as input?
2. Can / will sklearn support operating on categorical variables via pandas categorical datatypes?
Supporting (1) would more or less mean converting all categorical variables into one-hot encoded features, aka dummy columns. That is really easy for the user to do. We could do it "under the hood" in scikit-learn, but it would complicate the code and I don't see a great benefit.
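As a rough sketch of the user-side conversion, `pandas.get_dummies` can expand a categorical column into dummy columns before the data is passed to an estimator. The DataFrame and its column names below are made up for illustration:

```python
import pandas as pd

# Hypothetical input frame: one categorical feature, one numeric feature.
df = pd.DataFrame({
    "color": pd.Categorical(["red", "green", "red"],
                            categories=["blue", "green", "red"]),
    "size": [1.0, 2.0, 3.0],
})

# One-hot encode the categorical column; numeric columns pass through unchanged.
# Because "color" has a categorical dtype, a column is produced for every
# declared category, including "blue", which never occurs in the data.
X = pd.get_dummies(df, columns=["color"])

print(X.columns.tolist())
```

The resulting `X` is an all-numeric (or boolean, depending on the pandas version) frame that can be handed to a scikit-learn estimator's `fit` method.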
Supporting (2) is basically impossible. Having a categorical datatype would be nice for the trees, but I think pandas has no stable C-level interface, so we can't really tap into that. Even if there were one, it would still require a substantial rewrite of the tree code. I don't think it would be helpful for non-tree estimators.