I have tried to use pd.cut to create a categorical variable from a continuous variable. I'd like to use it as a dummy variable in a subsequent statsmodels regression. When I fit a model with a categorical variable created this way, I get an error:
TypeError: data type not understood.
A test case is included below.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
df = pd.DataFrame(np.random.randn(6,4))
df.columns = ['A', 'B', 'C', 'D']
df['ttt']=pd.cut(df['D'], bins=2)
test = smf.ols('A ~ B + ttt', data=df).fit()
I'm sure I've done something obviously wrong. Any help would be appreciated.
I'm not sure exactly how far statsmodels has come in supporting the new Categorical type in pandas. For the moment, you may have to convert the categorical back into an object type for it to work (please check that the resulting ols fit is sensible, since I don't know the full details of what you're trying to do):
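A minimal sketch of that conversion, under a few assumptions of mine: the column name `ttt_str` is hypothetical, the bin labels are turned into plain strings with `astype(str)`, and the frame has 50 rows with a fixed seed instead of the 6 in the question so that the fit is less degenerate. The formula interface should then treat the string column as an ordinary categorical and build the dummy itself.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Reproducible random data (seed and row count are assumptions for illustration)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((50, 4)), columns=["A", "B", "C", "D"])

# Bin the continuous column into two equal-width intervals
df["ttt"] = pd.cut(df["D"], bins=2)

# Convert the interval labels to plain strings so the formula
# machinery sees a regular string column instead of a Categorical
df["ttt_str"] = df["ttt"].astype(str)

# Fit using the converted column; one dummy is created for the second bin
model = smf.ols("A ~ B + ttt_str", data=df).fit()
print(model.params)
```

With two bins, the fitted parameters should be an intercept, one dummy for the upper bin, and the coefficient on B. It is worth checking which bin was taken as the reference level before interpreting the dummy.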