Statsmodels: Short way of writing Formula


Logistic regression model using statsmodels:

log_reg = st.logit(formula = 'label ~ pregnant + glucose + bp + insulin + bmi + pedigree + age', data=pima).fit()

Is there a shorter way to write the right-hand side of the formula (pregnant + glucose + bp + insulin + bmi + pedigree + age)? Here every column has to be listed explicitly. With more than 100 columns, that would be tedious to write and the statement would be very long.


There are 2 answers

Josef

There are no specific shortcuts for the formulas.

You can use Python string manipulation to build the formula, e.g. from the column names of a pandas DataFrame.

Alternatively, you can work directly with arrays or DataFrames instead of a formula. Even then you need a list of names if you want human-readable output, for example in summary(). If you only need predictions, arrays without variable names work fine.

Mór Kapronczay

If df is a pd.DataFrame and y is the name of the target column, this function returns the formula string you are looking for.

def formula_from_cols(df, y):
    # Join every column except the target into 'y ~ a + b + ...'
    return y + ' ~ ' + ' + '.join(col for col in df.columns if col != y)
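A possible variant of the same idea, shown on the question's column names. The helper `build_formula` is a hypothetical name, and it takes a plain list of column names (for example `list(df.columns)`), so it runs without pandas. It also wraps names that are not valid Python identifiers in `Q("...")`, which assumes the patsy formula handling that statsmodels has traditionally used.

```python
import keyword


def build_formula(columns, target):
    """Return 'target ~ a + b + ...' from an iterable of column names."""

    def term(name):
        # Plain identifiers can be used in a formula as-is.
        if name.isidentifier() and not keyword.iskeyword(name):
            return name
        # Assumption: patsy's Q() quoting is available for awkward names
        # such as 'blood pressure'.
        return f'Q("{name}")'

    predictors = [term(c) for c in columns if c != target]
    return f"{term(target)} ~ " + " + ".join(predictors)


cols = ["pregnant", "glucose", "bp", "insulin", "bmi", "pedigree", "age", "label"]
formula = build_formula(cols, "label")
print(formula)
# label ~ pregnant + glucose + bp + insulin + bmi + pedigree + age
```

The resulting string can then be passed as the `formula` argument in the call from the question, in place of the hand-written one.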