I am currently using Scikit-Learn's LogisticRegression to build a model. I have used
from sklearn import preprocessing
scaler = preprocessing.StandardScaler().fit(build)
build_scaled = scaler.transform(build)
to scale all of my input variables prior to training the model. Everything works fine and produces a decent model, but my understanding is that the coefficients produced by LogisticRegression.coef_ are based on the scaled variables. Is there a transformation of those coefficients that adjusts them so they can be applied to the non-scaled data?
I am thinking ahead to an implementation of the model in a productionized system, and trying to determine whether all of the input variables need to be pre-processed in some way in production before the model can score them.
Note: the model will likely have to be re-coded in the production environment, which does not use Python.
You have to divide each coefficient by the scaling you applied to normalise its feature, but also multiply by the scaling that you applied to the target.
Suppose
- each feature variable x_i was scaled (divided) by scale_x_i, and
- the target variable was scaled (divided) by scale_y,

then the coefficients on the original, unscaled data can be recovered as

original_coef_i = scaled_coef_i / scale_x_i * scale_y
Here's an example using pandas and sklearn's LinearRegression.
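A minimal sketch with made-up data (the column names x1 and x2 and the toy coefficients 3 and -2 below are purely illustrative):

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# toy data with a known linear relationship plus a little noise
rng = np.random.RandomState(0)
df = pd.DataFrame({'x1': rng.uniform(0, 10, 100),
                   'x2': rng.uniform(0, 5, 100)})
df['y'] = 3.0 * df['x1'] - 2.0 * df['x2'] + rng.normal(0, 0.5, 100)

# fit on the raw, unscaled data and look at the coefficients
lr = LinearRegression().fit(df[['x1', 'x2']], df['y'])
print(lr.coef_)   # roughly [ 3., -2.]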
This shows us our coefficients for a linear regression with no scaling applied.
We now normalise all our variables
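Continuing the sketch, one possible choice is to divide each column (features and target) by its standard deviation, keeping the scale factors so the transformation can be undone later:

# divide every column by its standard deviation,
# remembering the scales for later
scale_x = df[['x1', 'x2']].std()
scale_y = df['y'].std()

df_scaled = df.copy()
df_scaled[['x1', 'x2']] = df[['x1', 'x2']] / scale_x
df_scaled['y'] = df['y'] / scale_y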
We can do the regression again on this normalised data...
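With the sketch data that looks like:

# same regression, but on the scaled data -- the coefficients come out different
lr_scaled = LinearRegression().fit(df_scaled[['x1', 'x2']], df_scaled['y'])
print(lr_scaled.coef_)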
...and apply the scaling to get back our original coefficients
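Applying the formula above to the sketch:

# divide by each feature's scale and multiply by the target's scale
unscaled_coefs = lr_scaled.coef_ / scale_x.values * scale_y
print(unscaled_coefs)   # matches lr.coef_ from the unscaled fit above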
When we do this we see that we have recreated our original coefficients.
For some machine learning methods, the target variable y must be normalised as well as the feature variables x. If you've done that, you need to include the "multiply by the scale of y" step as well as the "divide by the scale of x_i" step to get back the original regression coefficients.
Hope that helps