I have a churn dataset with a column named "CLTV", which is the client's value for the company. I created a custom penalty function:
```python
def penalty(y_test, y_pred):
    penalties = []
    for i in range(len(y_pred)):
        if y_pred[i] - y_test.iloc[i] == -1:
            # missed churner: penalty is its CLTV relative to the median CLTV
            # (note: this relies on the global df)
            penalties.append(df.loc[y_test.index, 'CLTV'].iloc[i] / df.loc[:, 'CLTV'].median())
        else:
            # 0 for a correct prediction, +1 for a false alarm
            penalties.append(y_pred[i] - y_test.iloc[i])
    return sum(penalties)
```
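To make the behaviour concrete, here is the same function run on a tiny made-up frame (the `df` values and labels below are assumptions for illustration, not the real data; the function is repeated so the snippet runs standalone):

```python
import numpy as np
import pandas as pd

# hypothetical stand-in for the real churn dataset
df = pd.DataFrame({'CLTV': [100.0, 200.0, 300.0, 400.0]})

def penalty(y_test, y_pred):
    penalties = []
    for i in range(len(y_pred)):
        if y_pred[i] - y_test.iloc[i] == -1:
            # missed churner: its CLTV relative to the median CLTV
            penalties.append(df.loc[y_test.index, 'CLTV'].iloc[i] / df.loc[:, 'CLTV'].median())
        else:
            # 0 for a correct prediction, +1 for a false alarm
            penalties.append(y_pred[i] - y_test.iloc[i])
    return sum(penalties)

y_test = pd.Series([1, 0, 0, 1], index=[0, 1, 2, 3])
y_pred = np.array([0, 0, 1, 1])  # one missed churner (row 0), one false alarm (row 2)
print(penalty(y_test, y_pred))   # 100/250 + 0 + 1 + 0 = 1.4
```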
The lower the penalty, the better the result, so I made a custom scorer:
```python
from sklearn.metrics import make_scorer

custom_score = make_scorer(penalty, greater_is_better=False)
```
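For what it's worth, `greater_is_better=False` makes sklearn negate the metric internally, which is why the scorer reports negative values. A minimal sketch with a stand-in loss (`n_errors` is made up for illustration):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import make_scorer

# stand-in metric: number of mismatches (a "loss", lower is better)
def n_errors(y_true, y_pred):
    return np.sum(np.asarray(y_true) != np.asarray(y_pred))

loss_scorer = make_scorer(n_errors, greater_is_better=False)

X = np.zeros((4, 1))
y = np.array([1, 1, 0, 0])
clf = DummyClassifier(strategy='constant', constant=1).fit(X, y)

print(n_errors(y, clf.predict(X)))  # 2 raw errors
print(loss_scorer(clf, X, y))       # -2: the scorer flips the sign
```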
I first used a simple model with `class_weight` because the data is imbalanced:
```python
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression(class_weight="balanced", max_iter=500)
```
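As a side note, `class_weight="balanced"` assigns each class the weight `n_samples / (n_classes * bincount(y))`, so the rare class is boosted. A sketch with a made-up 90/10 imbalance:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# hypothetical imbalanced labels: 90% non-churn, 10% churn
y = np.array([0] * 90 + [1] * 10)
weights = compute_class_weight(class_weight='balanced', classes=np.array([0, 1]), y=y)
print(weights)  # class 0 -> 100/(2*90) ≈ 0.556, class 1 -> 100/(2*10) = 5.0
```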
And created a grid:
```python
from sklearn.model_selection import GridSearchCV

grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000]}
grid_lr = GridSearchCV(estimator=lr, param_grid=grid, cv=10,
                       scoring=custom_score, error_score='raise')
grid_lr.fit(X_train, y_train)
```
For `grid_lr.best_score_` I got -128.49 (negative is OK because it's a loss function).
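One detail that may matter when comparing numbers: `best_score_` is the mean of the 10 per-fold scores, each computed on a held-out tenth of the data, not the metric evaluated on the full training set. A sketch on synthetic data (an assumption, not the real churn set) showing that identity with a built-in metric:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

# synthetic stand-in data
X, y = make_classification(n_samples=200, random_state=0)
gs = GridSearchCV(LogisticRegression(max_iter=500),
                  {'C': [0.1, 1]}, cv=10, scoring='accuracy')
gs.fit(X, y)

# best_score_ equals the MEAN of the 10 held-out fold scores for the best C
fold_scores = cross_val_score(
    LogisticRegression(max_iter=500, C=gs.best_params_['C']),
    X, y, cv=10, scoring='accuracy')
print(np.isclose(gs.best_score_, fold_scores.mean()))  # True
```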
But when I did this:
```python
y_train_lr_grid = grid_lr.predict(X_train)
penalty(y_train, y_train_lr_grid)
```
the result was 1260, which is very different from 128. Can someone explain what I did wrong? Thank you all!