Why does GridSearchCV give different optimums on repeated runs?

I am performing parameter selection with GridSearchCV (from the sklearn package in Python), where the model is an Elastic Net with logistic loss (i.e., a logistic regression with L1 and L2 regularization penalties). I use SGDClassifier to implement this model. There are two parameters whose optimal values I want to search for: alpha (the constant that multiplies the regularization term) and l1_ratio (the Elastic Net mixing parameter). My data set has ~300,000 rows. I initialize the model as follows:
from sklearn.linear_model import SGDClassifier

sgd_ela = SGDClassifier(alpha=0.00001, fit_intercept=True, l1_ratio=0.1, loss='log', penalty='elasticnet')
and the search function as follows:

from sklearn.model_selection import GridSearchCV

search = GridSearchCV(estimator=sgd_ela, cv=8, param_grid=tune_para)
with the tuning parameters:

import numpy as np

tune_para = [{'l1_ratio': np.linspace(0.1, 1, 10).tolist(),
              'alpha': [0.00001, 0.0001, 0.001, 0.01, 0.1, 1]}]
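
Running the search on my data then looks like this (X and y below are placeholder names for my feature matrix and labels):

search.fit(X, y)
print(search.best_params_)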

I get the best parameters (values of alpha and l1_ratio) from best_params_ upon running the code. However, repeated runs do not return the same set of best parameters. I would like to know why this is the case and, if possible, how I can overcome it.

1 Answer

Answered by simon:

Try setting the random seed if you want the same result on every run. SGDClassifier shuffles the training data before each epoch, so without a fixed random_state each run follows a different stochastic optimization path and can converge to a slightly different optimum, which GridSearchCV then reports as a different set of best parameters.
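
A minimal sketch of the fix, reusing the setup from the question (X and y are placeholder names for your data):

import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

# Fixing random_state makes the per-epoch shuffling, and hence the fitted
# coefficients, reproducible across runs; any integer seed works.
sgd_ela = SGDClassifier(alpha=0.00001, fit_intercept=True, l1_ratio=0.1,
                        loss='log',  # as in the question; newer scikit-learn releases spell it 'log_loss'
                        penalty='elasticnet', random_state=42)

tune_para = [{'l1_ratio': np.linspace(0.1, 1, 10).tolist(),
              'alpha': [0.00001, 0.0001, 0.001, 0.01, 0.1, 1]}]

# With an integer cv and a classifier, GridSearchCV uses StratifiedKFold
# without shuffling, so the fold assignment is already deterministic; the
# randomness comes from the estimator itself.
search = GridSearchCV(estimator=sgd_ela, cv=8, param_grid=tune_para)
search.fit(X, y)
print(search.best_params_)  # now identical on repeated runs

Note that the seed only pins down the shuffling: if several parameter combinations score nearly identically, the "best" one is essentially a tie, so you may also want to look at cv_results_ rather than best_params_ alone.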