I want to run linear regression with K-fold cross-validation, using the sklearn library, on my training data to obtain the best regression model. I then plan to apply the predictor with the lowest mean error to my test set.
For example, the code below gives me an array of 20 scores with different negative mean absolute errors. I am interested in finding the predictor that gives the least error and then using that predictor on my test set.
sklearn.model_selection.cross_val_score(LinearRegression(), trainx, trainy, scoring='neg_mean_absolute_error', cv=20)
There is no such thing as "predictor which gives me this (least) error" in cross_val_score: all the estimators it fits are the same, each just trained and scored on a different split of the data.
You may wish to check GridSearchCV, which does search through different sets of hyperparameters and returns the best estimator:
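A minimal sketch of what that could look like, not a definitive recipe. The synthetic data from make_regression stands in for your trainx/trainy, and the fit_intercept grid is only an illustrative assumption, since LinearRegression has very few hyperparameters worth tuning:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the asker's data (assumption, not from the question).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
trainx, testx, trainy, testy = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Illustrative grid only: each combination is scored with 20-fold CV.
param_grid = {"fit_intercept": [True, False]}

search = GridSearchCV(
    LinearRegression(),
    param_grid,
    scoring="neg_mean_absolute_error",  # same metric as in the question
    cv=20,
    refit=True,  # refit the best parameter set on all of trainx/trainy
)
search.fit(trainx, trainy)

# The refit model, ready to be used on held-out data.
best_model = search.best_estimator_
test_predictions = best_model.predict(testx)

print(search.best_params_, search.best_score_)
```

Here best_score_ is the mean cross-validated score of the winning parameter set, so it is comparable to the mean of the array that cross_val_score returns.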
Note the refit=True param (the default), which ensures the best model is refit on the whole dataset passed to fit and made available as best_estimator_.