I am doing this to learn machine learning.

I compared Gradient Boosted Regression, XGBoost, Lasso, Ridge, ElasticNetCV, Support Vector Regression, and LightGBM.

After calculating the mean squared error for each algorithm, I wanted to plot the training error to compare their performance.

However, the plot does not match the numbers I calculated.

For example, in the plot below, the mean squared error for Lasso sits slightly above 0.006.

But when I calculate it with the code below, the result is 0.0082.

Algorithm

lsr = Lasso(alpha=0.00047)

Mean Squared Error calculation

-cross_val_score(lsr, train_dummies, y, scoring="neg_mean_squared_error").mean()
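
For reference, here is the same check as a self-contained snippet. The cv=5 is my assumption so that it matches the learning_curve call further down (cross_val_score otherwise uses its own default fold count), and train_dummies and y are assumed to be loaded already:

from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

lsr = Lasso(alpha=0.00047)

# "neg_mean_squared_error" yields *negative* MSE per fold, so flip the sign
fold_scores = -cross_val_score(lsr, train_dummies, y, cv=5,
                               scoring="neg_mean_squared_error")
print("per-fold MSE:", fold_scores)
print("mean CV MSE: ", fold_scores.mean())  # should be close to the 0.0082 above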

Here are the rest of the algorithms I ran:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.svm import SVR
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor

svr = make_pipeline(RobustScaler(), SVR(C=20, epsilon=0.008, gamma=0.0003))
gbr = GradientBoostingRegressor(max_depth=4, n_estimators=150)
xgbr = XGBRegressor(max_depth=5, n_estimators=400)
rr = Ridge(alpha=13)
lgbm = LGBMRegressor(objective='regression',
                     num_leaves=4,
                     learning_rate=0.01,
                     n_estimators=5000,
                     max_bin=200,
                     bagging_fraction=0.75,
                     bagging_freq=5,
                     bagging_seed=7,
                     feature_fraction=0.2,
                     feature_fraction_seed=7,
                     verbose=-1,
                     )
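
For completeness, here is a sketch of the same CV-MSE calculation looped over all of these models (reusing the estimators defined above; the cv=5 is again my assumption):

from sklearn.model_selection import cross_val_score

models = {'SVR': svr, 'GBR': gbr, 'XGB': xgbr,
          'Ridge': rr, 'Lasso': lsr, 'LGBM': lgbm}

# print one cross-validated MSE per model, mirroring the Lasso check above
for name, model in models.items():
    mse = -cross_val_score(model, train_dummies, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"{name}: MSE = {mse:.4f}")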

These are the settings for ElasticNet:

from sklearn.linear_model import ElasticNetCV

e_alphas = [0.0001, 0.0002, 0.0003, 0.0004, 0.0005, 0.0006, 0.0007]
e_l1ratio = [0.8, 0.85, 0.9, 0.95, 0.99, 1]
# max_iter is cast to int: newer scikit-learn rejects the float 1e7
en = make_pipeline(RobustScaler(), ElasticNetCV(max_iter=int(1e7), alphas=e_alphas, cv=5, l1_ratio=e_l1ratio))
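
As a side note, the fitted ElasticNetCV exposes the alpha and l1_ratio it selects, which is a handy sanity check that these grids cover a sensible range. A minimal sketch, assuming the en pipeline above:

# fit the pipeline, then read back the values ElasticNetCV picked;
# make_pipeline names the step after the lowercased class name
en.fit(train_dummies, y)
fitted = en.named_steps["elasticnetcv"]
print("chosen alpha:   ", fitted.alpha_)
print("chosen l1_ratio:", fitted.l1_ratio_)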

Here is the code for the learning curve

pl_mo = {'GBR': GradientBoostingRegressor(max_depth=4, n_estimators=150),
         'XGB': XGBRegressor(max_depth=5, n_estimators=400),
         'Lasso': Lasso(alpha=0.00047),
         'Ridge': Ridge(alpha=13),
         'ENet': make_pipeline(RobustScaler(), ElasticNetCV(max_iter=int(1e7), alphas=e_alphas, l1_ratio=e_l1ratio)),
         'SVR': make_pipeline(RobustScaler(), SVR(C=20, epsilon=0.008, gamma=0.0003)),
         'LGBM': LGBMRegressor(objective='regression',
                               num_leaves=4,
                               learning_rate=0.01,
                               n_estimators=5000,
                               max_bin=200,
                               bagging_fraction=0.75,
                               bagging_freq=5,
                               bagging_seed=7,
                               feature_fraction=0.2,
                               feature_fraction_seed=7,
                               verbose=-1,
                               )
         }

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import learning_curve

plt.figure(figsize=(10, 7))

for k, v in pl_mo.items():
    (train_sizes,
     train_scores,
     test_scores) = learning_curve(v,
                                   train_dummies,
                                   y,
                                   cv=5,
                                   scoring='neg_mean_squared_error')
    train_scores = -train_scores  # flip sign: the scorer returns negative MSE
    train_mean = np.mean(train_scores, axis=1)
    plt.plot(train_sizes, train_mean, label=k)

plt.title("Training Error")
plt.xlabel("Training Set Size")
plt.ylabel("Mean Squared Error")
plt.legend()
plt.show()

Here is the resulting plot:

[Learning curve]
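
For comparison, learning_curve also returns the held-out fold scores, so plotting those instead of the training scores should put the curves on the same footing as the cross_val_score numbers. A sketch reusing pl_mo and the data from above:

plt.figure(figsize=(10, 7))

for k, v in pl_mo.items():
    train_sizes, train_scores, test_scores = learning_curve(
        v, train_dummies, y, cv=5, scoring='neg_mean_squared_error')
    test_mean = np.mean(-test_scores, axis=1)  # flip sign: neg MSE -> MSE
    plt.plot(train_sizes, test_mean, label=k)

plt.title("Cross-Validation Error")
plt.xlabel("Training Set Size")
plt.ylabel("Mean Squared Error")
plt.legend()
plt.show()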

If anyone could point me in the right direction, I would be eternally grateful.
