# Learning curve plot differs from the calculated mean squared error

I am doing this to learn machine learning.

I compared Gradient Boosted Regression, XGBoost, Lasso, Ridge, ElasticNetCV, Support Vector Regression, and LightGBM.

After calculating the mean squared error for each algorithm, I wanted to plot the training error to see how they perform.

However, the values in the plot came out different from the numbers I calculated.

For example, in the image below, the mean squared error for Lasso sits at a bit above 0.006.

But when I calculate it using the code below, the result is 0.0082.

Algorithm

```python
lsr = Lasso(alpha=0.00047)
```

Mean Squared Error calculation

```python
-cross_val_score(lsr, train_dummies, y, scoring="neg_mean_squared_error").mean()
```
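For context, here is a self-contained sketch of that calculation. My real `train_dummies` and `y` are not shown, so this uses a synthetic dataset from `make_regression` purely as a stand-in:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for my real data (train_dummies, y)
X, y = make_regression(n_samples=200, n_features=10, noise=0.5, random_state=0)

lsr = Lasso(alpha=0.00047, max_iter=100000)
# "neg_mean_squared_error" returns the negated MSE, hence the leading minus
mse = -cross_val_score(lsr, X, y, cv=5, scoring="neg_mean_squared_error").mean()
```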

Here are the rest of the algorithms I ran:

```python
svr = make_pipeline(RobustScaler(), SVR(C=20, epsilon=0.008, gamma=0.0003))
xgbr = XGBRegressor(max_depth=5, n_estimators=400)
rr = Ridge(alpha=13)
lgbm = LGBMRegressor(objective='regression',
                     num_leaves=4,
                     learning_rate=0.01,
                     n_estimators=5000,
                     max_bin=200,
                     bagging_fraction=0.75,
                     bagging_freq=5,
                     bagging_seed=7,
                     feature_fraction=0.2,
                     feature_fraction_seed=7,
                     verbose=-1)
```

This is the ElasticNet setup:

```python
e_alphas = [0.0001, 0.0002, 0.0003, 0.0004, 0.0005, 0.0006, 0.0007]
e_l1ratio = [0.8, 0.85, 0.9, 0.95, 0.99, 1]
en = make_pipeline(RobustScaler(), ElasticNetCV(max_iter=1e7, alphas=e_alphas, cv=5, l1_ratio=e_l1ratio))
```
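In case it matters: `ElasticNetCV` picks one `alpha`/`l1_ratio` pair from those grids by internal cross-validation. A quick sketch of inspecting what it chose, again on synthetic `make_regression` data as a stand-in for mine:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

e_alphas = [0.0001, 0.0002, 0.0003, 0.0004, 0.0005, 0.0006, 0.0007]
e_l1ratio = [0.8, 0.85, 0.9, 0.95, 0.99, 1]

# Synthetic stand-in for my real data
X, y = make_regression(n_samples=200, n_features=10, noise=0.5, random_state=0)

en = make_pipeline(RobustScaler(),
                   ElasticNetCV(max_iter=int(1e7), alphas=e_alphas, cv=5,
                                l1_ratio=e_l1ratio))
en.fit(X, y)

# The fitted step exposes the hyperparameters selected by cross-validation
enet = en.named_steps["elasticnetcv"]
print(enet.alpha_, enet.l1_ratio_)
```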

Here is the code for the learning curve:

```python
pl_mo = {'GBR': GradientBoostingRegressor(max_depth=4, n_estimators=150),
         'XGB': XGBRegressor(max_depth=5, n_estimators=400),
         'Lasso': Lasso(alpha=0.00047),
         'Ridge': Ridge(alpha=13),
         'ENet': make_pipeline(RobustScaler(),
                               ElasticNetCV(max_iter=1e7, alphas=e_alphas,
                                            l1_ratio=e_l1ratio)),
         'SVR': make_pipeline(RobustScaler(),
                              SVR(C=20, epsilon=0.008, gamma=0.0003)),
         'LGBM': LGBMRegressor(objective='regression',
                               num_leaves=4,
                               learning_rate=0.01,
                               n_estimators=5000,
                               max_bin=200,
                               bagging_fraction=0.75,
                               bagging_freq=5,
                               bagging_seed=7,
                               feature_fraction=0.2,
                               feature_fraction_seed=7,
                               verbose=-1)}

plt.figure(figsize=(10, 7))

for k, v in pl_mo.items():
    (train_sizes,
     train_scores,
     test_scores) = learning_curve(v,
                                   train_dummies,
                                   y,
                                   cv=5,
                                   scoring='neg_mean_squared_error')
    train_scores = -train_scores
    train_mean = np.mean(train_scores, axis=1)
    plt.plot(train_sizes, train_mean, label=k)

plt.title("Training Error")
plt.xlabel("Training Set Size")
plt.ylabel("Mean Squared Error")
plt.legend()
plt.show()
```
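To make the comparison concrete, this is a self-contained sketch (synthetic data, Lasso only) of the two quantities I am comparing: the last point of the plotted curve comes from `learning_curve`'s train scores, while the printed 0.0082 comes from `cross_val_score`:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score, learning_curve

# Synthetic stand-in for my real data (train_dummies, y)
X, y = make_regression(n_samples=300, n_features=10, noise=1.0, random_state=0)
model = Lasso(alpha=0.00047, max_iter=100000)

# What the plot shows: score on the data each model was fitted on,
# at increasing training-set sizes
sizes, train_scores, test_scores = learning_curve(
    model, X, y, cv=5, scoring="neg_mean_squared_error")
train_mse = -train_scores.mean(axis=1)

# What I printed earlier: score on the held-out folds of the full set
cv_mse = -cross_val_score(model, X, y, cv=5,
                          scoring="neg_mean_squared_error").mean()

print(train_mse[-1], cv_mse)
```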

Here is the resulting plot image.

If anyone could point me in the right direction, I would be eternally grateful.