How does varying alpha change SGDRegressor behavior with outliers?

I am using SGDRegressor with a constant learning rate and the default loss function. I would like to know how changing the alpha parameter from 0.0001 to 100 changes the regressor's behavior. Below is my sample code:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDRegressor

# Note: a, b, phi (the main dataset) and abline() are defined elsewhere and not shown in the question
out=[(0,2),(21, 13), (-23, -15), (22,14), (23, 14)]  # Outliers, appended one at a time
alpha=[0.0001, 1, 100]
N= len(out)
j=1  # Subplot index (assumed to start at 1; the initialization is not shown in the question)

for i in alpha:
    X= b * np.sin(phi)   #Since for every alpha we want to start with original dataset, I included X and Y in this section
    Y= a * np.cos(phi)
    for num in range(N):
        plt.subplot(3, N, j)
        X=np.append(X,out[num][0]) # Appending outlier to main X
        Y=np.append(Y,out[num][1]) # Appending outlier to main Y
        j=j+1  # Increasing J so we move on to next plot
        model=SGDRegressor(alpha=i, eta0=0.001, learning_rate='constant',random_state=0).fit(X.reshape(-1, 1), Y) # Fitting the model

        plt.title("alpha = "+ str(i) + " | " + "Slope :" + str(round(model.coef_[0], 4))) #Adding title to each plot
        abline(model.coef_[0],model.intercept_)  # Plotting the line using abline function

As shown above, I have the main dataset of X and Y. In each iteration I add one more outlier point to it, train the model, and plot the regression line (hyperplane). Below are the results for different values of alpha:

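[Image: 3 x 5 grid of plots, one row per alpha value and one column per added outlier, each titled with alpha and the fitted slope]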

Looking at the results, I am still confused and can't draw a solid conclusion about how the alpha parameter changes the model. What is the effect of alpha? Is it causing overfitting or underfitting?


1 answer

zodiac508

From scikit-learn:

alpha : float, default=0.0001
Constant that multiplies the regularization term. The higher the value, the stronger the regularization. Also used to compute the learning rate when learning_rate is set to 'optimal'.

Regularization discourages the model from learning an overly complex or flexible fit, which reduces the risk of overfitting. If the training data contain noise (points that do not reflect the "true" relationship), the estimated coefficients won’t generalize well to future (test) data. Regularization addresses this by shrinking the learned coefficients towards zero.
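To make the shrinkage concrete, here is a sketch of the objective that SGDRegressor minimizes with its default squared-error loss and L2 penalty, written for a single feature. The second line assumes centered data so the intercept b can be dropped; that simplification is an assumption for illustration, not something stated in the question.

```latex
% Objective for one feature x, slope w, intercept b, n samples
E(w, b) = \frac{1}{n} \sum_{i=1}^{n} \tfrac{1}{2}\bigl(y_i - (w x_i + b)\bigr)^2 + \alpha \cdot \tfrac{1}{2} w^2

% Setting dE/dw = 0 with b = 0 (centered data) gives the minimizer
w^{*} = \frac{\sum_{i} x_i y_i}{\sum_{i} x_i^2 + n \alpha}
```

With alpha close to 0 this is the ordinary least squares slope. As alpha grows, the denominator grows and the slope is pulled towards zero, regardless of what the data say.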

From Towards Data Science (paraphrased):

A standard least squares model tends to have some variance in it, i.e. it won’t generalize well to data sets other than its training data. Regularization significantly reduces the variance of the model without a substantial increase in its bias. The tuning parameter alpha controls the impact on bias and variance. As the value of alpha rises, it reduces the value of the coefficients, thus reducing the variance.
Up to a point, this increase in alpha is beneficial because it only reduces the variance (hence avoiding overfitting) without losing any important properties in the data. But beyond a certain value, the model starts losing important properties, which introduces bias and leads to underfitting.
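The trade-off described above can be checked with a small experiment. This is a minimal sketch on synthetic data (y = 3x plus noise, which is an assumption and not the asker's dataset), reusing the estimator settings from the question and varying only alpha. It records the fitted slope and the mean squared error on a held-out sample.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Synthetic data (an assumption for illustration, not the asker's dataset):
# y = 3x + Gaussian noise, with separate train and test samples.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 1))
y_train = 3.0 * X_train[:, 0] + rng.normal(scale=0.5, size=200)
X_test = rng.normal(size=(200, 1))
y_test = 3.0 * X_test[:, 0] + rng.normal(scale=0.5, size=200)

alphas = [0.0001, 1, 100]  # same values as in the question
slopes, test_mse = [], []

for alpha in alphas:
    # Same estimator settings as in the question; only alpha varies.
    model = SGDRegressor(alpha=alpha, eta0=0.001,
                         learning_rate="constant", random_state=0)
    model.fit(X_train, y_train)
    slopes.append(model.coef_[0])
    # Mean squared error on held-out data
    test_mse.append(np.mean((y_test - model.predict(X_test)) ** 2))
    print(f"alpha={alpha}: slope={slopes[-1]:.3f}, test MSE={test_mse[-1]:.3f}")
```

On data like this, the slope should shrink towards zero as alpha grows, and the largest alpha should give a clearly worse test error. That is the underfitting regime: the penalty dominates the loss and the line goes nearly flat.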

In your example, comparing the slopes across the rows of the third column highlights this effect.