Calculating error of linear regression coefficient, given errors of y


I am processing my lab measurements related to measuring the speed of sound. To put my goal simply, I have a series of measurements y(x) as follows:

x       y
0       0
1     212
2     426
3     640
4     858
5    1074
6    1290
7    1506
8    1722
9    1939

I also know that each measurement of y may be off by 2. So, for example, at x = 1, y could be anywhere from 210 to 214. I want to know how much impact this error has on the coefficients of the linear regression.

I was using sklearn's LinearRegression, and with the fit_intercept=False parameter the task wasn't hard: I just fit the series y - 2 and y + 2 and took the difference between the resulting coefficients. But now I have to do a similar task without fit_intercept=False (so y is not forced to be 0 when x is 0).
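For reference, the through-origin approach described above can be sketched like this (a minimal illustration using the data from the question; note the y ± 2 shift only bounds the slope because every x here is non-negative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.arange(10).reshape(-1, 1)  # column of x values, as sklearn expects
y = np.array([0, 212, 426, 640, 858, 1074, 1290, 1506, 1722, 1939])

# With fit_intercept=False and non-negative x, shifting every y down/up by
# the measurement error bounds the fitted slope from below/above.
slope_lo = LinearRegression(fit_intercept=False).fit(x, y - 2).coef_[0]
slope_hi = LinearRegression(fit_intercept=False).fit(x, y + 2).coef_[0]
print(slope_lo, slope_hi)
```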

So I am wondering: are there any officially implemented ways to achieve this? Not necessarily in sklearn.


There are 2 answers

lastchance On BEST ANSWER

The slope coefficient m in y = mx + c is found below. (I suspect that you only need the slope to get the speed of sound from your data.)

(Case 1) If a non-zero intercept c is allowed then the slope is:

    m = Σ (x_i − x_mean)(y_i − y_mean) / Σ (x_i − x_mean)²

and the denominator is positive (it is N times the variance of x).

To get the MAXIMUM slope you want to maximize the numerator, which (since Σ (x_i − x_mean) = 0, so the y_mean term drops out) reduces to:

    Σ (x_i − x_mean) y_i

So, take the greatest possible value of y if x is greater than x_mean and the smallest value of y if x is less than x_mean.

To get the MINIMUM slope then minimize the numerator by doing the reverse.
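A small numeric sketch of this rule on the question's data (assuming the ±2 error band from the question and the formulas above):

```python
import numpy as np

x = np.arange(10.0)
y = np.array([0, 212, 426, 640, 858, 1074, 1290, 1506, 1722, 1939], dtype=float)
err = 2.0

xbar = x.mean()
# Maximize the numerator sum((x - xbar) * y): push y up where x > xbar,
# down where x < xbar. Do the reverse for the minimum slope.
shift = np.where(x > xbar, err, -err)
y_max = y + shift
y_min = y - shift

denom = ((x - xbar) ** 2).sum()           # N times the variance of x
slope_max = ((x - xbar) * y_max).sum() / denom
slope_min = ((x - xbar) * y_min).sum() / denom
print(slope_min, slope_max)
```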

(Case 2) If the intercept c is forced to be zero (the line has to go through the origin) then the slope is:

    m = Σ x_i y_i / Σ x_i²

Since the x values are fixed then maximize the slope by taking the largest possible value of y where x is positive and the smallest possible value when x is negative. Again, do the reverse to get the minimum slope.
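Applied to the question's data (where every x is non-negative), this gives, as a sketch:

```python
import numpy as np

x = np.arange(10.0)
y = np.array([0, 212, 426, 640, 858, 1074, 1290, 1506, 1722, 1939], dtype=float)
err = 2.0

# Through-origin slope m = sum(x*y) / sum(x**2). All x >= 0 here, so
# raising every y raises the slope and lowering every y lowers it.
slope_max = (x * (y + err)).sum() / (x ** 2).sum()
slope_min = (x * (y - err)).sum() / (x ** 2).sum()
print(slope_min, slope_max)
```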

Muhammed Yunus On

This is an empirical approach: run a noise impact study that re-fits the model over many trials with uniform noise added to the original y values, and record the spread of the estimated parameters.

[Figure: slope and intercept estimates from 1000 uniform-noise trials plotted against trial number, with the non-noised estimates drawn as horizontal lines.]

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

#
# Data
#
data = pd.DataFrame({
    'x': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'y': [0, 212, 426, 640, 858, 1074, 1290, 1506, 1722, 1939]
})

#
# Get parameter estimates from original data + model
#
from sklearn.linear_model import LinearRegression
rng = np.random.default_rng(0)

#Get slope/intercept of the data
ref_model = LinearRegression(fit_intercept=True).fit(data[['x']], data['y'])  # 1-D y so coef_[0]/intercept_ are scalars
ref_slope = ref_model.coef_[0]
ref_intercept = ref_model.intercept_

#
# Noise the data and record the estimated parameters
#

#Set the noise trial parameters
noise_amplitude = 2
num_noise_trials = 1000

trial_slopes = []
trial_intercepts = []
for trial in range(num_noise_trials):
    y_noised = data.y + rng.uniform(low=-1, high=1, size=len(data)) * noise_amplitude
    
    lr = LinearRegression(fit_intercept=True).fit(data[['x']], y_noised)
    trial_slopes.append(lr.coef_[0])
    trial_intercepts.append(lr.intercept_)

#
# Plot, comparing non-noised vs noised estimates.
# Include some trial stats.
#
f, axs = plt.subplots(nrows=2, ncols=1, figsize=(7, 5), sharex=True, layout='tight')
ax = axs[0]
ax.scatter(range(num_noise_trials), trial_slopes, marker='.', s=10, label='trial estimate')
ax.axhline(y=ref_slope, color='tab:purple', linewidth=2, label='non-noised estimate')
ax.set_ylabel('slope')
ax.set_title(
    f'slope | max={np.max(trial_slopes):.2f}, min={np.min(trial_slopes):.2f}',
    fontsize=10
)

ax = axs[1]
ax.scatter(range(num_noise_trials), trial_intercepts, marker='.', s=10)
ax.axhline(y=ref_intercept, color='tab:purple', linewidth=2)
ax.set_ylabel('intercept')
ax.set_title(
    f'intercept | max={np.max(trial_intercepts):.2f}, min={np.min(trial_intercepts):.2f}',
    fontsize=10
)

ax.set_xlabel('trial number')
f.legend(ncols=2, bbox_to_anchor=(0.8, 0.02))
f.suptitle('Parameter estimation in uniform-noise trials')