How to make trend line go through the origin while plotting its R2 value - python


I am working with a dataframe df which looks like this:

index       var1      var2      var3
0           0.0       0.0       0.0 
10          43940.7   2218.3    6581.7
100         429215.0  16844.3   51682.7

I want to plot each variable, add a trend line forced through the origin, and calculate and display the R2 value on the plot.

I found most of what I need in this post; however, the trend line there doesn't go through the origin, and I can't find a way to force it to.

I tried manually overwriting the first point of the trend line, but the result doesn't look right.

import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import r2_score

for var in df.columns[1:]:
    fig, ax = plt.subplots(figsize=(10, 7))

    x = df.index
    y = df[var]

    ax.plot(x, y, "+", ms=10, mec="k")
    z = np.polyfit(x, y, 1)  # ordinary linear fit: z[0] is the slope, z[1] the intercept
    y_hat = np.poly1d(z)(x)
    y_hat[0] = 0     ###--- Here I tried to replace the first value with 0 but it doesn't seem right to me.

    ax.plot(x, y_hat, "r--", lw=1)
    # Raw strings keep the LaTeX "\;" spacing command from being read as a string escape
    text = (rf"$y={z[0]:0.3f}\;x{z[1]:+0.3f}$" "\n"
            rf"$R^2 = {r2_score(y, y_hat):0.3f}$")
    ax.text(0.05, 0.95, text, transform=ax.transAxes, fontsize=14, verticalalignment='top')
    

Is there any way of doing it? Any help would be greatly appreciated.


There are 2 answers

2
Joonas On BEST ANSWER

You could use SciPy's curve_fit for that. Define the trend line as y = a*x, with no intercept term, so that it goes through the origin.

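import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

def func(x, a):
    # Straight line through the origin: slope only, no intercept
    return a * x

# NumPy arrays so that a * x is an element-wise product
xdata = np.array([0, 10, 20, 30, 40])
ydata = np.array([0, 12, 18, 35, 38])

popt, pcov = curve_fit(func, xdata, ydata)  # popt[0] is the fitted slope
plt.scatter(xdata, ydata)
plt.plot(xdata, func(xdata, *popt), "r--")
plt.show()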

[Plot: scatter of the sample data with the fitted dashed red line through the origin]
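For a one-parameter model y = a*x, the least-squares slope also has a closed form, a = sum(x*y) / sum(x*x), so the fit can be sketched with NumPy alone. The block below is a minimal sketch, not part of the original answer: it applies that formula to the three columns from the question and builds the annotation text the asker wanted. The helper name `fit_through_origin` and the `columns` and `labels` dictionaries are made up for illustration, and the R2 here is the usual centered one (1 - SS_res / SS_tot, the same definition `sklearn.metrics.r2_score` uses).

```python
import numpy as np

# Data copied from the question; x is the DataFrame index.
x = np.array([0.0, 10.0, 100.0])
columns = {
    "var1": np.array([0.0, 43940.7, 429215.0]),
    "var2": np.array([0.0, 2218.3, 16844.3]),
    "var3": np.array([0.0, 6581.7, 51682.7]),
}

def fit_through_origin(x, y):
    """Hypothetical helper: least-squares slope of y = a*x and its centered R^2."""
    a = np.sum(x * y) / np.sum(x * x)        # closed-form slope, no intercept
    y_hat = a * x                            # fitted values; y_hat is 0 at x = 0
    ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares about the mean
    return a, 1.0 - ss_res / ss_tot

labels = {}
for name, y in columns.items():
    a, r2 = fit_through_origin(x, y)
    # Text in the same style as the question's annotation, minus the intercept
    labels[name] = f"$y={a:0.3f}x$\n$R^2 = {r2:0.3f}$"
    print(name, a, r2)
```

Inside the question's plotting loop, the line could then be drawn with `ax.plot(x, a * x, "r--")` and the label passed to `ax.text(...)` in place of the `polyfit` result.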

1
ALollz On

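You can use statsmodels for a simple linear regression with no intercept. OLS does not add a constant term unless you add one yourself, so passing the raw x values fits y = a*x: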

import statsmodels.api as sm

xdata = [0,10,20,30,40]
ydata = [0,12,18,35,38]

res = sm.OLS(ydata, xdata).fit()

The slope and R2 are then stored in attributes:

res.params
#array([1.01666667])

res.rsquared
#0.9884709382637339
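One point worth checking before putting this number on a plot: as the summary output labels it, the R2 reported for a no-intercept model is the uncentered one, which compares the residuals against sum(y**2) rather than against the spread of y about its mean. The sketch below is an illustration with plain NumPy, not part of the original answer; it computes both variants for the same sample data, and the variable names are chosen for the example. The uncentered value should reproduce `res.rsquared`, while the centered value (about 0.964 for this data) is what `sklearn.metrics.r2_score` would return for the same fitted line.

```python
import numpy as np

xdata = np.array([0, 10, 20, 30, 40], dtype=float)
ydata = np.array([0, 12, 18, 35, 38], dtype=float)

# Least-squares slope for y = a*x (no intercept)
slope = np.sum(xdata * ydata) / np.sum(xdata ** 2)
ss_res = np.sum((ydata - slope * xdata) ** 2)

# Uncentered R^2: total sum of squares taken about zero
r2_uncentered = 1.0 - ss_res / np.sum(ydata ** 2)

# Centered R^2: total sum of squares taken about the mean of y
r2_centered = 1.0 - ss_res / np.sum((ydata - ydata.mean()) ** 2)

print(slope, r2_uncentered, r2_centered)
```

Which of the two to report depends on the comparison you want to make; the centered value is the one to use if the plot should agree with `r2_score`.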

The summary provides a plethora of other information:

res.summary()

                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.988
Model:                            OLS   Adj. R-squared (uncentered):              0.986
Method:                 Least Squares   F-statistic:                              342.9
Date:                Tue, 29 Sep 2020   Prob (F-statistic):                    5.00e-05
Time:                        15:39:50   Log-Likelihood:                         -12.041
No. Observations:                   5   AIC:                                      26.08
Df Residuals:                       4   BIC:                                      25.69
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             1.0167      0.055     18.519      0.000       0.864       1.169
==============================================================================