I am struggling with extracting the regression coefficients once I complete the function call np.polyfit (actual code below). I am able to get a display of each coefficient but am unsure how to actually extract them for future use with the original data.
df=pd.read_csv('2_skews.csv')
Here is a head() of the data
date expiry symbol strike vol
0 6/10/2015 1/19/2016 IBM 50 42.0
1 6/10/2015 1/19/2016 IBM 55 41.5
2 6/10/2015 1/19/2016 IBM 60 40.0
3 6/10/2015 1/19/2016 IBM 65 38.0
4 6/10/2015 1/19/2016 IBM 70 36.0
There are many symbols with many strikes across many days and many expiry dates as well
I have grouped the data by date, symbol and expiry and then call the regression function with this:
df_reg=df.groupby(['date','symbol','expiry']).apply(regress)
I have this function that seems to work well (gives proper coefficients), i just don't seem to be able to access them and tie them to the original data.
def regress(df):
y=df['vol']
x=df['strike']
z=P.polyfit(x,y,4)
return (z)
I am calling polyfit like this:
from numpy.polynomial import polynomial as P
The final results:
df_reg
date symbol expiry
5/19/2015 GS 1/19/2016 [-112.064833151, 6.76871521993, -0.11147562136...
3/21/2016 [-131.2914493, 7.16441276062, -0.1145534833, 0...
IBM 1/19/2016 [211.458028147, -5.01236287512, 0.044819313514...
3/21/2016 [-34.1027973807, 3.16990194634, -0.05676206572...
6/10/2015 GS 1/19/2016 [50.3916788503, 0.795484227762, -0.02701849495...
3/21/2016 [31.6090441114, 0.851878910113, -0.01972772270...
IBM 1/19/2016 [-13.6159660078, 3.23002791603, -0.06015739505...
3/21/2016 [-51.6709051223, 4.80288173687, -0.08600312989...
dtype: object
the top results has the functional form of :
y = -0.000002x4 + 0.000735x3 - 0.111476x2 + 6.768715x - 112.064833
I have tried to take the constructive criticism of previous individuals and make my question as clear as possible, please let me know if i still need to work on this :-)
John
Changing the output of
regress
to a Series rather than a numpy array will give you a data frame when you groupby. The index of the series will be the column names:To get the columns containing the coefficients for each combination of date, expiry and symbol you can then merge
df
andcoeffs
on these columns:You can then do something like