Assigning (or tieing in) function results back to original data in pandas

Question

Assigning (or tieing in) function results back to original data in pandas

76 views Asked by John At 10 June 2015 at 20:25

I am struggling with extracting the regression coefficients once I complete the function call np.polyfit (actual code below). I am able to get a display of each coefficient but am unsure how to actually extract them for future use with the original data.

df=pd.read_csv('2_skews.csv')

Here is a head() of the data

      date      expiry   symbol   strike vol
0  6/10/2015  1/19/2016    IBM      50  42.0
1  6/10/2015  1/19/2016    IBM      55  41.5
2  6/10/2015  1/19/2016    IBM      60  40.0
3  6/10/2015  1/19/2016    IBM      65  38.0
4  6/10/2015  1/19/2016    IBM      70  36.0

There are many symbols with many strikes across many days and many expiry dates as well

I have grouped the data by date, symbol and expiry and then call the regression function with this:

df_reg=df.groupby(['date','symbol','expiry']).apply(regress)

I have this function that seems to work well (gives proper coefficients), i just don't seem to be able to access them and tie them to the original data.

def regress(df):
    y=df['vol']
    x=df['strike']
    z=P.polyfit(x,y,4)
return (z)

I am calling polyfit like this:

from numpy.polynomial import polynomial as P

The final results:

df_reg


date       symbol  expiry   
5/19/2015  GS      1/19/2016    [-112.064833151, 6.76871521993, -0.11147562136...
                   3/21/2016    [-131.2914493, 7.16441276062, -0.1145534833, 0...
           IBM     1/19/2016    [211.458028147, -5.01236287512, 0.044819313514...
                   3/21/2016    [-34.1027973807, 3.16990194634, -0.05676206572...
6/10/2015  GS      1/19/2016    [50.3916788503, 0.795484227762, -0.02701849495...
                   3/21/2016    [31.6090441114, 0.851878910113, -0.01972772270...
           IBM     1/19/2016    [-13.6159660078, 3.23002791603, -0.06015739505...
                   3/21/2016    [-51.6709051223, 4.80288173687, -0.08600312989...
dtype: object

the top results has the functional form of :

y = -0.000002x4 + 0.000735x3 - 0.111476x2 + 6.768715x - 112.064833

I have tried to take the constructive criticism of previous individuals and make my question as clear as possible, please let me know if i still need to work on this :-)

John

Original Q&A

There are 1 answers

**JoeCondron** · Accepted Answer · 2015-06-10T20:56:34+00:00

Changing the output of regress to a Series rather than a numpy array will give you a data frame when you groupby. The index of the series will be the column names:

In [37]:

df = pd.DataFrame(
[[  '6/10/2015',  '1/19/2016',    'IBM',      50,  42.0],
[ '6/10/2015',  '1/19/2016',    'IBM',      55,  41.5],
[  '6/10/2015',  '1/19/2016',    'IBM',      60,  40.0],
[  '6/10/2015',  '1/19/2016',    'IBM',      65,  38.0],
[  '6/10/2015',  '1/19/2016',    'IBM',      70,  36.0]],
columns=['date', 'expiry', 'symbol', 'strike', 'vol'])

def regress(df):
    y=df['vol']
    x=df['strike']
    z=np.polyfit(x,y,4)
    return pd.Series(z, name='order', index=range(5)[::-1])

group_cols = ['date', 'expiry', 'symbol']
coeffs = df.groupby(group_cols).apply(regress)
coeffs


Out[40]:
                         order         4      3          2         1    0
date           expiry   symbol                  
6/10/2015   1/19/2016   IBM -5.388312e-18   0.000667    -0.13   8.033333   -118

To get the columns containing the coefficients for each combination of date, expiry and symbol you can then merge df and coeffs on these columns:

In [25]: df.merge(coeffs.reset_index(), on=group_cols)
Out[25]:
date    expiry     symbol   strike    vol    4              3               2          1       0
0   6/10/2015   1/19/2016   IBM 50  42.0    -6.644454e-18   0.000667    -0.13   8.033333    -118
1   6/10/2015   1/19/2016   IBM 55  41.5    -6.644454e-18   0.000667    -0.13   8.033333    -118
2   6/10/2015   1/19/2016   IBM 60  40.0    -6.644454e-18   0.000667    -0.13   8.033333    -118
3   6/10/2015   1/19/2016   IBM 65  38.0    -6.644454e-18   0.000667    -0.13   8.033333    -118
4   6/10/2015   1/19/2016   IBM 70  36.0    -6.644454e-18   0.000667    -0.13   8.033333    -118

You can then do something like

df = df.merge(coeffs.reset_index(), on=group_cols)
strike_powers = pd.DataFrame(dict((i, df.strike**i) for i in range(5))
df['modelled_vol'] = (strike_powers * df[range(5)]).sum(axis=1)

TechQA.

Assigning (or tieing in) function results back to original data in pandas

There are 1 answers

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in RETURN

Related Questions in FUNCTION

Popular Questions

Popular Tags

Trending Questions