I have a question about the correct use of agg in pandas. The specific problem I am working on is in the field of finance and, more specifically, is to calculate a liquidity measure from the full limit order book.
My data contain the ask side of the order book (which represents how many shares people want to sell at a particular moment and at which price) and I want to calculate the hypothetical price for buying 50 shares at a specific moment in time. Assume for example that the order book for stock X at 9am looks like this:
example_data=pd.DataFrame({'price':[100.023,100.031,100.039,100.109,100.219 ],'avail_shares': [40,1,20,23,15],'midpoint':[99.996 ,99.996 ,99.996 ,99.996,99.996 ]})
where price is the price at which shares are sold, avail_shares the number of shares available at each price and midpoint the average of the best ask and bid price in the order book. To get a liquidity measure that takes into account that a large order can hit multiple price levels at once (i.e. ‘walk the book’) I define the following cost-to-trade (ctt) function:
def ctt_ask(dfrm,level=50):
    dfrm['cumshares']=dfrm['avail_shares'].cumsum()
    dfrm['indicator']=0
    dfrm['indicator'].ix[dfrm.cumshares<level,]=dfrm.cumshares
    dfrm['indicator'].ix[(dfrm.cumshares>level) & (dfrm.cumshares.shift(1)<level),]=(level- dfrm.cumshares.shift(1))
    liquidity_measure=((dfrm.price-dfrm.midpoint)*dfrm.indicator).sum()
    return liquidity_measure
This works just fine (i.e. ctt_ask(example_data) yields 2.90) for the above example, but my real dataset has several stocks and many date-times (it has a MultiIndex). When I use groupby and agg to apply this function to every stock/date-time combination (full_book_ask.groupby(level=[0,1]).agg(ctt_ask)), I get an error: KeyError: 'avail_shares'. This is strange because I do have a column named avail_shares in my actual dataset. I have also tried the same with the apply functionality, but this raises the error message "Exception: cannot handle a non-unique multi-index!" I can't seem to figure out what I'm doing wrong here. Any input would be much appreciated!
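For reference, here is a minimal reproducible sketch of the grouped calculation, offered as one possible reading of the problem rather than a definitive diagnosis. As far as I understand it, agg normally calls the function once per column and hands it a Series, which would explain the KeyError, whereas apply hands over each group's whole DataFrame. The sketch assumes a recent pandas version (where .ix is no longer available). The names ctt_ask_sketch, stock and time, and the two-stock toy book, are made up for illustration. The function deliberately mirrors the indicator logic of ctt_ask exactly as posted, but works on NumPy arrays so it neither modifies the group nor relies on aligning a non-unique index.

```python
import numpy as np
import pandas as pd


def ctt_ask_sketch(book, level=50):
    """Hypothetical non-mutating variant of ctt_ask, same logic as posted."""
    cum = book["avail_shares"].cumsum().to_numpy()
    # Previous cumulative total; NaN for the first row, like shift(1)
    prev = np.concatenate(([np.nan], cum[:-1]))
    spread = book["price"].to_numpy() - book["midpoint"].to_numpy()

    # Levels lying entirely below the target size (weighted as in the post)
    indicator = np.where(cum < level, cum, 0.0)
    # The level at which the order is completed gets the remaining shares
    crossing = (cum > level) & (prev < level)
    indicator = np.where(crossing, level - prev, indicator)
    return float((spread * indicator).sum())


levels = pd.DataFrame({
    "price": [100.023, 100.031, 100.039, 100.109, 100.219],
    "avail_shares": [40, 1, 20, 23, 15],
    "midpoint": [99.996] * 5,
})

# Toy two-stock book with a non-unique (stock, time) MultiIndex
frames = []
for stock in ["X", "Y"]:
    part = levels.copy()
    part["stock"] = stock
    part["time"] = "09:00"
    frames.append(part)
full_book_ask = pd.concat(frames).set_index(["stock", "time"])

# apply passes each group's DataFrame, so column lookups by name work
result = full_book_ask.groupby(level=[0, 1]).apply(ctt_ask_sketch)
print(result)  # both groups give roughly 2.902, since they share the toy levels
```

Whether this also avoids the non-unique multi-index exception on the real dataset depends on the pandas version in use, so treat it as a starting point to test against rather than a confirmed fix.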