I have a trading Python Pandas DataFrame which includes the "close" and "volume". I want to calculate the On-Balance Volume (OBV). I've got it working over the entire dataset but I want it to be calculated on a rolling series of 10.
The current function looks as follows...
def calculateOnBalanceVolume(df):
df['obv'] = 0
index = 1
while index <= len(df) - 1:
if(df.iloc[index]['close'] > df.iloc[index-1]['close']):
df.at[index, 'obv'] += df.at[index-1, 'obv'] + df.at[index, 'volume']
if(df.iloc[index]['close'] < df.iloc[index-1]['close']):
df.at[index, 'obv'] += df.at[index-1, 'obv'] - df.at[index, 'volume']
index = index + 1
return df
This creates the "obv" column and works out the OBV over the 300 entries.
Ideally I would like to do something like this...
data['obv10'] = data.volume.rolling(10, min_periods=1).apply(calculateOnBalanceVolume)
This looks like it has potential to work but the problem is the "apply" only passes in the "volume" column so you can't work out the change in closing price.
I also tried this...
data['obv10'] = data[['close','volume']].rolling(10, min_periods=1).apply(calculateOnBalanceVolume)
Which sort of works but it tries to update the "close" and "volume" columns instead of adding the new "obv10" column.
What is the best way of doing this or do you just have to iterate over the data in batches of 10?
I found a more efficient way of doing the code above from this link: Calculating stocks's On Balance Volume (OBV) in python
import numpy as np
def calculateOnBalanceVolume(df):
df['obv'] = np.where(df['close'] > df['close'].shift(1), df['volume'],
np.where(df['close'] < df['close'].shift(1), -df['volume'], 0)).cumsum()
return df
The problem is this still does the entire data set. This looks pretty good but how can I cycle through it in batches of 10 at a time without looping or iterating through the entire data set?
*** UPDATE ***
I've got slightly closer to getting this working. I have managed to calculate the OBV in groups of 10.
for gid,df in data.groupby(np.arange(len(data)) // 10):
df['obv'] = np.where(df['close'] > df['close'].shift(1), df['volume'],
np.where(df['close'] < df['close'].shift(1), -df['volume'], 0)).cumsum()
I want this to be calculated rolling not in groups. Any idea how to do this using Pandas in an efficient way?
*** UPDATE ***
It turns out that OBV is supposed to be calculated over the entire data set. I've settled on the following code which looks correct now.
# calculate on-balance volume (obv)
self.df['obv'] = np.where(self.df['close'] > self.df['close'].shift(1), self.df['volume'],
np.where(self.df['close'] < self.df['close'].shift(1), -self.df['volume'], self.df.iloc[0]['volume'])).cumsum()