Code description: I am trying to calculate various rolling metrics (moving average, standard deviation and z-score) on financial time series data. I am using a loop because I want to simulate data arriving one value at a time from an API.
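For reference, the rolling metrics the loop computes can be written out as follows. Here b_t is the t-th valid bid (NaN rows are skipped), P is Period, and the window is the P most recent earlier bids held in live_arr[-Period:]. The 1/P inside the square root reflects that ndarray.std() defaults to ddof=0 (population standard deviation). A sell signal is recorded when z_t > upperbound and a buy signal when z_t < lowerbound.

```latex
\mu_t = \frac{1}{P}\sum_{k=1}^{P} b_{t-k}, \qquad
\sigma_t = \sqrt{\frac{1}{P}\sum_{k=1}^{P}\left(b_{t-k}-\mu_t\right)^2}, \qquad
z_t = \frac{b_t-\mu_t}{\sigma_t}
```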
My original code used a simple itertuples loop that passed values into NumPy arrays for the rolling calculations. However, I would like to speed up the calculations with Numba, so I need to iterate through the data with NumPy inside a function.
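To illustrate the two usual ways of stepping through a NumPy array inside a function: np.nditer hands back read-only 0-d arrays, while integer indexing in a range() loop returns NumPy scalars. The range()-plus-indexing form is a pattern Numba's nopython mode supports. The prices below are made-up sample values and count_valid_prices is a hypothetical helper, not part of the original code.

```python
import numpy as np

prices = np.array([0.85208, 0.85213, np.nan, 0.85212])

# np.nditer yields read-only 0-d ndarray views, one per element
for item in np.nditer(prices):
    print(type(item).__name__, item.ndim)   # ndarray 0
    break

# Plain integer indexing yields NumPy scalars instead
print(type(prices[0]).__name__)             # float64


def count_valid_prices(values):
    """Count entries that pass the same '> 0' filter the loop uses."""
    count = 0
    for i in range(values.shape[0]):
        if values[i] > 0:    # NaN > 0 is False, so NaN entries are skipped
            count += 1
    return count


print(count_valid_prices(prices))           # 3
```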
I get the following error when I try to iterate through the NumPy array inside the function:
PyDev console: starting.
Python 3.7.9 (default, Aug 31 2020, 17:10:11) [MSC v.1916 64 bit (AMD64)] on win32
runfile('F:/Python/Directories/Directed Reading/Data Handling 0.1 JIT.py', wdir='F:/Python/Directories/Directed Reading')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Program Files\JetBrains\PyCharm 2019.3.3\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
  File "C:\Program Files\JetBrains\PyCharm 2019.3.3\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "F:/Python/Directories/Directed Reading/Data Handling 0.1 JIT.py", line 92, in <module>
    Results = RunBacktest(Gran, Period, ZMinMax, upperbound, lowerbound, UpperExit, LowerExit, Data)
  File "F:/Python/Directories/Directed Reading/Data Handling 0.1 JIT.py", line 60, in RunBacktest
    if bid > 0:
TypeError: '>' not supported between instances of 'numpy.ndarray' and 'int'
The code is as follows:
import time

import numpy as np
import pandas as pd

# Gran, Period, ZMinMax, upperbound, lowerbound, UpperExit and LowerExit
# are set earlier in the script (not shown here)
Data = "F:/Market Data/2020.3.15 FXAUDCAD-TICK-NoSession.h5"
df = pd.read_hdf(Data)
df = df.set_index(pd.DatetimeIndex(df['DateTime']))
df = df.drop(columns=['DateTime'])
df = df.resample(Gran).mean()
t0 = time.time()
Array = df['Bid'].to_numpy()


def RunBacktest(Gran, Period, ZMinMax, upperbound, lowerbound, UpperExit, LowerExit, Array):
    # Arrays for storing the "data feed"
    live_dtime_arr = np.array([])
    live_arr = np.array([])
    live_ma = np.array([])
    live_s_dev = np.array([])
    live_z_score = np.array([])
    live_buy_sig = np.array([])
    live_sell_sig = np.array([])
    count = 0
    sell_count = 0
    buy_count = 0
    # Loop through rows
    for i in np.nditer(Array):
        count += 1
        bid = i
        # The traceback points at the "if bid > 0:" line below.
        # It filters out NaN data points (NaN > 0 evaluates to False)
        if bid > 0:
            # Only compute once there are Period earlier valid bids
            # (count also counts skipped NaN rows, so it is not used here)
            if len(live_arr) >= Period:
                ma = live_arr[-Period:].mean()
                s_dev = live_arr[-Period:].std()
                z_score = (bid - ma) / s_dev
            else:
                ma = np.nan
                s_dev = np.nan
                z_score = np.nan
            # Reset both signals on every row so a stale value is never stored
            sell_sig = np.nan
            buy_sig = np.nan
            if z_score > upperbound:
                sell_sig = bid
                sell_count += 1
            elif z_score < lowerbound:
                buy_sig = bid
                buy_count += 1
            else:
                signal_filter = 0
            live_arr = np.append(live_arr, [bid], axis=0)
            live_ma = np.append(live_ma, [ma], axis=0)  # store the moving average, not the bid
            live_s_dev = np.append(live_s_dev, [s_dev], axis=0)
            live_z_score = np.append(live_z_score, [z_score], axis=0)
            live_buy_sig = np.append(live_buy_sig, [buy_sig], axis=0)
            live_sell_sig = np.append(live_sell_sig, [sell_sig], axis=0)
    return live_arr


Results = RunBacktest(Gran, Period, ZMinMax, upperbound, lowerbound, UpperExit, LowerExit, Data)
print(Results)
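A possible restructuring for Numba, sketched under assumptions: rolling_zscore_signals is a hypothetical name, not the original function. It keeps the same window (the Period most recent earlier valid bids) and the same > 0 filter, but preallocates its outputs and fills them by index instead of growing arrays with np.append, which allocates and copies a new array on every call. Unlike RunBacktest, the outputs stay aligned with the input rows (skipped rows stay NaN), and the z-score is skipped when the window's standard deviation is zero. It only uses NumPy features that Numba lists as supported, so wrapping it in numba.njit should be possible, but this has not been tested with Numba.

```python
import numpy as np


def rolling_zscore_signals(prices, period, upperbound, lowerbound):
    """Hypothetical rewrite: rolling mean/std/z-score with preallocated outputs."""
    n = prices.shape[0]
    valid = np.empty(n)              # buffer of valid bids seen so far
    ma = np.full(n, np.nan)
    s_dev = np.full(n, np.nan)
    z_score = np.full(n, np.nan)
    buy_sig = np.full(n, np.nan)
    sell_sig = np.full(n, np.nan)
    n_valid = 0
    for i in range(n):
        bid = prices[i]
        if not bid > 0:              # skips NaN (NaN > 0 is False) and non-positive values
            continue
        if n_valid >= period:
            # Window = the `period` most recent earlier valid bids
            window = valid[n_valid - period:n_valid]
            ma[i] = window.mean()
            s_dev[i] = window.std()  # ddof=0, like ndarray.std() in the original
            if s_dev[i] > 0:         # avoid dividing by a zero standard deviation
                z_score[i] = (bid - ma[i]) / s_dev[i]
                if z_score[i] > upperbound:
                    sell_sig[i] = bid
                elif z_score[i] < lowerbound:
                    buy_sig[i] = bid
        valid[n_valid] = bid
        n_valid += 1
    return ma, s_dev, z_score, buy_sig, sell_sig
```

For example, with prices [1.0, 2.0, NaN, 3.0, 10.0] and period 3, the last row sees the window [1.0, 2.0, 3.0] (the NaN is skipped), so its moving average is 2.0 and its z-score is well above 1.0, giving a sell signal at 10.0 when upperbound is 1.0.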
Sample data (from df):
Note: there are some NaN values in the 'Bid' column of the pandas DataFrame.
DateTime Bid
2006-01-03 00:01:07.588 0.85208
2006-01-03 00:01:08.654 0.85213
2006-01-03 00:01:08.859 0.85212
2006-01-03 00:01:11.472 0.85215
2006-01-03 00:01:12.002 0.85218
... ...
2020-03-15 23:59:57.150 0.85178
2020-03-15 23:59:57.300 0.85179
2020-03-15 23:59:58.233 0.85179
2020-03-15 23:59:58.366 0.85178
2020-03-15 23:59:58.595 0.85179
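One likely source of those NaN values, assuming they come from the df.resample(Gran).mean() step: resampling creates a row for every time bin, and bins that contain no ticks get NaN. The sketch below resamples three ticks copied from the sample data using "1s" as a stand-in for Gran (an assumed value), then shows that both the > 0 test used in the loop and an explicit np.isnan mask drop those rows.

```python
import numpy as np
import pandas as pd

# Three ticks copied from the sample data; "1s" stands in for Gran (assumed value)
ticks = pd.Series(
    [0.85208, 0.85213, 0.85215],
    index=pd.to_datetime([
        "2006-01-03 00:01:07.588",
        "2006-01-03 00:01:08.654",
        "2006-01-03 00:01:11.472",
    ]),
    name="Bid",
)

bars = ticks.resample("1s").mean()   # the empty 1-second bins (:09 and :10) become NaN
print(bars.isna().sum())             # 2

values = bars.to_numpy()
# Both filters drop the NaN rows: NaN > 0 is False, and np.isnan is explicit
print(values[values > 0].size, values[~np.isnan(values)].size)   # 3 3
```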
When I run the loop outside of the function, the printed values appear as expected.
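For comparison, a minimal reproduction of the same TypeError, assuming the function receives the HDF5 path string Data (which is what the RunBacktest call passes) rather than the float array Array. np.nditer converts a str into a 0-d string array, and comparing that with 0 raises TypeError, while a float array iterates and compares without error. first_comparison is a hypothetical helper, and newer NumPy versions may word the error message differently, although it is still a TypeError.

```python
import numpy as np


def first_comparison(values):
    """Iterate with np.nditer and test the first element against 0, as the loop does."""
    for item in np.nditer(values):
        # nditer yields 0-d ndarrays, not Python floats
        return item.ndim, bool(item > 0)


prices = np.array([0.85208, np.nan, 0.85213])
print(first_comparison(prices))   # (0, True)

path = "F:/Market Data/2020.3.15 FXAUDCAD-TICK-NoSession.h5"
try:
    first_comparison(path)        # a str becomes a 0-d string array
except TypeError as exc:
    print("TypeError:", exc)
```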
I am new to programming and would really appreciate some advice/help. Thanks!
I'm using Python 3.7.9 and NumPy 1.19.1