This is my data structure: a pandas dataframe with 700,000 rows spanning many different dates, where DateTime is unique and used as the index.
| DateTime | Open | High | Low | Close | indicator |
|---|---|---|---|---|---|
| 2018-10-23 12:00:00 | 61.61 | 61.86 | 61.6 | 61.84 | 0 |
| 2018-10-23 12:05:00 | 61.82 | 61.98 | 61.76 | 61.98 | 0 |
| 2018-10-23 12:10:00 | 61.98 | 62.01 | 61.9 | 62.01 | 0 |
| 2018-10-23 12:15:00 | 62.05 | 62.15 | 62.01 | 62.02 | 0 |
| 2018-10-23 12:20:00 | 62.04 | 62.13 | 62.03 | 62.07 | 0 |
| 2018-10-23 12:25:00 | 62.08 | 62.19 | 62.05 | 62.19 | 1 |
| 2018-10-23 12:30:00 | 62.19 | 62.19 | 62.11 | 62.15 | 0 |
| 2018-10-23 12:35:00 | 62.13 | 62.24 | 62.12 | 62.22 | 1 |
| 2018-10-23 12:40:00 | 62.23 | 62.34 | 62.22 | 62.29 | 0 |
| 2018-10-23 12:45:00 | 62.3 | 62.37 | 62.21 | 62.25 | 0 |
For each row with an indicator value of 1, I would like to slice the dataframe so that it contains only the rows matching the following criteria:
1. The date of the row must be the same as the date of the row with the indicator value of 1.
2. Only the rows later in time than that indicator row should be included.
My code takes about 10-12 seconds to execute. Is there a way of improving the time?
My Python code:
# data is a pandas dataframe as above
# (DateTime is accessed as a column here, e.g. after data.reset_index())
arr = []
temporaryDF = data[data['indicator'] == 1]
for i in range(len(temporaryDF)):
    # rows later than the i-th indicator row, on the same calendar date
    sliceddata = data[(data['DateTime'] > temporaryDF['DateTime'].iloc[i]) &
                      (data['DateTime'].dt.date ==
                       temporaryDF['DateTime'].iloc[i].date())]
    arr.append(sliceddata)
Thank you.
You are looping unnecessarily: data is a pandas DataFrame, which already holds all the data and can be filtered as a whole. Inside the loop you re-filter the entire dataframe once for every row of temporaryDF, so the full 700,000 rows are scanned as many times as you loop. Try dropping the for loop and working from the parts outside it.
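A minimal sketch of one way to avoid scanning the whole frame for every indicator row. It assumes DateTime is the index (as described in the question), that it is unique, and that sorting it is acceptable. The function name `slices_after_signals` is made up for illustration. Instead of building boolean masks, it locates the end of each indicator row's day with a binary search and takes a positional slice:

```python
import numpy as np
import pandas as pd


def slices_after_signals(data: pd.DataFrame) -> list:
    """For each indicator == 1 row, return the later rows of the same date."""
    # Assumption: the index is a unique DatetimeIndex; sort so positions follow time.
    data = data.sort_index()
    idx = data.index

    # Integer positions of the rows where indicator == 1.
    signal_pos = np.flatnonzero(data["indicator"].to_numpy() == 1)

    # Midnight following each signal marks the end of that signal's day.
    next_midnight = idx[signal_pos].normalize() + pd.Timedelta(days=1)

    # Binary search for the first row at or after that midnight.
    end_pos = np.searchsorted(idx.values, next_midnight.values, side="left")

    # Rows strictly after the signal row, up to the end of the same day.
    return [data.iloc[start + 1:end] for start, end in zip(signal_pos, end_pos)]


# Small usage example with made-up rows in the same shape as the question's data.
sample = pd.DataFrame(
    {"Close": [62.19, 62.15, 62.22, 62.29, 62.40],
     "indicator": [1, 0, 1, 0, 0]},
    index=pd.to_datetime([
        "2018-10-23 12:25:00", "2018-10-23 12:30:00", "2018-10-23 12:35:00",
        "2018-10-23 12:40:00", "2018-10-24 09:00:00",
    ]),
)
arr = slices_after_signals(sample)
```

Each slice costs a binary search plus a positional `iloc` slice rather than two full-length comparisons, so the work per indicator row no longer depends on scanning all 700,000 rows. Whether this removes most of the 10-12 seconds depends on how many indicator rows there are and how large the resulting slices are.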