Optimizing pandas slicing operation in a loop

This is my data structure. DateTime is unique and is used as the index of a pandas DataFrame with 700,000 rows spanning many different dates.

| DateTime | Open | High | Low | Close | indicator |
|---|---|---|---|---|---|
| 2018-10-23 12:00:00 | 61.61 | 61.86 | 61.6 | 61.84 | 0 |
| 2018-10-23 12:05:00 | 61.82 | 61.98 | 61.76 | 61.98 | 0 |
| 2018-10-23 12:10:00 | 61.98 | 62.01 | 61.9 | 62.01 | 0 |
| 2018-10-23 12:15:00 | 62.05 | 62.15 | 62.01 | 62.02 | 0 |
| 2018-10-23 12:20:00 | 62.04 | 62.13 | 62.03 | 62.07 | 0 |
| 2018-10-23 12:25:00 | 62.08 | 62.19 | 62.05 | 62.19 | 1 |
| 2018-10-23 12:30:00 | 62.19 | 62.19 | 62.11 | 62.15 | 0 |
| 2018-10-23 12:35:00 | 62.13 | 62.24 | 62.12 | 62.22 | 1 |
| 2018-10-23 12:40:00 | 62.23 | 62.34 | 62.22 | 62.29 | 0 |
| 2018-10-23 12:45:00 | 62.3 | 62.37 | 62.21 | 62.25 | 0 |

For every row with an indicator value of 1, I would like to slice the dataframe so that it contains only the rows matching the following criteria:

1. The date of the rows must be the same as the date of the row with indicator value 1.
2. Only the rows later in time than that indicator row should be included.

The code takes about 10-12 seconds to execute. Is there a way of improving the time?

My Python code:

# data is a pandas dataframe as above

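arr = []
# Rows where the indicator fires. data['DateTime'] assumes DateTime is available
# as a column (e.g. after data.reset_index() if it is the index).
temporaryDF = data[data['indicator'] == 1]
for i in range(len(temporaryDF)):
    # Keep rows later than the indicator row that fall on the same calendar date.
    sliceddata = data[(data['DateTime'] > temporaryDF['DateTime'].iloc[i]) &
                      (data['DateTime'].dt.date ==
                       temporaryDF['DateTime'].iloc[i].date())]
    arr.append(sliceddata)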

Thank you.

There is 1 answer

Arsh Kenia

You are looping unnecessarily. data is a pandas DataFrame that already holds all the rows, so every pass of the loop filters the whole frame again, once for each row of temporaryDF. You could try dropping the for loop and expressing the selection without it.
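One way to cut the repeated full-frame scans is sketched below. It is only a sketch under assumptions: DateTime is taken to be the unique DatetimeIndex of data, as the question describes (if it is a column, call data.set_index('DateTime') first), and the function name slices_after_indicator is made up for illustration. The idea is to group the rows by calendar day once, so that each slice only touches that day's rows instead of all 700,000.

```python
import numpy as np
import pandas as pd


def slices_after_indicator(data: pd.DataFrame) -> list:
    """Return one DataFrame per indicator == 1 row: the later rows of the same date.

    Hypothetical helper; assumes `data` has a unique DatetimeIndex and an
    'indicator' column.
    """
    data = data.sort_index()  # make sure rows are in chronological order
    result = []
    # Group once by calendar day (timestamps floored to midnight).
    for _, day in data.groupby(data.index.normalize()):
        # Positions within this day where the indicator equals 1.
        hits = np.flatnonzero(day["indicator"].to_numpy() == 1)
        for pos in hits:
            # The index is sorted and unique, so every row after `pos`
            # is later in time and still on the same date.
            result.append(day.iloc[pos + 1:])
    return result


if __name__ == "__main__":
    # Small made-up sample in the shape of the question's data.
    idx = pd.to_datetime([
        "2018-10-23 12:20:00", "2018-10-23 12:25:00", "2018-10-23 12:30:00",
        "2018-10-23 12:35:00", "2018-10-23 12:40:00", "2018-10-24 09:00:00",
    ])
    sample = pd.DataFrame(
        {"Close": [62.07, 62.19, 62.15, 62.22, 62.29, 62.50],
         "indicator": [0, 1, 0, 1, 0, 0]},
        index=idx,
    )
    for piece in slices_after_indicator(sample):
        print(piece, end="\n\n")
```

The remaining loop runs once per indicator row, but each iteration is a positional slice of a single day's rows rather than two boolean comparisons over the whole frame. How much time this saves on the real data depends on how many indicator rows there are, so it is worth timing on the actual dataframe.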