DataFrame skips 5 minutes of data when the index goes from x999 to x000


When building a large table of 1s candles (typically more than 10,000 rows), I found the data shifted because 5 minutes of candles are missing at every 1000-row boundary: between index 999 and 1000, 1999 and 2000, 2999 and 3000, and so on.

This also happens with the 1m timeframe. I suspect it happens with 1h as well, but there are not enough historical candles to test that.

Here is part of the output for the 1s timeframe:

.
.
.
995  2020-06-05 21:46:35+07:00  9705.19  9706.02  9705.19  9706.02
996  2020-06-05 21:46:36+07:00  9706.02  9706.02  9706.02  9706.02
997  2020-06-05 21:46:37+07:00  9705.77  9706.02  9705.77  9706.02
998  2020-06-05 21:46:38+07:00  9706.02  9706.72  9706.02  9706.72
999  2020-06-05 21:46:39+07:00  9706.72  9706.72  9706.72  9706.72 **21:46:39** 
1000 2020-06-05 21:51:39+07:00  9698.76  9698.76  9698.76  9698.76 **21:51:39**(5m skipped)
1001 2020-06-05 21:51:40+07:00  9698.76  9698.76  9698.76  9698.76
1002 2020-06-05 21:51:41+07:00  9698.76  9698.76  9698.76  9698.76
1003 2020-06-05 21:51:42+07:00  9698.76  9698.76  9698.76  9698.76
1004 2020-06-05 21:51:43+07:00  9698.87  9698.88  9698.87  9698.88
1005 2020-06-05 21:51:44+07:00  9698.88  9698.88  9698.88  9698.88
.
.
.
1995 2020-06-05 22:08:14+07:00  9684.71  9684.71  9684.71  9684.71
1996 2020-06-05 22:08:15+07:00  9684.71  9684.71  9684.71  9684.71
1997 2020-06-05 22:08:16+07:00  9684.71  9684.71  9684.71  9684.71
1998 2020-06-05 22:08:17+07:00  9684.71  9684.71  9684.71  9684.71
1999 2020-06-05 22:08:18+07:00  9684.71  9684.71  9684.71  9684.71 **22:08:18**
2000 2020-06-05 22:13:18+07:00  9677.95  9677.95  9677.95  9677.95 **22:13:18**(5m skipped)
2001 2020-06-05 22:13:19+07:00  9677.95  9677.95  9677.95  9677.95
2002 2020-06-05 22:13:20+07:00  9677.66  9679.82  9677.66  9679.82
2003 2020-06-05 22:13:21+07:00  9679.82  9679.82  9679.82  9679.82
2004 2020-06-05 22:13:22+07:00  9679.82  9679.82  9679.82  9679.82
2005 2020-06-05 22:13:23+07:00  9679.82  9679.82  9679.82  9679.82
.
.
.

And for the 1m timeframe:

.
.
.
995  2020-06-06 14:05:00+07:00  9612.17  9617.92  9612.00  9617.41
996  2020-06-06 14:06:00+07:00  9617.75  9621.15  9615.25  9618.87
997  2020-06-06 14:07:00+07:00  9618.95  9618.96  9618.32  9618.50
998  2020-06-06 14:08:00+07:00  9618.36  9619.00  9617.04  9618.60
999  2020-06-06 14:09:00+07:00  9618.61  9624.30  9618.61  9624.30 **14:09:00**
1000 2020-06-06 14:14:00+07:00  9620.23  9620.48  9619.27  9620.05 **14:14:00**(5m skipped)
1001 2020-06-06 14:15:00+07:00  9619.72  9623.24  9615.46  9615.46
1002 2020-06-06 14:16:00+07:00  9615.41  9615.69  9613.98  9613.98
1003 2020-06-06 14:17:00+07:00  9613.50  9613.63  9609.43  9610.10
1004 2020-06-06 14:18:00+07:00  9610.10  9616.13  9610.10  9615.65
1005 2020-06-06 14:19:00+07:00  9615.91  9615.91  9612.09  9613.11
.
.
.

Has anyone encountered this issue before? Did I do something wrong in the script?

def dataframe_details_func(df_ohlcv, TIMEFRAME, LIMIT):
    while(len(df_ohlcv)<LIMIT):
        from_ts = df_ohlcv[-1][0] + 300000
        new_ohlcv = exchange.fetch_ohlcv(PAIR, timeframe=TIMEFRAME, since=from_ts, limit=LIMIT)
        df_ohlcv.extend(new_ohlcv)

    df_ohlcv = pd.DataFrame(df_ohlcv, columns=['datetime','open','high','low','close','volume'])
    df_ohlcv['datetime']  = pd.to_datetime(df_ohlcv['datetime'], unit='ms')
    df_ohlcv.datetime = df_ohlcv.datetime.dt.tz_localize('UTC').dt.tz_convert('Asia/Bangkok')

    return df_ohlcv

df_ohlcv1S = dataframe_details_func(df_ohlcv1, TIMEFRAME1S, LIMIT1S)

pd.set_option('display.max_rows', None, 'display.max_columns', None)
print(df_ohlcv1S.loc[900:1200, ['datetime', 'open', 'high', 'low', 'close']])

There is 1 answer

Tim Roberts

The problem is this line:

        from_ts = df_ohlcv[-1][0] + 300000

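That statement literally says "start this chunk 5 minutes after the end of the last chunk": 300000 ms is 5 minutes, which is exactly the gap you see at each chunk boundary. You don't want the 300000 delta here. Use the length of one candle instead: 1000 for the 1s timeframe, to start at the next second, or 60000 for 1m.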
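
As a rough sketch of the idea, the loop can advance by one candle length instead of a fixed 300000 ms. Everything named here is hypothetical: `TIMEFRAME_MS`, `fetch_all`, and `fake_fetch_page` are illustration only, and `fake_fetch_page` merely stands in for `exchange.fetch_ohlcv`, assuming the exchange returns at most 1000 candles per request (which would explain why the gaps sit at x999/x000).

```python
# Hypothetical lookup table: candle length in milliseconds per timeframe.
TIMEFRAME_MS = {'1s': 1_000, '1m': 60_000, '1h': 3_600_000}


def fetch_all(fetch_page, timeframe, first_page, limit):
    """Keep requesting pages until `limit` candles have been collected."""
    step = TIMEFRAME_MS[timeframe]
    candles = list(first_page)
    while len(candles) < limit:
        # Resume exactly one candle after the last one received,
        # rather than a fixed 5 minutes later.
        since = candles[-1][0] + step
        page = fetch_page(since, limit - len(candles))
        if not page:  # nothing more to fetch; avoid looping forever
            break
        candles.extend(page)
    return candles[:limit]


def fake_fetch_page(since, limit, step=1_000, page_size=1000):
    """Stand-in for the exchange: at most `page_size` contiguous 1s candles."""
    n = min(limit, page_size)
    return [[since + i * step, 1.0, 1.0, 1.0, 1.0, 0.0] for i in range(n)]


first = fake_fetch_page(0, 1000)
rows = fetch_all(fake_fetch_page, '1s', first, 2500)
# Set of timestamp differences between neighbouring rows.
gaps = {b[0] - a[0] for a, b in zip(rows, rows[1:])}
```

With the real exchange you would pass something like `lambda since, n: exchange.fetch_ohlcv(PAIR, timeframe=TIMEFRAME, since=since, limit=n)` as `fetch_page`. Checking that `gaps` contains only one value is a quick way to confirm no rows were skipped at the chunk boundaries.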