How to build custom pandas.tseries.offsets class?

1.3k views Asked by At

I want to find a way to build a custom pandas.tseries.offsets class at 1 second frequency for trading hours. The main requirement here is that the time offset object would be smart enough to know the next second of '2015-06-18 16:00:00' would be '2015-06-19 09:30:00 or 09:30:01', and the time delta computed from these two time stamps would be exactly 1s (custom offset 1s, similar to BDay(1) for business day frequency) instead of the duration of closing hours.

The reason is that when plotting the pd.Series for intraday data across a few trading days, see the simulated example below, there are lots of 'step line' (linear interpolation) between close and next day open prices to represent the time duration of closing hours. Is there a way to get rid of this? I've look at the source codes of pandas.tseries.offsets and find pd.tseries.offsets.BusinessHour and pd.tseries.offsets.BusinessMixin may help but I don't know how to use those.

import pandas as pd
import numpy as np
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

# set as 'constant' object shared by all codes in this script
BDAY_US = CustomBusinessDay(calender=USFederalHolidayCalendar())
sample_freq = '5min'
dates = pd.date_range(start='2015-01-01', end='2015-01-31', freq=BDAY_US).date
# exculde the 09:30:00 as it is included in the first time bucket
times = pd.date_range(start='09:30:00', end='16:00:00', freq=sample_freq).time[1:]
time_stamps = [dt.datetime.combine(date, time) for date in dates for time in times]
s = pd.Series(np.random.randn(len(time_stamps)).cumsum() + 100, index=time_stamps)

s.plot()

enter image description here

Another way I can think of to partially fix this problem is to first reset_index() to get the default consecutive integer index for each row, and then calculating the difference between consecutive integer index as time(in seconds) elapsed. Plotting integer index as x-axis and then relabel them to appropriate time labels. Could someone show me how to do it with matplotlib as well?

Thanks for Jeff's comments. I just check the on-line docs for BusinessHour() and find it may useful in my case. Another follow-up question(s): the BusinessHour is in hour frequency, is there a way to make it at 1s frequency? also, how to combine it with CustomBusinessDay object?

To use BusinessHour()

from pandas.tseries.offsets import *
bhour = BusinessHour(start='09:30', end='16:00')
time = pd.Timestamp('2015-06-18 15:00:00')
print(time)
2015-06-18 15:00:00
# hourly increment works nicely
print(time + bhour * 1)
2015-06-19 09:30:00
# but not at minute or second frequency
print(time + Minute(61))
2015-06-18 16:01:00
print(time + Second(60*60 + 1))
2015-06-18 16:00:01

Many many thanks, and any help would be highly appreciated.

1

There are 1 answers

0
J Richard Snape On BEST ANSWER

As I mention in the comment, you potentially have two different problems

  1. You need to be able to plot a business times only timeseries without the long linear interpolations.
  2. You need an object that can do datetime arithmetic (in seconds) ignoring non-business times

I've given a solution that will account for 1 as this seems to be your immediate issue. If you need 2, or both - let us know in comments:

1. Plot the points with business days adjacent

Most graphs in matplotlib can have index formatters applied to the axes via the ticker API. I'll adapt this example to suit your case

import pandas as pd
import numpy as np
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

# set as 'constant' object shared by all codes in this script
BDAY_US = CustomBusinessDay(calender=USFederalHolidayCalendar())
sample_freq = '5min'
dates = pd.date_range(start='2015-01-01', end='2015-01-31', freq=BDAY_US).date
# exculde the 09:30:00 as it is included in the first time bucket
times = pd.date_range(start='09:30:00', end='16:00:00', freq=sample_freq).time[1:]
time_stamps = [dt.datetime.combine(date, time) for date in dates for time in times]
s = pd.Series(np.random.randn(len(time_stamps)).cumsum() + 100, index=time_stamps)

data_length = len(s)
s.index.name = 'date_time_index'
s.name='stock_price'
s_new = s.reset_index()

ax = s_new.plot(y='stock_price') #plot the data against the new linearised index...

def format_date(x,pos=None):
    thisind = np.clip(int(x+0.5), 0, data_length-1)
    return s_new.date_time_index[thisind].strftime('%Y-%m-%d %H:%M:%S')

ax.xaxis.set_major_formatter(ticker.FuncFormatter(format_date))

fig = plt.gcf()
fig.autofmt_xdate()

plt.show()

This gives an output as follows, first at zoomed out natural scale, second zoomed in so you can see the transition between Friday 16:00 and Monday 09:00

zoomed out time series

zoomed in time series over a weekend