Hierarchical forecasting in Python with scikit-hts leads to "Invalid frequency" error

I'm working with scikit-hts. Here's some code with a small df and hierarchy to get us started (after pip install scikit-hts[auto_arima]):

import hts
import pandas as pd

hierarchy_df = hierarchy_df_test = pd.DataFrame({
    'date': ['1998-01-01', '1998-02-01', '1998-03-01', '1998-04-01', '1998-05-01',
             '1998-06-01', '1998-07-01', '1998-08-01', '1998-09-01', '1998-10-01',
             '1998-11-01', '1998-12-01', '1999-01-01', '1999-02-01'],
    'total': [21, 40, 31, 21, 29, 40, 30, 21, 24, 30, 40, 22, 32, 32],
    'A': [10, 20, 15, 10, 14, 20, 16, 10, 12, 16, 20, 10, 18, 16],
    'B': [11, 20, 16, 11, 15, 20, 14, 11, 12, 14, 20, 12, 14, 16],
})

hierarchy = {'total': ['A', 'B']}

I want the date column to hold datetime values, so I run:

hierarchy_df['date'] = pd.to_datetime(hierarchy_df['date'], format='%Y-%m-%d')

Next I fit the model with auto_arima, using 'OLS' as the revision method:

model_ols_arima = hts.HTSRegressor(model='auto_arima', revision_method='OLS', n_jobs=0)
model_ols_arima = model_ols_arima.fit(hierarchy_df, hierarchy)

Everything runs smoothly until I try to predict:

pred_ols_arima = model_ols_arima.predict(steps_ahead=4)

That call fails with 'ValueError: Invalid frequency: 1'.

Here's the full traceback:

TypeError                                 Traceback (most recent call last)
/local_disk0/.ephemeral_nfs/envs/pythonEnv-a9bf3a0d-a93e-453d-890b-d123f282d710/lib/python3.8/site-packages/hts/core/regressor.py in _get_predict_index(self, steps_ahead)
    369         try:
--> 370             start = self.nodes.item.index[-1] + timedelta(freq)
    371             end = self.nodes.item.index[-1] + timedelta(steps_ahead * freq)

TypeError: unsupported operand type(s) for +: 'int' and 'datetime.timedelta'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<command-2268273946901309> in <module>
----> 1 pred_ols_arima = model_ols_arima.predict(steps_ahead=4)

/local_disk0/.ephemeral_nfs/envs/pythonEnv-a9bf3a0d-a93e-453d-890b-d123f282d710/lib/python3.8/site-packages/hts/core/regressor.py in predict(self, exogenous_df, steps_ahead, distributor, disable_progressbar, show_warnings, **predict_kwargs)
    350             self.hts_result.errors = (key, error)
    351             self.hts_result.residuals = (key, residual)
--> 352         return self._revise(steps_ahead=steps_ahead)
    353 
    354     def _revise(self, steps_ahead: int = 1) -> pandas.DataFrame:

/local_disk0/.ephemeral_nfs/envs/pythonEnv-a9bf3a0d-a93e-453d-890b-d123f282d710/lib/python3.8/site-packages/hts/core/regressor.py in _revise(self, steps_ahead)
    361 
    362         revised_columns = list(make_iterable(self.nodes))
--> 363         revised_index = self._get_predict_index(steps_ahead=steps_ahead)
    364         return pandas.DataFrame(revised, index=revised_index, columns=revised_columns)
    365 

/local_disk0/.ephemeral_nfs/envs/pythonEnv-a9bf3a0d-a93e-453d-890b-d123f282d710/lib/python3.8/site-packages/hts/core/regressor.py in _get_predict_index(self, steps_ahead)
    374             start = self.nodes.item.index[-1] + freq
    375             end = self.nodes.item.index[-1] + (steps_ahead * freq)
--> 376             future = pandas.date_range(freq=freq, start=start, end=end)
    377 
    378         return self.nodes.item.index.append(future)

/databricks/python/lib/python3.8/site-packages/pandas/core/indexes/datetimes.py in date_range(start, end, periods, freq, tz, normalize, name, closed, **kwargs)
   1067         freq = "D"
   1068 
-> 1069     dtarr = DatetimeArray._generate_range(
   1070         start=start,
   1071         end=end,

/databricks/python/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py in _generate_range(cls, start, end, periods, freq, tz, normalize, ambiguous, nonexistent, closed)
    375                 "and freq, exactly three must be specified"
    376             )
--> 377         freq = to_offset(freq)
    378 
    379         if start is not None:

pandas/_libs/tslibs/offsets.pyx in pandas._libs.tslibs.offsets.to_offset()

pandas/_libs/tslibs/offsets.pyx in pandas._libs.tslibs.offsets.to_offset()

ValueError: Invalid frequency: 1

I did some research and it seems the problem lies in the frequency of the date column, but I have not been able to resolve it. This tutorial here does more or less the same thing (with a larger data set) without hitting the error. Any help would be much appreciated. Thanks!

There is 1 answer

Answer by jim

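I was able to resolve the error by setting the date column as the index:

hierarchy_df = hierarchy_df.set_index('date')

With the default integer index, the last index label is an int rather than a timestamp. That is why the traceback first shows 'int' + timedelta failing, and then pandas rejecting 1 as a frequency in date_range. Once the dates were the index, the predict() line ran without any errors.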
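
Here is a minimal sketch of why the index matters. It uses only pandas and a shortened copy of the question's data, so it mirrors the arithmetic shown in the traceback rather than calling scikit-hts itself. The variable names (last_label, indexed, inferred_freq) are illustrative and not part of either library.

```python
from datetime import timedelta

import pandas as pd

# Shortened copy of the question's data (first four months only).
df = pd.DataFrame({
    "date": ["1998-01-01", "1998-02-01", "1998-03-01", "1998-04-01"],
    "total": [21, 40, 31, 21],
    "A": [10, 20, 15, 10],
    "B": [11, 20, 16, 11],
})
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")

# With the default RangeIndex, the last index label is a plain integer.
last_label = int(df.index[-1])

# Same kind of arithmetic as in the traceback: int + timedelta is undefined.
try:
    last_label + timedelta(1)
    int_plus_timedelta_fails = False
except TypeError:
    int_plus_timedelta_fails = True

# The fix from the answer: make the dates the index.
indexed = df.set_index("date")

# The last label is now a Timestamp, and pandas can infer a frequency
# from the regularly spaced month-start dates.
last_timestamp = indexed.index[-1]
inferred_freq = pd.infer_freq(indexed.index)

print(int_plus_timedelta_fails, type(last_timestamp).__name__, inferred_freq)
```

If pd.infer_freq returns None for your own data, the dates are probably irregular or have gaps, which would be worth checking before fitting. The fit() and predict() calls from the question can then be run unchanged on the re-indexed frame; the assumption here is that the index is set before fit(), since the model stores the data it was fitted on.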