I'm working with scikit-hts. Here's some code with a small DataFrame and hierarchy to get us started (after pip install scikit-hts[auto_arima]):
import hts
import pandas as pd
hierarchy_df = hierarchy_df_test = pd.DataFrame({'date':['1998-01-01', '1998-02-01', '1998-03-01', '1998-04-01', '1998-05-01', '1998-06-01', '1998-07-01', '1998-08-01', '1998-09-01', '1998-10-01', '1998-11-01', '1998-12-01', '1999-01-01', '1999-02-01'], 'total': [21, 40, 31, 21, 29, 40, 30, 21, 24, 30, 40, 22, 32, 32], 'A':[10,20,15,10,14,20,16,10,12,16,20,10,18,16], 'B':[11,20,16,11,15,20,14,11,12,14,20,12,14,16]})
hierarchy = {'total': ['A', 'B']}
I want to convert the dates to datetime objects, so I run:
hierarchy_df['date'] = pd.to_datetime(hierarchy_df['date'], format='%Y-%m-%d')
Now I do the model fit with auto_arima and 'OLS' as the revision method:
model_ols_arima = hts.HTSRegressor(model='auto_arima', revision_method='OLS', n_jobs=0)
model_ols_arima = model_ols_arima.fit(hierarchy_df, hierarchy)
Everything runs smoothly until I try to predict:
pred_ols_arima = model_ols_arima.predict(steps_ahead=4)
At which point I get a 'ValueError: Invalid frequency: 1'.
Here's the full error:
TypeError Traceback (most recent call last)
/local_disk0/.ephemeral_nfs/envs/pythonEnv-a9bf3a0d-a93e-453d-890b-d123f282d710/lib/python3.8/site-packages/hts/core/regressor.py in _get_predict_index(self, steps_ahead)
369 try:
--> 370 start = self.nodes.item.index[-1] + timedelta(freq)
371 end = self.nodes.item.index[-1] + timedelta(steps_ahead * freq)
TypeError: unsupported operand type(s) for +: 'int' and 'datetime.timedelta'
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<command-2268273946901309> in <module>
----> 1 pred_ols_arima = model_ols_arima.predict(steps_ahead=4)
/local_disk0/.ephemeral_nfs/envs/pythonEnv-a9bf3a0d-a93e-453d-890b-d123f282d710/lib/python3.8/site-packages/hts/core/regressor.py in predict(self, exogenous_df, steps_ahead, distributor, disable_progressbar, show_warnings, **predict_kwargs)
350 self.hts_result.errors = (key, error)
351 self.hts_result.residuals = (key, residual)
--> 352 return self._revise(steps_ahead=steps_ahead)
353
354 def _revise(self, steps_ahead: int = 1) -> pandas.DataFrame:
/local_disk0/.ephemeral_nfs/envs/pythonEnv-a9bf3a0d-a93e-453d-890b-d123f282d710/lib/python3.8/site-packages/hts/core/regressor.py in _revise(self, steps_ahead)
361
362 revised_columns = list(make_iterable(self.nodes))
--> 363 revised_index = self._get_predict_index(steps_ahead=steps_ahead)
364 return pandas.DataFrame(revised, index=revised_index, columns=revised_columns)
365
/local_disk0/.ephemeral_nfs/envs/pythonEnv-a9bf3a0d-a93e-453d-890b-d123f282d710/lib/python3.8/site-packages/hts/core/regressor.py in _get_predict_index(self, steps_ahead)
374 start = self.nodes.item.index[-1] + freq
375 end = self.nodes.item.index[-1] + (steps_ahead * freq)
--> 376 future = pandas.date_range(freq=freq, start=start, end=end)
377
378 return self.nodes.item.index.append(future)
/databricks/python/lib/python3.8/site-packages/pandas/core/indexes/datetimes.py in date_range(start, end, periods, freq, tz, normalize, name, closed, **kwargs)
1067 freq = "D"
1068
-> 1069 dtarr = DatetimeArray._generate_range(
1070 start=start,
1071 end=end,
/databricks/python/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py in _generate_range(cls, start, end, periods, freq, tz, normalize, ambiguous, nonexistent, closed)
375 "and freq, exactly three must be specified"
376 )
--> 377 freq = to_offset(freq)
378
379 if start is not None:
pandas/_libs/tslibs/offsets.pyx in pandas._libs.tslibs.offsets.to_offset()
pandas/_libs/tslibs/offsets.pyx in pandas._libs.tslibs.offsets.to_offset()
ValueError: Invalid frequency: 1
I did some research and it seems the problem lies in the frequency of the date column, but I haven't been able to resolve it. This tutorial here more or less does the same thing (with a larger data set), but without the error. Any help would be much appreciated. Thanks!
I was able to resolve the error by setting the date column as the index of the DataFrame before fitting. Once this was done, the predict() line ran without any errors.
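For anyone hitting the same thing, here is a minimal sketch of that fix using only pandas. The traceback suggests why it helps: `self.nodes.item.index[-1] + timedelta(freq)` fails with `'int' and 'datetime.timedelta'`, which is what you would expect when the frame still has its default integer index. The `set_index('date')` call is the fix described above; the `asfreq('MS')` line is my own optional assumption (it declares a month-start frequency on the index) and is not something the original fix required.

```python
import pandas as pd

# Same toy data as in the question.
hierarchy_df = pd.DataFrame({
    'date': ['1998-01-01', '1998-02-01', '1998-03-01', '1998-04-01',
             '1998-05-01', '1998-06-01', '1998-07-01', '1998-08-01',
             '1998-09-01', '1998-10-01', '1998-11-01', '1998-12-01',
             '1999-01-01', '1999-02-01'],
    'total': [21, 40, 31, 21, 29, 40, 30, 21, 24, 30, 40, 22, 32, 32],
    'A': [10, 20, 15, 10, 14, 20, 16, 10, 12, 16, 20, 10, 18, 16],
    'B': [11, 20, 16, 11, 15, 20, 14, 11, 12, 14, 20, 12, 14, 16],
})
hierarchy = {'total': ['A', 'B']}

hierarchy_df['date'] = pd.to_datetime(hierarchy_df['date'], format='%Y-%m-%d')

# Before the fix the frame still has the default integer index,
# so the last index label is a plain integer, not a date.
last_label_before = hierarchy_df.index[-1]

# The fix: use the dates as the index, so index[-1] is a Timestamp.
hierarchy_df = hierarchy_df.set_index('date')

# Optional assumption (not part of the original fix): declare the
# month-start frequency so the index carries explicit freq information.
hierarchy_df = hierarchy_df.asfreq('MS')

print(last_label_before, '->', hierarchy_df.index[-1])
print(hierarchy_df.index.freq)
```

With the frame indexed like this, the fit() and predict() calls from the question are used unchanged, and `hierarchy_df` no longer contains a separate 'date' column.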