Pytorch Forecasting - TimeSeriesDataSet Assertion Error

70 views

I'm working on a TFT model but I'm stuck on this error. I'm trying to load a custom dataset into TimeSeriesDataSet but I keep getting the error below. I also attached the head of my dataset; the actual dataset is much bigger, so there should be enough data to create series (over 1 million rows).

My code to create the dataset:

time_category_columns = ['woy', 'dow', 'hod', 'moh', 'som']
not_real = ['timestamp', 'target', 'last_price', 'time_idx'] + time_category_columns
time_reals = [x for x in train.columns if x not in not_real]

training = TimeSeriesDataSet(
    train,
    time_idx="time_idx",
    target="target",
    group_ids=time_category_columns, 
    max_encoder_length=50,
    max_prediction_length=1,
    static_categoricals=[],
    time_varying_known_categoricals=time_category_columns,
    time_varying_known_reals=["time_idx"],
    time_varying_unknown_categoricals=[],
    time_varying_unknown_reals=time_reals,
    add_relative_time_idx=True,
    add_target_scales=True,
    add_encoder_length=True,
    allow_missing_timesteps=True,
)

This is the output and error message:

/usr/local/lib/python3.10/dist-packages/pytorch_forecasting/data/timeseries.py:1281: UserWarning: Min encoder length and/or min_prediction_idx and/or min prediction length and/or lags are too large for 1072897 series/groups which therefore are not present in the dataset index. This means no predictions can be made for those series. First 10 removed groups: [{'__group_id__woy': '5', '__group_id__dow': 'Friday', '__group_id__hod': '00', '__group_id__moh': '00', '__group_id__som': '03'}, {'__group_id__woy': '5', '__group_id__dow': 'Friday', '__group_id__hod': '00', '__group_id__moh': '00', '__group_id__som': '04'}, {'__group_id__woy': '5', '__group_id__dow': 'Friday', '__group_id__hod': '00', '__group_id__moh': '00', '__group_id__som': '05'}, {'__group_id__woy': '5', '__group_id__dow': 'Friday', '__group_id__hod': '00', '__group_id__moh': '00', '__group_id__som': '07'}, {'__group_id__woy': '5', '__group_id__dow': 'Friday', '__group_id__hod': '00', '__group_id__moh': '00', '__group_id__som': '08'}, {'__group_id__woy': '5', '__group_id__dow': 'Friday', '__group_id__hod': '00', '__group_id__moh': '00', '__group_id__som': '09'}, {'__group_id__woy': '5', '__group_id__dow': 'Friday', '__group_id__hod': '00', '__group_id__moh': '00', '__group_id__som': '10'}, {'__group_id__woy': '5', '__group_id__dow': 'Friday', '__group_id__hod': '00', '__group_id__moh': '00', '__group_id__som': '11'}, {'__group_id__woy': '5', '__group_id__dow': 'Friday', '__group_id__hod': '00', '__group_id__moh': '00', '__group_id__som': '13'}, {'__group_id__woy': '5', '__group_id__dow': 'Friday', '__group_id__hod': '00', '__group_id__moh': '00', '__group_id__som': '14'}]
  warnings.warn(
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-18-62548dcd8bea> in <cell line: 2>()
      1 # Let's create a Dataset
----> 2 training = TimeSeriesDataSet(
      3     train,
      4     time_idx="time_idx",
      5     target="target",

1 frames
/usr/local/lib/python3.10/dist-packages/pytorch_forecasting/data/timeseries.py in __init__(self, data, time_idx, target, group_ids, weight, max_encoder_length, min_encoder_length, min_prediction_idx, min_prediction_length, max_prediction_length, static_categoricals, static_reals, time_varying_known_categoricals, time_varying_known_reals, time_varying_unknown_categoricals, time_varying_unknown_reals, variable_groups, constant_fill_strategy, allow_missing_timesteps, lags, add_relative_time_idx, add_target_scales, add_encoder_length, target_normalizer, categorical_encoders, scalers, randomize_length, predict_mode)
    479 
    480         # create index
--> 481         self.index = self._construct_index(data, predict_mode=self.predict_mode)
    482 
    483         # convert to torch tensor for high performance data loading later

/usr/local/lib/python3.10/dist-packages/pytorch_forecasting/data/timeseries.py in _construct_index(self, data, predict_mode)
   1288             )
   1289         assert (
-> 1290             len(df_index) > 0
   1291         ), "filters should not remove entries all entries - check encoder/decoder lengths and lags"
   1292 

AssertionError: filters should not remove entries all entries - check encoder/decoder lengths and lags

Example from my dataset; https://drive.google.com/file/d/1lj8G3x-ubwYs2o8bMztXlm1kOuQBP9hQ/view?usp=sharing

I tried different parameters and changing the dataset. I also asked ChatGPT and searched the web.
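If I'm reading the warning correctly, the problem may be the `group_ids`: grouping by calendar columns like day-of-week, hour, and minute splits the data into over a million tiny "series", none of which contains the 51 consecutive `time_idx` steps needed for `max_encoder_length=50` plus `max_prediction_length=1`. A minimal pandas sketch (with hypothetical minute-level timestamps, not my real data) illustrates the fragmentation:

```python
import pandas as pd

# Hypothetical minute-level data: each row advances time_idx by 1.
n = 600
df = pd.DataFrame({"time_idx": range(n)})
ts = pd.date_range("2024-02-02", periods=n, freq="min")
df["dow"] = ts.day_name()   # day of week
df["hod"] = ts.hour         # hour of day
df["moh"] = ts.minute       # minute of hour

# Grouping by calendar fields (as group_ids does above) fragments the data:
# every (dow, hod, moh) combination occurs at most once per day, so each
# "series" is far shorter than the 51 steps an encoder/decoder window needs.
sizes = df.groupby(["dow", "hod", "moh"]).size()
needed = 50 + 1  # max_encoder_length + max_prediction_length
print(sizes.max(), needed)  # largest group vs. required window length
```

If that is indeed the cause, I suspect the fix is to pass a single constant group column (so the data is one continuous series) and keep the calendar columns only in `time_varying_known_categoricals`, but I haven't confirmed this.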


There are 0 answers