I am trying to build a transformer for forecasting on my data. The data is hourly, with no missing values or timestamps, which is why I use the datetime as the index of my original dataframe. My goal is to use a 72-hour lookback to predict 24 hours ahead. This is what my dataframe looks like:
|Date|Streamflow|Tmean|Tmin|Tmax|SolRad|Precipitation|windSp|
|---|---|---|---|---|---|---|---|
|2017-07-04 14:00:00|1520|13.58|13.2|14.0|0.0|0.0|0.0|
|2017-07-04 15:00:00|1520|13.85|13.5|14.7|0.0|0.0|0.0|
|2017-07-04 16:00:00|1520|14.15|13.9|14.4|1.7|0.0|0.0|
|2017-07-04 17:00:00|1446|14.25|14.2|14.3|0.0|0.0|0.0|
|2017-07-04 18:00:00|1592|14.26|14.1|14.4|0.0|0.0|0.0|
The issue probably has something to do with the time_idx required by TimeSeriesDataSet.
import pandas as pd

# Define features, target, and time index
# Including time_idx as a feature seemed necessary for the code to work
features = ['Precipitation', 'Tmean', 'Tmin', 'Tmax', 'SolRad', 'windSp', 'time_idx']
target = 'Streamflow'
time_idx = 'time_idx'
training_split = pd.to_datetime("2020-04-11")  # cutoff date for the training set
data = df[df.index < training_split].copy()  # copy so new columns can be added safely
Since the date is already the index, I tried defining time_idx in the following two ways, both of which failed:
data['time_idx'] = range(len(data))
data["time_idx"] = (data.index - data.index.min()).total_seconds() // (3600 * 24)
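To make the difference between the two definitions concrete, here is a minimal sketch that reproduces the arithmetic with the standard library instead of pandas. The 48-hour range and the helper elapsed_units are made up for illustration and are not part of my actual code.

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for the hourly DatetimeIndex: 48 consecutive hours.
start = datetime(2017, 7, 4, 14)
timestamps = [start + timedelta(hours=h) for h in range(48)]


def elapsed_units(ts, origin, seconds_per_unit):
    """Whole number of units (hours or days) elapsed since origin."""
    return int((ts - origin).total_seconds() // seconds_per_unit)


# Divisor 3600: one distinct, consecutive integer per hourly row.
hourly_idx = [elapsed_units(ts, start, 3600) for ts in timestamps]
# Divisor 3600 * 24: every block of 24 hourly rows shares one value.
daily_idx = [elapsed_units(ts, start, 3600 * 24) for ts in timestamps]

print(hourly_idx[:5], len(set(hourly_idx)))
print(daily_idx[:5], len(set(daily_idx)))
```

So the second definition is effectively a daily index with 24 rows per value, whereas an hourly index would need a divisor of 3600 (which matches range(len(data)) when no hours are missing). I am not sure whether the repeated values are what triggers the error.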
Using the values below, I did not manage to load my data into a TimeSeriesDataSet:
min_prediction_length = 1
min_encoder_length = 24
max_prediction_length = 24
max_encoder_length = 72
I get the following error:
AssertionError: filters should not remove entries all entries - check encoder/decoder lengths and lags
From reading the source code, the only configuration I found that works is the following, but it causes problems because the minimum encoder length is so small:
max_encoder_length=72,
min_encoder_length=0,
min_prediction_idx=0,
min_prediction_length=1,
max_prediction_length=24,
What am I doing wrong? Is my time index the problem? I tried reproducing the source code to see what's wrong and also tried different indices, but nothing seems to work. This is how I create the dataset:
from pytorch_forecasting import TimeSeriesDataSet
from pytorch_forecasting.data import TorchNormalizer

training = TimeSeriesDataSet(
    data=data,
    time_idx="time_idx",
    target="Streamflow",
    group_ids=features,
    min_encoder_length=0,
    max_encoder_length=max_encoder_length,
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,
    time_varying_known_reals=features,
    time_varying_unknown_reals=["Streamflow"],
    target_normalizer=TorchNormalizer(),
    add_relative_time_idx=False,
    add_target_scales=True,
    add_encoder_length=True,
)
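One thing I have not ruled out is the group_ids=features argument in the call above. My understanding, which I have not verified against the library code, is that group_ids identifies separate series and that a series has to contain at least min_encoder_length + min_prediction_length rows to produce a sample. The plain-Python sketch below only models that assumption; the toy rows, the helper names and the constant key "station_0" are hypothetical.

```python
from collections import Counter

# Hypothetical rows mimicking the dataframe: (time_idx, Precipitation, Tmean).
rows = [(t, 0.0, 13.5 + 0.1 * t) for t in range(200)]


def group_sizes(rows, key):
    """Count how many rows share each group key."""
    return Counter(key(r) for r in rows)


def usable_groups(sizes, min_encoder_length, min_prediction_length):
    """Number of groups with enough rows for one encoder + decoder window."""
    needed = min_encoder_length + min_prediction_length
    return sum(1 for n in sizes.values() if n >= needed)


# Grouping by the feature columns (time_idx included): every row is its own group.
by_features = group_sizes(rows, key=lambda r: r)
# Grouping by a constant key: the whole table is treated as one series.
by_constant = group_sizes(rows, key=lambda r: "station_0")

print(usable_groups(by_features, 24, 1))
print(usable_groups(by_features, 0, 1))
print(usable_groups(by_constant, 24, 1))
```

If this model is right, it would explain why only min_encoder_length=0 with min_prediction_length=1 gets past the filter: with time_idx among the group columns, each group holds a single row. A constant group column might be what I need instead, but I have not confirmed that.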