PyTorch Forecasting TimeSeriesDataSet: can't set min_encoder_length larger than 0


I am trying to build a transformer for forecasting on my data. I have hourly data with no missing values or missing timestamps, which is why I am using the datetime as the index of my original dataframe. My goal is to use a 72-hour lookback to predict 24 hours ahead. This is what my dataframe looks like:

|Date|Streamflow|Tmean|Tmin|Tmax|SolRad|Precipitation|windSp|
|---|---|---|---|---|---|---|---|
|2017-07-04 14:00:00|1520|13.58|13.2|14.0|0.0|0.0|0.0|
|2017-07-04 15:00:00|1520|13.85|13.5|14.7|0.0|0.0|0.0|
|2017-07-04 16:00:00|1520|14.15|13.9|14.4|1.7|0.0|0.0|
|2017-07-04 17:00:00|1446|14.25|14.2|14.3|0.0|0.0|0.0|
|2017-07-04 18:00:00|1592|14.26|14.1|14.4|0.0|0.0|0.0|

The issue I'm dealing with probably has something to do with the time_idx required by TimeSeriesDataSet.


import pandas as pd

# Define features, target, and time index
# Including time_idx as a feature seemed necessary for the code to work
features = ['Precipitation', 'Tmean', 'Tmin', 'Tmax', 'SolRad', 'windSp', 'time_idx']
target = 'Streamflow'
time_idx = 'time_idx'
training_split = pd.to_datetime("2020-04-11")  # cutoff day for the training data
data = df[df.index < training_split].copy()  # copy so new columns are not added to a view of df


Since the date is the index, I tried defining time_idx in the following two ways, both of which failed.

data['time_idx'] = range(len(data))
data["time_idx"] = (data.index - data.index.min()).total_seconds() // (3600 * 24)
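For reference, here is a minimal sketch of what each definition produces on hourly data. It uses a small synthetic index rather than the real dataframe, and the names `idx_rows`, `idx_days` and `idx_hours` are illustrative assumptions. As far as I understand, TimeSeriesDataSet expects an integer time_idx that increases by one per time step, so dividing by the number of seconds in a day would give all 24 rows of a day the same value:

```python
import pandas as pd

# Synthetic hourly index standing in for the real dataframe (assumption).
index = pd.date_range("2017-07-04 14:00:00", periods=48, freq=pd.Timedelta(hours=1))
data = pd.DataFrame({"Streamflow": range(48)}, index=index)

# Definition 1: one step per row. Only valid when no timestamps are missing.
idx_rows = list(range(len(data)))

# Seconds elapsed since the first timestamp.
elapsed = (data.index - data.index.min()).total_seconds()

# Definition 2 as written: dividing by seconds per DAY makes every row
# within the same 24-hour block share one value.
idx_days = (elapsed // (3600 * 24)).astype(int)

# Hourly variant: dividing by seconds per HOUR gives one integer per row.
idx_hours = (elapsed // 3600).astype(int)

print(idx_days.nunique(), idx_hours.nunique())
```

On gap-free hourly data the first definition and the hourly variant agree, while the daily version collapses 48 rows into only two distinct index values.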

With either definition and the following settings, I could not load my data into a TimeSeriesDataSet:


min_prediction_length = 1
min_encoder_length = 24

max_prediction_length = 24
max_encoder_length = 72

I get the error

AssertionError: filters should not remove entries all entries - check encoder/decoder lengths and lags

From reading the source code, the only configuration I could get to work was the one below, but it causes problems because the minimum encoder length is too small:

max_encoder_length=72,
min_encoder_length=0,
min_prediction_idx=0,
min_prediction_length=1,
max_prediction_length=24,
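A rough sketch of why only this setting passes, under the assumption (based on my reading of the source, not on documented behaviour) that a group can only produce samples if it spans at least min_encoder_length + min_prediction_length time steps. The helper names below are hypothetical and not part of the library:

```python
def min_sequence_length(min_encoder_length: int, min_prediction_length: int) -> int:
    """Shortest run of time steps that can form one sample (assumed rule)."""
    return min_encoder_length + min_prediction_length


def group_is_usable(n_steps: int, min_encoder_length: int, min_prediction_length: int) -> bool:
    """True if a group spanning n_steps time steps survives the length filter."""
    return n_steps >= min_sequence_length(min_encoder_length, min_prediction_length)


# Settings that raised the AssertionError: each group needs 24 + 1 steps.
failing = min_sequence_length(24, 1)

# Settings that worked: a single time step is already enough.
working = min_sequence_length(0, 1)

print(failing, working)
print(group_is_usable(1, 24, 1), group_is_usable(1, 0, 1))
```

If this reading is right, the error would mean that no group in the data reaches 25 consecutive steps, which points at how the groups are formed rather than at the lengths themselves.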

What am I doing wrong? Is my time index the problem? I tried reproducing the source code to see what's wrong and also tried different indices, but nothing seems to work.

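from pytorch_forecasting import TimeSeriesDataSet
from pytorch_forecasting.data.encoders import TorchNormalizer

training = TimeSeriesDataSet(
    data=data,
    time_idx="time_idx",
    target="Streamflow",
    group_ids=features,
    min_encoder_length=0,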
    max_encoder_length=max_encoder_length,
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,
    time_varying_known_reals=features,
    time_varying_unknown_reals=["Streamflow"],
    target_normalizer=TorchNormalizer(),
    add_relative_time_idx=False,
    add_target_scales=True,
    add_encoder_length=True
)
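One thing worth checking in the call above is `group_ids=features`. As far as I understand, group_ids is meant to identify each separate time series, so passing continuous features (and time_idx itself) would make almost every row its own group. The sketch below uses a tiny synthetic frame with made-up values, and the constant column name `series` is a hypothetical choice:

```python
import pandas as pd

# Tiny synthetic frame with made-up values (assumption, not the real data).
data = pd.DataFrame({
    "time_idx": range(6),
    "Tmean": [13.58, 13.85, 14.15, 14.25, 14.26, 14.30],
    "Precipitation": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "Streamflow": [1520, 1520, 1520, 1446, 1592, 1600],
})

# Grouping by the feature columns (including time_idx): every row ends up
# alone in its own group, so no group is longer than one time step.
sizes_by_features = data.groupby(["Precipitation", "Tmean", "time_idx"]).size()

# Grouping by a constant identifier: the whole frame is a single series.
data["series"] = "station_0"
sizes_by_series = data.groupby("series").size()

print(sizes_by_features.max(), sizes_by_series.tolist())
```

If this is the cause, one thing to try would be adding such a constant column, passing `group_ids=["series"]`, and dropping time_idx from the feature list. This is untested against the data above.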