Temporal Fusion Transformer (Pytorch Forecasting): `hidden_size` parameter


The Temporal-Fusion-Transformer (TFT) model in the PytorchForecasting package has several parameters (see: https://pytorch-forecasting.readthedocs.io/en/latest/_modules/pytorch_forecasting/models/temporal_fusion_transformer.html#TemporalFusionTransformer).

What does the hidden_size parameter exactly refer to? My best guess is that it refers to the number of neurons contained in the GRN component of the TFT. If so, in which layer are these neurons contained?

I did not find the documentation very helpful in this case, since it only describes the hidden_size parameter as: "hidden size of network which is its main hyperparameter and can range from 8 to 512"

Side note: part of my ignorance might be due to the fact that I am not fully familiar with the individual components of the TFT model.


1 Answer

Answered by ixaixim:

After a bit of research in the source code linked above, I was able to figure out why hidden_size is the main hyperparameter of the model. Here is what I found:

hidden_size does indeed describe the number of neurons in each Dense layer of the GRN. You can check the structure of the GRN in the TFT paper at https://arxiv.org/pdf/1912.09363.pdf (page 6, Figure 2). Note that since the final layer of the GRN is just a normalization layer, the output of the GRN also has dimension hidden_size.
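To make this concrete, here is a minimal sketch of a GRN in plain PyTorch following Figure 2 of the paper (dense layer, ELU, dense layer, gating via a GLU, residual skip connection, LayerNorm). This is an illustrative simplification, not the exact implementation from pytorch-forecasting, but it shows how every internal layer is hidden_size wide, so the output is as well:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GRN(nn.Module):
    """Gated Residual Network sketch (TFT paper, Fig. 2).

    Every dense layer is hidden_size neurons wide, so the
    output dimension is hidden_size regardless of input_size.
    """

    def __init__(self, input_size: int, hidden_size: int, dropout: float = 0.1):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)       # first dense layer
        self.fc2 = nn.Linear(hidden_size, hidden_size)      # second dense layer
        self.dropout = nn.Dropout(dropout)
        self.gate = nn.Linear(hidden_size, 2 * hidden_size)  # GLU gating
        # project the residual when input and hidden sizes differ
        self.skip = (nn.Linear(input_size, hidden_size)
                     if input_size != hidden_size else nn.Identity())
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        x = F.elu(self.fc1(a))
        x = self.dropout(self.fc2(x))
        x = F.glu(self.gate(x), dim=-1)       # sigmoid gate * linear output
        return self.norm(self.skip(a) + x)    # residual + normalization


# output dimension is always hidden_size, whatever input_size is
grn = GRN(input_size=5, hidden_size=16)
out = grn(torch.randn(32, 5))
print(out.shape)  # torch.Size([32, 16])
```

Running the example confirms that a batch of 5-dimensional inputs comes out with dimension hidden_size = 16.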

Why is this the main hyperparameter of the model? Looking at the structure of the TFT model (also on page 6), the GRN unit appears in the Variable Selection process, in the Static Enrichment section, and in the Position-wise Feed-Forward section, so basically in every step of the learning process. Each of these GRNs is built the same way; only the input size varies.