I implemented an LSTM with attention in Keras to reproduce this paper. The strange behavior is simple: I use MSE as the loss function, with MAPE and MAE as metrics. During training the MAPE explodes, while the MSE and MAE seem to train normally:
Epoch 1/20
275/275 [==============================] - 191s 693ms/step - loss: 0.1005 - mape: 15794.8682 - mae: 0.2382 - val_loss: 0.0334 - val_mape: 24.9470 - val_mae: 0.1607
Epoch 2/20
275/275 [==============================] - 184s 669ms/step - loss: 0.0099 - mape: 6385.5464 - mae: 0.0725 - val_loss: 0.0078 - val_mape: 11.3268 - val_mae: 0.0803
Epoch 3/20
275/275 [==============================] - 186s 676ms/step - loss: 0.0025 - mape: 5909.3735 - mae: 0.0369 - val_loss: 0.0131 - val_mape: 14.9827 - val_mae: 0.1061
Epoch 4/20
275/275 [==============================] - 187s 678ms/step - loss: 0.0015 - mape: 4746.2788 - mae: 0.0278 - val_loss: 0.0142 - val_mape: 16.1894 - val_mae: 0.1122
Epoch 5/20
30/275 [==>...........................] - ETA: 2:38 - loss: 0.0012 - mape: 9.3647 - mae: 0.0246
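For context, here is a minimal sketch of the setup. The actual model is the paper's LSTM with attention; this stand-in model and its layer sizes are made up, and only the compile step matters for the metric behavior:

```python
import tensorflow as tf

# Hypothetical stand-in model; the real one is an LSTM with attention,
# but the metrics are configured the same way.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(10, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(
    optimizer="adam",
    loss="mse",               # the loss being minimized
    metrics=["mape", "mae"],  # "mape" is the metric that explodes in the logs above
)
```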
The MAPE explodes toward the end of each epoch, while the validation MAPE stays small. What could cause this specific behavior?
The MAPE is still decreasing with each epoch, so is this really not an issue, given that it is not hindering the training process?
Your loss and MAPE are decreasing, so training looks fine. But if the high MAPE values worry you, check whether any of your y values are near zero. MAPE is a percentage error, so a target close to zero inflates the ratio |y_true - y_pred| / |y_true| even when the absolute error is tiny.
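A minimal sketch of that effect, assuming TensorFlow/Keras 2.x (the array values are made up for illustration):

```python
import numpy as np
import tensorflow as tf

# A single target near zero is enough to blow up the batch MAPE,
# even though the MSE and MAE on the same batch stay small.
y_true = np.array([[0.0001], [0.5], [1.0]], dtype=np.float32)
y_pred = np.array([[0.02],   [0.5], [1.0]], dtype=np.float32)

mse = tf.keras.losses.MeanSquaredError()
mae = tf.keras.losses.MeanAbsoluteError()
mape = tf.keras.losses.MeanAbsolutePercentageError()

print(mse(y_true, y_pred).numpy())   # ~1.3e-4 -> looks fine
print(mae(y_true, y_pred).numpy())   # ~0.0066 -> looks fine
print(mape(y_true, y_pred).numpy())  # ~6633   -> dominated by the near-zero target
```

Note also that the number Keras prints during training is a running mean over all batches seen so far in the epoch, and it resets at the start of each epoch. A handful of batches containing near-zero targets can therefore dominate the end-of-epoch figure, while the first batches of the next epoch show a small value again, exactly as in your epoch 5 snapshot.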
MAPE results can be misleading. From Wikipedia: