I am trying to train an LSTM sequence model to regress an output signal from an input signal of the same length. The input/output pairs are short segments (100 to 1,000 timesteps) cut from a couple of longer signals (50,000 timesteps) in order to generate more training data.
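To make the setup concrete, here is a minimal sketch of how such segments can be cut from one long input/output pair. The function name `make_windows`, the window length of 500, the stride of 250, and the random placeholder data are all assumptions for illustration, not my exact pipeline.

```python
import numpy as np

def make_windows(x, y, window=500, stride=250):
    """Cut aligned input/output windows from one long signal pair."""
    x = np.asarray(x, dtype=np.float32)
    y = np.asarray(y, dtype=np.float32)
    assert x.shape == y.shape, "input and output must have the same length"
    # Start index of every window that fits completely inside the signal.
    starts = range(0, len(x) - window + 1, stride)
    xs = np.stack([x[s:s + window] for s in starts])
    ys = np.stack([y[s:s + window] for s in starts])
    return xs, ys

# Placeholder data standing in for one 50,000-step recording (hypothetical).
rng = np.random.default_rng(0)
long_x = rng.normal(size=50_000)
long_y = np.cumsum(long_x) * 0.01

X, Y = make_windows(long_x, long_y)
print(X.shape, Y.shape)  # (199, 500) (199, 500)
```

Each row of `X` and `Y` is one training pair; an LSTM would typically receive them with a trailing feature axis, e.g. `X[..., None]`.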

My problem is that each of these long signals has its own distribution, which differs from the others. The difference is not dramatic, but it is significant. My LSTM model does not converge well when the distributions differ. To check my model, I simulated some signals with the same dynamics but a constant distribution, and on those the model works perfectly for the regression task.

My question is: how should I handle this problem? I have seen some solutions, such as selective sampling (keeping only samples with similar distributions) or distribution mapping, but they are all explained for regular neural networks. I am looking for advice on dealing with this problem in sequence modeling.
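As an example of what I understand by distribution mapping, the simplest version I can think of is standardizing each long signal with its own mean and standard deviation before cutting it into segments. The sketch below only illustrates that idea; `standardize_per_signal` and the two toy signals are hypothetical, and I do not know whether this is enough for an LSTM.

```python
import numpy as np

def standardize_per_signal(signal, eps=1e-8):
    """Map one long signal to zero mean and unit variance using its own statistics."""
    signal = np.asarray(signal, dtype=np.float64)
    mean = signal.mean()
    std = signal.std()
    # eps guards against division by zero for a constant signal.
    return (signal - mean) / (std + eps), mean, std

# Two toy recordings with slightly different distributions (hypothetical data).
rng = np.random.default_rng(0)
signal_a = rng.normal(loc=0.0, scale=1.0, size=50_000)
signal_b = rng.normal(loc=0.5, scale=1.5, size=50_000)

norm_a, mean_a, std_a = standardize_per_signal(signal_a)
norm_b, mean_b, std_b = standardize_per_signal(signal_b)

# After mapping, both recordings share the same first two moments.
print(norm_a.mean(), norm_a.std())
print(norm_b.mean(), norm_b.std())
```

The returned `mean` and `std` are kept so that, if the output signal is normalized the same way, predictions could be mapped back to the original scale. This only aligns the first two moments, so it may not remove every difference between recordings.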