I have a dataset with attributes a, b and c. I use an input window of size 7 for each attribute to predict the next value of a.
I computed the autocorrelation and cross-correlation of the attributes for different numbers of lags. They are well correlated up to lag 7, hence the input window size of 7.
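To make the setup concrete, here is a minimal sketch of how I understand the windowing, written in plain Java without Encog. The class and method names (`WindowBuilder`, `buildInputs`, `buildTargets`) are hypothetical and only for illustration; the layout of each input row (7 values of a, then 7 of b, then 7 of c, oldest first) is an assumption and may differ from what Encog builds internally.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Encog: builds sliding-window training pairs.
class WindowBuilder {

    // Each row holds the `window` most recent values of a, then b, then c (oldest first).
    // Assumes a, b and c have the same length, greater than `window`.
    static double[][] buildInputs(double[] a, double[] b, double[] c, int window) {
        int n = a.length - window; // number of (input, target) pairs
        double[][] inputs = new double[n][3 * window];
        for (int t = 0; t < n; t++) {
            System.arraycopy(a, t, inputs[t], 0, window);
            System.arraycopy(b, t, inputs[t], window, window);
            System.arraycopy(c, t, inputs[t], 2 * window, window);
        }
        return inputs;
    }

    // The target for row t is the value of a immediately after its window.
    static double[] buildTargets(double[] a, int window) {
        int n = a.length - window;
        double[] targets = new double[n];
        for (int t = 0; t < n; t++) {
            targets[t] = a[t + window];
        }
        return targets;
    }

    public static void main(String[] args) {
        double[] a = new double[10];
        double[] b = new double[10];
        double[] c = new double[10];
        for (int i = 0; i < 10; i++) {
            a[i] = i;
            b[i] = 10 + i;
            c[i] = 20 + i;
        }
        double[][] inputs = buildInputs(a, b, c, 7);
        double[] targets = buildTargets(a, 7);
        System.out.println(Arrays.toString(inputs[0]) + " -> " + targets[0]);
    }
}
```

With a window of 7 and three attributes, each input row has 21 values and a series of N points yields N - 7 training pairs.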
My dataset consists of 100000 data points. When I predict, the predicted values are shifted to the right by one time step with respect to the actual values.
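One way to quantify the shift rather than judging it from a plot is to compare the predictions against the actual series at several lags and see which lag gives the smallest error. The following is only a rough sketch under that assumption; `ShiftCheck`, `mseAtShift` and `bestShift` are hypothetical names, not Encog functions, and the numbers in `main` are made up for the demonstration.

```java
// Hypothetical diagnostic, not part of Encog: finds the lag k at which
// predicted[t] matches actual[t - k] most closely.
class ShiftCheck {

    // Mean squared error between predicted[t] and actual[t - k] for t = maxShift .. n-1.
    // Every k is evaluated on the same range of t so the errors are comparable.
    // Assumes both arrays have the same length, greater than maxShift.
    static double mseAtShift(double[] actual, double[] predicted, int k, int maxShift) {
        double sum = 0.0;
        int count = 0;
        for (int t = maxShift; t < actual.length; t++) {
            double d = predicted[t] - actual[t - k];
            sum += d * d;
            count++;
        }
        return sum / count;
    }

    // Returns the shift in 0..maxShift with the lowest error (smallest shift wins ties).
    static int bestShift(double[] actual, double[] predicted, int maxShift) {
        int best = 0;
        double bestMse = mseAtShift(actual, predicted, 0, maxShift);
        for (int k = 1; k <= maxShift; k++) {
            double mse = mseAtShift(actual, predicted, k, maxShift);
            if (mse < bestMse) {
                bestMse = mse;
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up series; the "prediction" simply repeats the previous actual value.
        double[] actual = {1, 4, 2, 8, 5, 7, 3, 9, 6, 0};
        double[] predicted = {1, 1, 4, 2, 8, 5, 7, 3, 9, 6};
        System.out.println(bestShift(actual, predicted, 3)); // prints 1
    }
}
```

A result of 1 would mean the predictions track the previous actual value more closely than the current one, which is the shift I see on the two larger datasets.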
I tried aggregating my dataset to have fewer data points and ended up with 30000 data points. The same problem occurs.
I aggregated it once more, down to 2000 data points, and this time there is no shift.
There are previous questions on this topic, such as "NARX Neural network prediction?" and "Python ARIMA model, predicted values are shifted". The answers say that this problem occurs when the delays (lags) are not well chosen. But in my case the attributes are well correlated at the chosen lags, as stated above. So why is this happening? Is it because of the large size of the dataset?
Note that I am using a Java library named Encog for this prediction task.