I am trying to train a Bayesian NN for noisy time series prediction. I have problems with
- getting the model to learn the linear relationship in the data
- getting the model to learn the increasing noise

The goal is that the model learns the noise and is therefore able to estimate an uncertainty during inference after training. Data:
import numpy as np
import matplotlib.pyplot as plt
lookBack = 128
inputDim = 1
lookAHead = 32
split = 0.15
# create data
data_length = 10000
y_data = []
for ii in range(data_length):
    # linear trend plus Gaussian noise whose std grows quadratically with ii
    y_data.append(0.5*ii + 2 + np.random.normal(0, ii*ii/100000))
x_data = np.arange(data_length)
#normalize data
y_data = np.array(y_data)
y_data = (y_data - np.mean(y_data)) / np.std(y_data)
plt.plot(x_data, y_data)
plt.show()
#construct input/target data
train_data_input = []
train_data_target = []
for ii in range(y_data.shape[0] - lookBack - lookAHead):
    train_data_input.append(np.array(y_data[ii:ii+lookBack]))
    train_data_target.append(y_data[ii+lookBack + lookAHead])
train_data_input = np.array(train_data_input)
train_data_input = np.expand_dims(train_data_input, axis = -1)
train_data_target = np.array(train_data_target)
My Model:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import tensorflow_probability as tfp
from tensorflow.keras.layers import Dense, Input
inputModel = Input(shape=(lookBack, inputDim))
features = keras.layers.Flatten()(inputModel)
features = layers.Dense(32, activation = "tanh")(features)
features = tfp.layers.DenseVariational(
    units=8,
    make_prior_fn=prior,          # prior1 or prior2, defined below
    make_posterior_fn=posterior,  # posterior1 or posterior2, defined below
    kl_weight=1/(train_data_input.shape[0]*(1-split)),
    activation="tanh",
)(features)
I tried different output types, either just a regression value or a distribution that is learnt:
if output_type == "val":
outputs = layers.Dense(1)(features)
if output_type == "dist":
distribution_params = Dense(2) (features)
#c = np.log(np.expm1(1.0))
#outputs = tfp.layers.DistributionLambda(lambda t: tfp.distributions.Normal(loc=t[..., :1],
# scale=1e-3 + tf.math.softplus(c+0.05 * t[..., 1:])))(distribution_params)
outputs = tfp.layers.IndependentNormal(1)(distribution_params)
model = keras.Model(inputs=inputModel, outputs=outputs)
def negative_loglikelihood(targets, estimated_distribution):
    return -estimated_distribution.log_prob(targets)
loss = None
if output_type == "dist"
loss = negative_loglikelihood
#def loss(y_true, y_pred):
# return tf.keras.losses.KLD(y_true, y_pred)
if output_type == "val":
loss = tf.keras.losses.MeanSquaredError()
optimizer = keras.optimizers.Adam()  # not defined in the snippet above; Adam assumed here
model.compile(
    optimizer=optimizer,
    loss=loss
)
model.fit(x = train_data_input, y = train_data_target, epochs=25, validation_split=split)
I tried different unit sizes, loss functions (negative log_prob, KLD), activation functions, (learnable) priors (1/2, see below), and posteriors (1/2, see below), but my model doesn't really learn anything useful. Furthermore, I would expect the stddev of my output to increase with increasing x, since the noise increases as well, which is not the case.
def prior1(kernel_size, bias_size, dtype=None):
    n = kernel_size + bias_size
    prior_model = keras.Sequential(
        [
            tfp.layers.DistributionLambda(
                lambda t: tfp.distributions.MultivariateNormalDiag(
                    loc=tf.zeros(n), scale_diag=tf.ones(n)
                )
            )
        ]
    )
    return prior_model
def prior2(kernel_size, bias_size, dtype=None):
    n = kernel_size + bias_size
    c = np.log(np.expm1(1.0))
    return tf.keras.Sequential([
        tfp.layers.VariableLayer(2*n, dtype=dtype),
        tfp.layers.DistributionLambda(lambda t: tfp.distributions.Independent(
            tfp.distributions.Normal(loc=t[..., :n],
                                     scale=1e-3 + tf.math.softplus(c + 0.05 * t[..., n:])),
            reinterpreted_batch_ndims=1))
    ])
def posterior1(kernel_size, bias_size, dtype=None):
    n = kernel_size + bias_size
    posterior_model = keras.Sequential(
        [
            tfp.layers.VariableLayer(
                tfp.layers.MultivariateNormalTriL.params_size(n), dtype=dtype
            ),
            tfp.layers.MultivariateNormalTriL(n),
        ]
    )
    return posterior_model
def posterior2(kernel_size, bias_size=0, dtype=None):
    n = kernel_size + bias_size
    c = np.log(np.expm1(1.0))
    return keras.Sequential([
        tfp.layers.VariableLayer(2 * n, dtype=dtype),
        tfp.layers.DistributionLambda(lambda t: tfp.distributions.Independent(
            tfp.distributions.Normal(loc=t[..., :n],
                                     scale=1e-5 + tf.nn.softplus(c + t[..., n:])),
            reinterpreted_batch_ndims=1)),
    ])
Evaluation of the model:
predictions = []
means = []
stds = []
for ii in range(20):
    prediction = model(train_data_input)
    if output_type == "val":
        predictions.append(prediction)
    if output_type == "dist":
        means.append(prediction.mean())
        stds.append(prediction.stddev())
if output_type == "val":
    predictions = np.array(predictions)
    predicted_mean = np.mean(predictions, axis = 0)
    predicted_std = np.std(predictions, axis = 0)
if output_type == "dist":
    predicted_mean = np.mean(np.array(means), axis = 0)
    predicted_std = np.mean(np.array(stds), axis = 0)
lower_bound = predicted_mean - predicted_std
upper_bound = predicted_mean + predicted_std
plt.plot(x_data, y_data)
#Plot the mean
plt.plot(np.arange(predicted_mean.shape[0]) + lookBack + lookAHead, np.squeeze(predicted_mean), color = "r")
#Plot the beginning of the test set
plt.vlines(x=(1-split)*predicted_mean.shape[0] + lookBack + lookAHead, ymin = np.min(predicted_mean), ymax = np.max(predicted_mean), colors = "g")
#Plot the hopefully increasing stdDev
plt.plot(np.arange(predicted_mean.shape[0]) + lookBack + lookAHead, predicted_std, color = "g")
#Plot mean +/- stdDev
plt.plot(np.arange(predicted_mean.shape[0]) + lookBack + lookAHead, np.squeeze(lower_bound), color = "r", alpha = 0.5)
plt.plot(np.arange(predicted_mean.shape[0]) + lookBack + lookAHead, np.squeeze(upper_bound), color = "r", alpha = 0.5)
I'm thankful for any advice on how to get the model learning.
In the code below, a random input signal (y_rv) is generated from a known mean and std at each time point (y_mean and y_std). The model takes in a window from the input signal, and it learns to predict the mean and std that were used to sample that signal (i.e. the input is y_rv, and the targets are y_mean and y_std).

I started by simplifying the model down to a single-output regression model predicting the mean. ReLU activation helped stabilise training. Once that basic model was working, I added a second output for the standard deviation.
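A minimal sketch of that two-output idea, assuming Keras and windows of length lookBack as in the question (the layer sizes and the softplus on the std head are illustrative choices, not the original code):

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

lookBack = 128

# the network sees a window of the noisy signal y_rv and regresses the known
# y_mean and y_std at the target time step
inputs = keras.Input(shape=(lookBack, 1))
x = layers.Flatten()(inputs)
x = layers.Dense(64, activation="relu")(x)
x = layers.Dense(64, activation="relu")(x)
mean_out = layers.Dense(1, name="mean")(x)
std_out = layers.Dense(1, activation="softplus", name="std")(x)  # keep std positive
model = keras.Model(inputs, [mean_out, std_out])

model.compile(optimizer="adam", loss={"mean": "mse", "std": "mse"})
# model.fit(window_inputs, {"mean": mean_targets, "std": std_targets}, epochs=25)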
I modified the model and targets so that the model predicts a sequence of values up to and including 32 points into the future, rather than simply predicting a single value at t=32. I think this gives the model more context when predicting the standard deviation, rather than having to rely on a single data point. Adding dropout also helped the model stabilise its predictions of the standard deviation. Directly estimating the std worked better than estimating log(std).

[Figure: a single window from the training set]
[Figure: target and predictions]
This network only looks at the input features when making its prediction and doesn't use its previous predictions. Allowing the network to access its previous predictions would make it a recurrent net, which is the usual choice for time series.
GaussianProcessRegressor in sklearn directly models the mean and std of a time series, and could provide a good baseline if you want to compare your NN against other techniques.

Estimating mean and std from the noisy signal (could be used for creating a training set):
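A simple way to get such estimates, sketched with a rolling window in NumPy (the window size and the local detrending are arbitrary choices for illustration):

import numpy as np

def rolling_mean_std(y, window=128):
    # Estimate the local mean and noise std of a 1-D signal with a sliding window.
    # The local linear trend is removed inside each window before taking the std,
    # so the std reflects the noise rather than the trend.
    xs = np.arange(window)
    means, stds = [], []
    for ii in range(len(y) - window):
        chunk = y[ii:ii + window]
        coeffs = np.polyfit(xs, chunk, deg=1)    # local linear fit
        resid = chunk - np.polyval(coeffs, xs)   # detrended residuals
        means.append(chunk.mean())
        stds.append(resid.std())
    return np.array(means), np.array(stds)

# e.g. y_mean_est, y_std_est = rolling_mean_std(y_data, window=128)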
Implementation using torch.distributions to learn the distribution.
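A minimal sketch of that approach, assuming a small fully connected network whose two heads parameterise a torch.distributions.Normal and are trained by minimising the negative log-likelihood (the layer sizes and names here are illustrative, not the original implementation):

import torch
import torch.nn as nn
from torch.distributions import Normal

class DistributionNet(nn.Module):
    # Maps a window of the noisy signal to a Normal(mean, std) for the target point.
    def __init__(self, look_back=128, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(look_back, hidden), nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean_head = nn.Linear(hidden, 1)
        self.std_head = nn.Linear(hidden, 1)

    def forward(self, x):
        h = self.body(x)
        mean = self.mean_head(h)
        std = nn.functional.softplus(self.std_head(h)) + 1e-6  # keep std positive
        return Normal(mean, std)

# training step: minimise the negative log-likelihood of the target under the
# predicted distribution
def train_step(model, optimizer, x_batch, y_batch):
    optimizer.zero_grad()
    dist = model(x_batch)
    loss = -dist.log_prob(y_batch).mean()
    loss.backward()
    optimizer.step()
    return loss.item()

# usage (shapes: x_batch [batch, look_back], y_batch [batch, 1]):
# model = DistributionNet()
# opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# loss = train_step(model, opt, x_batch, y_batch)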