Implementing a Variational Autoencoder using TensorFlow Probability, NOT on the MNIST data set


I know there are many questions about Variational Autoencoders. However, this question differs from the existing ones in two respects: 1) it is implemented using TensorFlow 2 and TensorFlow Probability; 2) it does not use MNIST or any other image data set.

As for the problem itself:

I am trying to implement a VAE using TensorFlow Probability and Keras, and I want to train and evaluate it on some synthetic data sets as part of my research. The code is provided below.

The implementation is complete and the loss value decreases during training, but when I try to evaluate the trained model on my test set I run into different errors.

I am fairly confident that the issue is related to the input/output shapes, but unfortunately I have not managed to solve it.

Here is the code:

import numpy as np
import tensorflow as tf
import tensorflow.keras as tfk
import tensorflow_probability as tfp
from tensorflow.keras import layers as tfkl
from sklearn.datasets import make_classification
from tensorflow_probability import layers as tfpl
from sklearn.model_selection import train_test_split


tfd = tfp.distributions


if __name__ == '__main__':

    n_epochs = 5
    n_features = 2
    latent_dim = 1
    n_units = 4
    learning_rate = 1e-3
    n_samples = 400
    batch_size = 32

    # Generate synthetic data / load data sets
    x_in, y_in = make_classification(n_samples=n_samples, n_features=n_features, n_informative=2, n_redundant=0,
                                     n_repeated=0, n_classes=2, n_clusters_per_class=2, weights=[0.5, 0.5],
                                     flip_y=0.01, class_sep=1.0, hypercube=True,
                                     shift=0.0, scale=1.0, shuffle=False, random_state=42)

    x_in = x_in.astype('float32')
    y_in = y_in.astype('float32')  # .reshape(-1, 1)

    x_train, x_test, y_train, y_test = train_test_split(x_in, y_in, test_size=0.4, random_state=42, shuffle=True)
    x_test, x_val, y_test, y_val = train_test_split(x_test, y_test, test_size=0.5, random_state=42, shuffle=True)

    print("shapes:", x_train.shape, y_train.shape, x_test.shape, y_test.shape, x_val.shape, y_val.shape)

    prior = tfd.Independent(tfd.Normal(loc=[tf.zeros(latent_dim)], scale=1.), reinterpreted_batch_ndims=1)

    train_dataset = tf.data.Dataset.from_tensor_slices(x_train).batch(batch_size)

    valid_dataset = tf.data.Dataset.from_tensor_slices(x_val).batch(batch_size)

    test_dataset = tf.data.Dataset.from_tensor_slices(x_test).batch(batch_size)

    encoder = tf.keras.Sequential([
        tfkl.InputLayer(input_shape=[n_features, ], name='enc_input'),
        tfkl.Lambda(lambda x: tf.cast(x, tf.float32)),  # - 0.5
        tfkl.Dense(n_units, activation='relu', name='enc_dense1'),
        tfkl.Dense(int(n_units / 2), activation='relu', name='enc_dense2'),
        tfkl.Dense(tfpl.MultivariateNormalTriL.params_size(latent_dim),
                   activation=None, name='mvn_triL1'),
        tfpl.MultivariateNormalTriL(
            # weight >> num_train_samples, or something other than 1, to convert the VAE into a beta-VAE
            latent_dim, activity_regularizer=tfpl.KLDivergenceRegularizer(prior, weight=1.), name='bottleneck'),
    ])

    decoder = tf.keras.Sequential([
        tfkl.InputLayer(input_shape=latent_dim, name='dec_input'),
        # tfkl.Dense(n_units, activation='relu', name='dec_dense1'),
        # tfkl.Dense(int(n_units * 2), activation='relu', name='dec_dense2'),
        tfpl.IndependentBernoulli([n_features], tfd.Bernoulli.logits, name='dec_output'),
    ])

    vae = tfk.Model(inputs=encoder.inputs, outputs=decoder(encoder.outputs), name='VAE')

    print("encoder:", encoder)
    print(" ")
    print("encoder.inputs:", encoder.inputs)
    print(" ")
    print("encoder.outputs:", encoder.outputs)
    print(" ")
    print("decoder:", decoder)
    print(" ")
    print("decoder.inputs:", decoder.inputs)
    print(" ")
    print("decoder.outputs:", decoder.outputs)
    print(" ")

    # Negative log-likelihood, i.e. -E_{S(eps)}[log p(x|z)]. Only this term is needed here
    # because the KL term is already added in the last layer of the encoder via activity_regularizer.
    # The loss function takes two arguments: the original data points x and the output of the model,
    # which we call rv_x because it is a random variable (a distribution object).
    negloglik = lambda x, rv_x: -rv_x.log_prob(x)

    vae.compile(optimizer=tf.optimizers.Adam(learning_rate=learning_rate),
                loss=negloglik,)

    vae.summary()

    history = vae.fit(train_dataset, epochs=n_epochs, validation_data=valid_dataset,)

    print("x.shape:", x_test.shape)
    x_hat = vae(x_test)

    print("original:")
    print(x_test)
    print(" ")
    print("Decoded Random Samples:")
    print(x_hat.sample())
    print(" ")
    print("Decoded Means:")
    print(x_hat.mean())
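
To make the loss in the code above concrete: for an independent Bernoulli output parameterized by logits, the negative log-likelihood of a binary vector x reduces to a sum over features of softplus(logit) - x * logit. The following is a minimal NumPy sketch of that formula, not TFP's implementation. The helper name `bernoulli_negloglik` and the example values are assumptions made for illustration, and the sketch assumes x contains only 0/1 values (the synthetic features generated above are real-valued, so it may be worth checking whether a Bernoulli likelihood suits them).

```python
import numpy as np

def bernoulli_negloglik(x, logits):
    """Per-sample -log p(x | logits) for independent Bernoulli features.

    Hypothetical helper for illustration; x is assumed to contain 0/1 values.
    """
    x = np.asarray(x, dtype=np.float64)
    logits = np.asarray(logits, dtype=np.float64)
    # softplus(l) = log(1 + exp(l)), computed stably with logaddexp
    per_feature = np.logaddexp(0.0, logits) - x * logits
    # Sum over the feature axis, as an Independent distribution does over its event dims
    return per_feature.sum(axis=-1)

x = np.array([[1.0, 0.0], [0.0, 0.0]])
logits = np.zeros((2, 2))  # logit 0 means p = 0.5 for every feature
nll = bernoulli_negloglik(x, logits)
print(nll.shape)  # one loss value per sample
```

With all logits at zero, every feature contributes log 2, so each sample's loss is 2 * log 2 regardless of x.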


The Questions:

  1. With the above code, I receive the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 80 values, but the requested shape has 160 [Op:Reshape]

  2. As far as I know, I can add as many layers as I want to the decoder model before its output layer, as is done in convolutional VAEs. Am I right?

  3. If I uncomment the following two lines of code in the decoder:

# tfkl.Dense(n_units, activation='relu', name='dec_dense1'),
# tfkl.Dense(int(n_units * 2), activation='relu', name='dec_dense2'),

I see the following warning (printed four times) and then an error:

WARNING:tensorflow:Gradients do not exist for variables ['dec_dense1/kernel:0', 'dec_dense1/bias:0', 'dec_dense2/kernel:0', 'dec_dense2/bias:0'] when minimizing the loss.

And the error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 640 values, but the requested shape has 160 [Op:Reshape]

Now the question is: why are the decoder layers not used during training, as the warning indicates?
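
The value counts in the two error messages can be reproduced with plain shape arithmetic. With 80 test rows, a decoder input of width latent_dim = 1 holds 80 values, and dec_dense2 with n_units * 2 = 8 units holds 640, whereas an event shape of [n_features] = [2] needs 80 * 2 = 160 parameters. The sketch below uses NumPy only, not Keras or TFP. The random matrix `w` is a hypothetical stand-in for a final dense layer whose width equals the number of parameters the output distribution expects (in TFP, the number returned by `tfpl.IndependentBernoulli.params_size(n_features)`). Whether adding such a layer fully resolves the model above is an assumption that still needs to be verified.

```python
import numpy as np

n_test, n_features, latent_dim, n_units = 80, 2, 1, 4
rng = np.random.default_rng(0)

def fits_event_shape(params, n_features):
    """Return True if params can be reshaped to (batch, n_features)."""
    batch = params.shape[0]
    try:
        params.reshape(batch, n_features)
        return True
    except ValueError:
        return False

z = rng.normal(size=(n_test, latent_dim))   # latent codes fed straight to the output layer
h = rng.normal(size=(n_test, n_units * 2))  # stand-in for the dec_dense2 activations

print(z.size, fits_event_shape(z, n_features))  # 80 values, cannot become (80, 2)
print(h.size, fits_event_shape(h, n_features))  # 640 values, cannot become (80, 2)

# Hypothetical final linear projection to exactly n_features parameters per row
w = rng.normal(size=(n_units * 2, n_features))
logits = h @ w
print(logits.size, fits_event_shape(logits, n_features))  # 160 values, reshape succeeds
```

The point of the sketch is only that the layer feeding the distribution layer has to emit exactly as many values per row as the distribution has parameters; any hidden width is fine as long as a final projection brings it back to that size.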

PS: I also tried passing x_train, x_val, and x_test directly during training and evaluation, but it does not help.

Any help would be much appreciated.
