Number of parameters in a TensorFlow Probability network using DenseVariational layers

I cannot figure out why the second layer of the returned model has 189 parameters. By my calculation it should have more. Why is this happening?

The code is the following:

import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow.keras import Sequential
from tensorflow.keras.optimizers import RMSprop

tfd = tfp.distributions
tfpl = tfp.layers

# Define the prior weight distribution -- all N(0, 1) -- and not trainable

def prior(kernel_size, bias_size, dtype=None):
    n = kernel_size + bias_size
    prior_model = Sequential([
        tfpl.DistributionLambda(
            # A fixed standard normal over all n weights: no trainable variables
            lambda t: tfd.MultivariateNormalDiag(loc=tf.zeros(n), scale_diag=tf.ones(n))
        )
    ])
    return prior_model
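As a quick sanity check (a minimal sketch, assuming the imports above), the prior built for the second layer's weights is a fixed 18-dimensional standard normal with nothing to train:

prior_model = prior(kernel_size=16, bias_size=2)  # second layer: 16 kernel weights + 2 biases
dist = prior_model(tf.zeros(1))                   # the DistributionLambda ignores its input
print(dist.event_shape)                           # (18,)
print(len(prior_model.trainable_variables))       # 0 -- the prior contributes no parameters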

# Define variational posterior weight distribution -- multivariate Gaussian

def posterior(kernel_size, bias_size, dtype=None):
    n = kernel_size + bias_size
    posterior_model = Sequential([
        # The model's parameters are declared as trainable Variables. The shape of
        # the VariableLayer is the number of values needed to parameterize an
        # n-dimensional MultivariateNormalTriL (event_size = n), as returned by
        # tfpl.MultivariateNormalTriL.params_size(n).
        tfpl.VariableLayer(tfpl.MultivariateNormalTriL.params_size(n), dtype=dtype),
        # The posterior hands the calling DenseVariational layer a
        # MultivariateNormalTriL with one dimension per weight of that layer:
        # each weight gets a learned mean, and the lower-triangular scale matrix
        # lets the weights be correlated with one another. The VariableLayer's
        # output is the input that parameterizes this distribution.
        tfpl.MultivariateNormalTriL(n),
    ])
    return posterior_model
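The same kind of check on the posterior (again a sketch, assuming the imports above) shows where the parameter count of the second layer comes from:

posterior_model = posterior(kernel_size=16, bias_size=2)  # n = 18 for the second layer
dist = posterior_model(tf.zeros(1))                       # the VariableLayer ignores its input
print(dist.event_shape)                                   # (18,)
print(posterior_model.count_params())                     # 189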

# Create probabilistic regression with one hidden layer and weight uncertainty.
# x_train (the training data) is assumed to be defined already: kl_weight scales
# the KL divergence term by 1 / <number of training examples>.

model = Sequential([
    tfpl.DenseVariational(units=8,
                          input_shape=(1,),
                          make_prior_fn=prior,
                          make_posterior_fn=posterior,
                          kl_weight=1/x_train.shape[0],
                          activation='sigmoid'),
    tfpl.DenseVariational(units=tfpl.IndependentNormal.params_size(1),
                          make_prior_fn=prior,
                          make_posterior_fn=posterior,
                          kl_weight=1/x_train.shape[0]),
    tfpl.IndependentNormal(1)
])

# Negative log-likelihood: y_pred is a distribution object, so it can score y_true directly
def nll(y_true, y_pred):
    return -y_pred.log_prob(y_true)

model.compile(loss=nll, optimizer=RMSprop(learning_rate=0.005))
model.summary()

[model.summary() output: the second DenseVariational layer reports 189 parameters]
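For completeness, a hypothetical end-to-end run (the question never shows x_train or y_train, so synthetic data stands in here; note that in the code above x_train must already exist when the model is built, because kl_weight reads its shape):

import numpy as np

# Hypothetical stand-in data; the real x_train/y_train are not shown in the question
x_train = np.linspace(-1.0, 1.0, 200).reshape(-1, 1).astype('float32')
y_train = (x_train ** 2 + 0.1 * np.random.randn(200, 1)).astype('float32')

model.fit(x_train, y_train, epochs=100, verbose=0)

y_dist = model(x_train[:3])              # an IndependentNormal distribution
print(y_dist.mean().numpy().ravel())     # predictive means
print(y_dist.stddev().numpy().ravel())   # predictive standard deviations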

For the second layer we have 8 inputs (since the first layer has 8 outputs) and 2 outputs, so 16 weights in total. Each weight has its own mean and variance => 2 * 16 = 32 parameters.

Then we must add the free parameters of the covariance matrix over those 32 parameters. Since the covariance matrix is symmetric, we only count the triangular part including the diagonal, so (32**2 - 32)/2 + 32 = 528 parameters. But the model summary reports only 189 parameters.

1 Answer

Answered by Michael Glazunov:

There are 8 inputs and 2 outputs, which means 8 * 2 = 16 kernel weights and 2 biases, i.e. 18 parameters in total. Each parameter has its own mean, so we get 18 parameters for the means, and the lower-triangular scale matrix (the Cholesky factor of the covariance) contributes (18 * 17)/2 + 18 = 171 parameters. In total that is 171 + 18 = 189 trainable parameters, exactly as model.summary() reports. (The question's calculation omits the 2 biases and treats the per-weight means and variances as 32 separate random variables; the multivariate normal is over the 18 weights themselves, with the variances living inside the triangular matrix.)
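A quick numerical check of that count (a minimal sketch, reusing the tfpl alias from the question):

n = 8 * 2 + 2                                       # 16 kernel weights + 2 biases = 18
means = n                                           # one mean per weight
tril = n * (n + 1) // 2                             # entries of the lower-triangular scale
print(means + tril)                                 # 189
print(tfpl.MultivariateNormalTriL.params_size(n))   # 189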