While constructing a Variational Autoencoder in TensorFlow, I came across various implementations of Kullback–Leibler divergence:
1. `tfp.distributions.kl_divergence`, which implements the exact KL-divergence measure.
2. `tfp.layers.KLDivergenceAddLoss`, which uses a Monte Carlo approximation.
3. `tfp.layers.KLDivergenceRegularizer`, which seems similar in calculation and use.
4. `tf.keras.losses.KLDivergence`, which uses the formula `y_true * log(y_true / y_pred)`.
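For concreteness, here is a minimal sketch of how I understand (1) versus the Monte Carlo estimate that (2) and (3) appear to use by default; the posterior and prior parameters below are made-up example values:

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# Made-up posterior and prior, just for comparison purposes.
posterior = tfd.Normal(loc=[0.5, -0.3], scale=[0.8, 1.2])
prior = tfd.Normal(loc=0., scale=1.)

# (1) Closed-form KL (available because Normal-vs-Normal is registered).
exact_kl = tfd.kl_divergence(posterior, prior)

# Monte Carlo estimate of the same quantity, E_q[log q(z) - log p(z)],
# which is what I understand (2)/(3) compute when use_exact_kl=False.
z = posterior.sample(10_000, seed=42)
mc_kl = tf.reduce_mean(posterior.log_prob(z) - prior.log_prob(z), axis=0)

print(exact_kl.numpy(), mc_kl.numpy())  # the two should be close
```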
I use `tfp.distributions` objects to model both the prior and the posterior, so any of the above implementations is compatible. However, while I assume that (2) and (3) are identical, I primarily wonder whether they are comparable to (4).
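As for (4), my understanding is that `tf.keras.losses.KLDivergence` works on plain probability tensors rather than on distribution objects, so I am not sure how it relates when the prior and posterior are continuous `tfp` distributions. A small sketch with made-up probabilities of where the two seem to coincide:

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# (4) takes tensors of probabilities, not Distribution objects.
p = tf.constant([[0.7, 0.2, 0.1]])  # "y_true", made-up values
q = tf.constant([[0.5, 0.3, 0.2]])  # "y_pred", made-up values
keras_kl = tf.keras.losses.KLDivergence()(p, q)

# The same number via (1), by wrapping the probabilities in Categorical
# distributions and computing the exact KL.
cat_kl = tfd.kl_divergence(tfd.Categorical(probs=p), tfd.Categorical(probs=q))

print(keras_kl.numpy(), cat_kl.numpy())  # both reduce to sum(p * log(p / q))
```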
Any knowledge on this matter is appreciated!