While constructing a Variational Autoencoder in TensorFlow, I came across various implementations of Kullback–Leibler divergence:
1. `tfp.distributions.kl_divergence`, which implements the exact (analytical) KL divergence.
2. `tfp.layers.KLDivergenceAddLoss`, which uses a Monte Carlo approximation.
3. `tfp.layers.KLDivergenceRegularizer`, which seems similar in calculation and use.
4. `tf.keras.losses.KLDivergence`, which uses the formula `y_true * log(y_true / y_pred)`.
I use tfp.distributions to model both the prior and the posterior, so any of the above implementations should be compatible. I assume that (2) and (3) are identical; what I mainly wonder is whether they are comparable to (4).
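For concreteness, here is a minimal sketch of how I currently understand the options. The Normal distributions are just placeholders, and the Monte Carlo line is my rough reading of what the layers compute by default, not their actual implementation:

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# Placeholder distributions standing in for my prior and approximate posterior.
prior = tfd.Normal(loc=0., scale=1.)
posterior = tfd.Normal(loc=0.5, scale=0.8)

# (1) Exact, analytical KL(posterior || prior).
exact_kl = tfd.kl_divergence(posterior, prior)

# (2)/(3) As I understand it, the layers default to a Monte Carlo estimate of
# the same quantity, roughly an average of log-density ratios over samples.
samples = posterior.sample(10_000)
mc_kl = tf.reduce_mean(posterior.log_prob(samples) - prior.log_prob(samples))

# (4) Operates on probability vectors (e.g. softmax outputs), not distribution
# objects: sum(y_true * log(y_true / y_pred)) over the last axis.
kl_loss = tf.keras.losses.KLDivergence()
discrete_kl = kl_loss([[0.2, 0.8]], [[0.4, 0.6]])

print(exact_kl.numpy(), mc_kl.numpy(), discrete_kl.numpy())
```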
Any knowledge on this matter is appreciated!