I am trying to derive the conditional distribution of the visible variables, $p(v \mid h)$, for the Replicated Softmax Model (RSM), or equivalently the Restricted Boltzmann Machine (RBM) for word counts, as described in the paper "Replicated Softmax: an Undirected Topic Model" by Salakhutdinov and Hinton.
The paper can be found at: http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=B04C8D67D381B8106FF6FA4203A86264?doi=10.1.1.164.71&rep=rep1&type=pdf
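However, despite my best efforts, I have been unable to see how the conditional turns out to be a softmax distribution (written here as I read it from the paper):

$$p(v_i^k = 1 \mid h) = \frac{\exp\left(b_i^k + \sum_{j} h_j W_{ij}^k\right)}{\sum_{q=1}^{K} \exp\left(b_i^q + \sum_{j} h_j W_{ij}^q\right)}$$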
Also, I'm not sure whether $W$ is a 3D matrix and $b$ a 2D matrix, or whether they are instead a 2D matrix and a vector respectively. I believe it is the latter. I am hoping someone can demonstrate the derivation.
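To make the second reading concrete, here is a minimal NumPy sketch (not Theano) of what I think the conditional looks like if the weights are a 2D matrix and the visible bias is a vector. The function name `visible_softmax`, the shape convention `W: (K, F)`, and the toy sizes are my own assumptions for illustration, not something taken from the paper.

```python
import numpy as np

def visible_softmax(h, W, b):
    """Softmax over the K dictionary words for one word position.

    Assumed shapes (hypothetical convention):
      h: (F,)   binary hidden vector
      W: (K, F) weights shared across all word positions
      b: (K,)   visible bias shared across all word positions
    """
    logits = b + W @ h
    logits = logits - logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

rng = np.random.default_rng(0)
K, F, D = 5, 3, 10                      # dictionary size, hidden units, words in the document
W = rng.normal(scale=0.1, size=(K, F))
b = np.zeros(K)
h = rng.integers(0, 2, size=F)

p = visible_softmax(h, W, b)            # one distribution, reused for every word position
counts = rng.multinomial(D, p)          # word-count vector for a document of D words
```

Under this reading every word position shares the same softmax, so sampling a whole document reduces to a single multinomial draw of `D` words.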
I am looking to implement the RSM for topic modelling in Python using Theano. I am aware that there are existing implementations, but I would prefer to understand the derivation myself so that I can extend or optimize the code without the risk of breaking the model.
P.S. Apologies, this is a repost of https://math.stackexchange.com/questions/2085616/rbm-deriving-the-replicated-softmax-model-rsm but I reposted it here as there aren't as many users on math.stackexchange.
After some time I found out where I had misunderstood things and managed to derive the equations. Please refer to math.stackexchange:
https://math.stackexchange.com/questions/2085616/rbm-deriving-the-replicated-softmax-model-rsm/2087272#2087272
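For readers who do not want to follow the link, here is a short sketch of how the softmax can arise. It is my own summary under stated assumptions, not a copy of the linked answer: I assume a document of $D$ words, each word $v_i$ a one-hot vector over $K$ dictionary entries, $F$ binary hidden units, and an energy whose remaining terms $c(h)$ depend only on $h$.

$$E(V, h) = -\sum_{i=1}^{D}\sum_{j=1}^{F}\sum_{k=1}^{K} W_{ij}^{k} h_j v_i^k - \sum_{i=1}^{D}\sum_{k=1}^{K} b_i^k v_i^k - c(h)$$

Since $p(V \mid h) \propto \exp(-E(V, h))$ and the energy is a sum over word positions $i$, the factor $\exp(c(h))$ cancels and the conditional factorizes:

$$p(V \mid h) = \prod_{i=1}^{D} \frac{\exp\left(\sum_{k} v_i^k \left(b_i^k + \sum_{j} h_j W_{ij}^k\right)\right)}{\sum_{v_i'} \exp\left(\sum_{k} v_i'^{k} \left(b_i^k + \sum_{j} h_j W_{ij}^k\right)\right)}$$

Each $v_i'$ is one-hot, so the sum over $v_i'$ is just a sum over the $K$ possible words, which gives

$$p(v_i^k = 1 \mid h) = \frac{\exp\left(b_i^k + \sum_{j} h_j W_{ij}^k\right)}{\sum_{q=1}^{K} \exp\left(b_i^q + \sum_{j} h_j W_{ij}^q\right)}$$

If the parameters are then shared across word positions, $W_{ij}^k = W_j^k$ and $b_i^k = b^k$, the index $i$ drops out, so $W$ is a 2D matrix and $b$ is a vector.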