Inferring missing data with Restricted Boltzmann Machines

As in the Netflix competition, assume we have a movie dataset with missing ratings. How would I modify an RBM so that it can infer the missing values? In related papers, one straightforward approach is to fill the missing visible features with random values. However, I'm skeptical about the reconstruction accuracy, because it may depend on the initial values assigned to these missing visible nodes.

What do you suggest?

Thanks

There are 3 answers

1
kudkudak On BEST ANSWER

Maybe this video will be of some help: https://www.youtube.com/watch?v=laVC6WFIXjg

I think that sampling after imputing random values is a good idea; Hinton justifies it in this video. You can also try to estimate a prior, draw many samples, or make initial guesses with a different method and then do the reconstruction.

In the video, Hinton says that this method is indeed not very accurate on its own, but that it can be very powerful when combined with matrix factorization (or other similar methods).

0
Mel On

Actually, the dependence on the initial values given to the missing visible nodes can be exploited to gain an extra 2-5% of accuracy. You can run the RBM several times from different initializations and then average the results. Every final state will contain errors, but they will differ from one another. I tried this, and the results kept improving until roughly the 20th initialization...
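
A minimal sketch of this averaging idea, assuming a binary RBM (e.g. liked / not liked rather than 1-5 stars) that has already been trained. The names `reconstruct_once` and `averaged_imputation`, the parameters `W`, `b_vis`, `b_hid`, and the defaults of 5 passes and 20 initializations are illustrative assumptions, not part of the answer above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruct_once(v, W, b_vis, b_hid, rng, n_passes=5):
    """Run a few stochastic up-down passes (n_passes >= 1) and return
    the visible-unit probabilities from the last pass."""
    p_v = v
    for _ in range(n_passes):
        p_h = sigmoid(v @ W + b_hid)                      # visible -> hidden
        h = (rng.random(p_h.shape) < p_h).astype(float)   # sample hidden units
        p_v = sigmoid(h @ W.T + b_vis)                    # hidden -> visible
        v = (rng.random(p_v.shape) < p_v).astype(float)   # sample visible units
    return p_v

def averaged_imputation(v, observed, W, b_vis, b_hid, n_inits=20, seed=0):
    """Average the reconstructions obtained from several random fillings
    of the missing entries (hypothetical helper, binary units assumed)."""
    rng = np.random.default_rng(seed)
    missing = ~observed
    total = np.zeros(v.shape, dtype=float)
    for _ in range(n_inits):
        v0 = v.astype(float).copy()
        # Fill the missing entries with a fresh random guess each time.
        v0[missing] = rng.integers(0, 2, size=int(missing.sum()))
        total += reconstruct_once(v0, W, b_vis, b_hid, rng)
    out = v.astype(float).copy()
    out[missing] = total[missing] / n_inits   # observed entries stay as given
    return out
```

The returned vector keeps the observed entries unchanged and holds, for each missing entry, the mean reconstruction probability over all initializations. How many initializations are worthwhile would have to be checked on held-out ratings.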

1
A.R.Ferguson On

The idea is to perform alternating Gibbs sampling while keeping the non-missing values fixed (clamped) to their data values in the reconstruction update. Do this until the Markov chains for the missing values reach their stationary distribution, and you then have the network's best guess as to what they ought to be.
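
The clamped sampling described above could be sketched roughly as follows. This assumes a trained RBM with binary visible and hidden units. The function name `clamped_gibbs_impute`, the parameters `W`, `b_vis`, `b_hid`, and the step and burn-in counts are assumptions made for illustration. Real rating data would more likely need multi-valued (e.g. softmax) visible units, which this sketch does not cover:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def clamped_gibbs_impute(v, observed, W, b_vis, b_hid,
                         n_steps=200, burn_in=100, rng=None):
    """Alternating Gibbs sampling with observed visible units clamped.

    v        : 1-D array of visible values (entries at missing positions are ignored)
    observed : boolean mask, True where the value is known
    Returns a copy of v with missing entries replaced by their average
    activation probability after burn-in.
    """
    if n_steps <= burn_in:
        raise ValueError("n_steps must be larger than burn_in")
    rng = np.random.default_rng() if rng is None else rng
    missing = ~observed
    v = v.astype(float).copy()
    # Start the missing units from a random binary state.
    v[missing] = rng.integers(0, 2, size=int(missing.sum()))
    running = np.zeros_like(v)
    kept = 0
    for step in range(n_steps):
        p_h = sigmoid(v @ W + b_hid)                      # sample hidden given visible
        h = (rng.random(p_h.shape) < p_h).astype(float)
        p_v = sigmoid(h @ W.T + b_vis)                    # sample visible given hidden
        v_new = (rng.random(p_v.shape) < p_v).astype(float)
        v[missing] = v_new[missing]                       # only missing units are updated
        if step >= burn_in:                               # average after burn-in
            running[missing] += p_v[missing]
            kept += 1
    out = v.copy()
    out[missing] = running[missing] / kept
    return out
```

Whether 100 burn-in steps are enough for the chain to mix is not something this sketch can guarantee. In practice one would monitor the running averages and stop once they stabilize.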