I have a dataset of 40 feature vectors divided into 4 classes. Could somebody give example code in Matlab showing how to apply a deep belief network to do classification (and explain the parameters)? Any library/toolbox can be used, but it should be in Matlab.
There is for example the Shogun toolbox (http://www.shogun-toolbox.org/), the DeeBNet toolbox (http://ceit.aut.ac.ir/~keyvanrad/DeeBNet%20Toolbox.html) or the Deep Learning Toolbox (http://www.mathworks.com/matlabcentral/fileexchange/38310-deep-learning-toolbox), but unfortunately none of them is very well documented, and because I'm totally new to deep learning / neural nets it is really hard for me.
Edit: What should I choose for the following parameters, or over what range should I search?
nn.activation_function = 'tanh_opt'; % Activation functions of hidden layers: 'sigm' (sigmoid) or 'tanh_opt' (optimal tanh).
nn.learningRate = 2; % learning rate Note: typically needs to be lower when using 'sigm' activation function and non-normalized inputs.
nn.momentum = 0.5; % Momentum
nn.scaling_learningRate = 1; % Scaling factor for the learning rate (each epoch)
nn.weightPenaltyL2 = 0; % L2 regularization
nn.nonSparsityPenalty = 0; % Non sparsity penalty
nn.sparsityTarget = 0.05; % Sparsity target
nn.inputZeroMaskedFraction = 0; % Used for Denoising AutoEncoders
nn.dropoutFraction = 0; % Dropout level (http://www.cs.toronto.edu/~hinton/absps/dropout.pdf)
nn.testing = 0; % Internal variable. nntest sets this to one.
nn.output = 'sigm'; % output unit 'sigm' (=logistic), 'softmax' and 'linear'
opts.numepochs = 1;
opts.batchsize = 100;
opts.momentum = 0;
opts.alpha = 1;
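For context, here is roughly how I understand these options are consumed in the Deep Learning Toolbox (rasmusbergpalm/DeepLearnToolbox) for a DBN: unsupervised RBM pretraining, then unfolding into a feedforward net for supervised fine-tuning. This is only a sketch under my assumptions that `train_x` is 40xD with rows scaled to [0,1] and `train_y` is 40x4 one-hot; note that with only 40 samples, `opts.batchsize = 100` cannot work, since the batch size must divide the number of samples.

```matlab
rng(0);

% -- Unsupervised pretraining: a stack of RBMs --
dbn.sizes = [50 50];          % two hidden layers of 50 units (architecture choice)
opts.numepochs = 50;          % passes over the data; 1 is almost always too few
opts.batchsize = 10;          % must divide the number of samples (40/10 = 4 batches)
opts.momentum  = 0.5;
opts.alpha     = 0.1;         % RBM learning rate
dbn = dbnsetup(dbn, train_x, opts);
dbn = dbntrain(dbn, train_x, opts);

% -- Supervised fine-tuning: unfold the DBN into a feedforward NN --
nn = dbnunfoldtonn(dbn, 4);   % 4 = number of classes
nn.activation_function = 'sigm';
nn.output = 'softmax';        % softmax output for multi-class classification
nn.learningRate = 1;
nn = nntrain(nn, train_x, train_y, opts);

% -- Evaluate (on the training set here, for lack of a held-out split) --
[er, bad] = nntest(nn, train_x, train_y);
fprintf('error rate: %.2f\n', er);
```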
So DBNs are pretty complicated and it took me a few months to really wrap my head around them. Here's a quick overview though:
A neural network works by taking some kind of features and putting them through a layer of "all or nothing" activations. These activations have weights, and the weights are what the NN is attempting to "learn". NNs mostly died out in the 80s and 90s because the systems couldn't find these weights properly. That changed with Geoff Hinton's landmark 2006 paper, which proposed pretraining the network with a restricted Boltzmann machine to get the weights into the right ballpark.
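To make the pretraining step concrete, here is a minimal sketch of one contrastive-divergence (CD-1) weight update for a restricted Boltzmann machine. All names are illustrative (not from any particular toolbox), and biases are left out for brevity:

```matlab
sigm = @(x) 1 ./ (1 + exp(-x));   % logistic activation

nVis = 10; nHid = 20; lr = 0.1;
W  = 0.1 * randn(nVis, nHid);      % weights to be "learned"
v0 = double(rand(1, nVis) > 0.5);  % one (binary) training vector

% Positive phase: infer hidden units from the data
h0prob = sigm(v0 * W);
h0 = double(h0prob > rand(1, nHid));

% Negative phase: reconstruct visibles, then re-infer hiddens
v1prob = sigm(h0 * W');
h1prob = sigm(v1prob * W);

% CD-1 update: data correlations minus reconstruction correlations
W = W + lr * (v0' * h0prob - v1prob' * h1prob);
```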
It depends on your goal, but if your goal is to learn how they work, I would start with Hinton's original paper and rewrite its code to use functions instead of the static 3-layer network that's in the paper. This will give you a good intuition of what's going on in terms of the weights being learned and the activations.
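A sketch of what "rewriting it with functions" might look like: a greedy layer-wise loop that trains one RBM per layer and feeds its hidden activations to the next layer as input. Here `trainRBM` is a hypothetical function you would write yourself (e.g. a CD-1 update run over epochs and mini-batches), not a toolbox call:

```matlab
layerSizes = [size(X, 2), 100, 50];  % visible layer plus two hidden layers
sigm = @(z) 1 ./ (1 + exp(-z));

data = X;
rbms = cell(1, numel(layerSizes) - 1);
for l = 1:numel(layerSizes) - 1
    rbms{l} = trainRBM(data, layerSizes(l + 1));  % hypothetical helper
    data = sigm(data * rbms{l}.W);  % activations become the next layer's input
end
```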
Now to answer your second question: there's a bit of debate, but in my experience the most important factor is the architecture of the system, i.e. how many hidden layers you stack and how many units each one has (together with the activation and output functions, nn.activation_function and nn.output in your listing).
The other variables you can control are what I would classify as optimization variables: the learning rate and its schedule (nn.learningRate, nn.scaling_learningRate), momentum (nn.momentum / opts.momentum), regularization (nn.weightPenaltyL2, nn.dropoutFraction, the sparsity penalty), and the training schedule (opts.numepochs, opts.batchsize).
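A common way to settle those optimization variables is a small grid search. This sketch uses the DeepLearnToolbox names from your listing and assumes `train_x`/`train_y` and a held-out `val_x`/`val_y` split already exist; with only 40 samples, k-fold cross-validation would actually be a better choice than a single hold-out split:

```matlab
learningRates = [0.1 0.5 1 2];
momenta = [0 0.5 0.9];
bestEr = inf;
for lr = learningRates
    for m = momenta
        nn = nnsetup([size(train_x, 2) 50 4]);  % input, one hidden layer, 4 classes
        nn.learningRate = lr;
        nn.momentum = m;
        nn.output = 'softmax';
        opts.numepochs = 50;
        opts.batchsize = 10;   % must divide the number of training samples
        nn = nntrain(nn, train_x, train_y, opts);
        er = nntest(nn, val_x, val_y);          % validation error rate
        if er < bestEr, bestEr = er; best = [lr m]; end
    end
end
```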
I'm going to warn you though: don't expect stellar results, and be prepared to have a system that takes a long time to train.
A second route you could go is to try some other systems out there, like Caffe, and that might give you more usable results.
Anyways, good luck :)
PS: with such a small dataset you might consider using SVMs instead.
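For 40 samples and 4 classes, that SVM baseline is only a few lines using fitcecoc from the Statistics and Machine Learning Toolbox (assuming `X` is 40xD and `Y` is a 40x1 vector of class labels):

```matlab
Mdl = fitcecoc(X, Y);                  % multiclass SVM, one-vs-one by default
CVMdl = crossval(Mdl, 'KFold', 5);     % 5-fold cross-validation
err = kfoldLoss(CVMdl);                % estimated misclassification rate
fprintf('CV error: %.2f\n', err);
```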