I have used the framework provided by Daniel Nouri on his eponymous website. Here is the code I used. It looks fine; the only changes I made were setting output_nonlinearity=lasagne.nonlinearities.softmax and regression=False. Otherwise it is pretty straightforward.
from lasagne import layers
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet
from sklearn.cross_validation import train_test_split
from sklearn.datasets import fetch_mldata
import lasagne
import numpy as np
import theano

# Cast helper for the Theano shared variables below (defined in the
# tutorial code this is based on; without it float32 is undefined).
def float32(k):
    return np.cast['float32'](k)
mnist = fetch_mldata('MNIST original')
X = np.asarray(mnist.data, dtype='float32')
y = np.asarray(mnist.target, dtype='int32')
(trainX, testX, trainY, testY) = train_test_split(X, y, test_size=0.3, random_state=42)
trainX = trainX.reshape(-1, 1, 28, 28)
testX = testX.reshape(-1, 1, 28, 28)
clf = NeuralNet(
    layers=[
        ('input', layers.InputLayer),
        ('conv1', layers.Conv2DLayer),
        ('pool1', layers.MaxPool2DLayer),
        ('dropout1', layers.DropoutLayer),  # !
        ('conv2', layers.Conv2DLayer),
        ('pool2', layers.MaxPool2DLayer),
        ('dropout2', layers.DropoutLayer),  # !
        ('hidden4', layers.DenseLayer),
        ('dropout4', layers.DropoutLayer),  # !
        ('hidden5', layers.DenseLayer),
        ('output', layers.DenseLayer),
    ],
    input_shape=(None, 1, 28, 28),
    conv1_num_filters=20, conv1_filter_size=(3, 3), pool1_pool_size=(2, 2),
    dropout1_p=0.1,  # !
    conv2_num_filters=50, conv2_filter_size=(3, 3), pool2_pool_size=(2, 2),
    dropout2_p=0.2,  # !
    hidden4_num_units=500,
    dropout4_p=0.5,  # !
    hidden5_num_units=500,
    output_num_units=10,
    output_nonlinearity=lasagne.nonlinearities.softmax,
    update=nesterov_momentum,
    update_learning_rate=theano.shared(float32(0.03)),
    update_momentum=theano.shared(float32(0.9)),
    regression=False,
    max_epochs=3000,
    verbose=1,
)
clf.fit(trainX, trainY)
However, on running it I get this nan output:
input (None, 1, 28, 28) produces 784 outputs
conv1 (None, 20, 26, 26) produces 13520 outputs
pool1 (None, 20, 13, 13) produces 3380 outputs
dropout1 (None, 20, 13, 13) produces 3380 outputs
conv2 (None, 50, 11, 11) produces 6050 outputs
pool2 (None, 50, 6, 6) produces 1800 outputs
dropout2 (None, 50, 6, 6) produces 1800 outputs
hidden4 (None, 500) produces 500 outputs
dropout4 (None, 500) produces 500 outputs
hidden5 (None, 500) produces 500 outputs
output (None, 10) produces 10 outputs
epoch train loss valid loss train/val valid acc dur
------- ------------ ------------ ----------- ----------- ------
1 nan nan nan 0.09923 16.18s
2 nan nan nan 0.09923 16.45s
Thanks in advance.
I'm very late to the game, but hopefully someone finds this answer useful!
In my experience, there could be a number of things going wrong here. I'll write out my steps for debugging this kind of problem in nolearn/lasagne:
1. Using Theano's fast_compile optimizer can lead to underflow issues, which result in the nan output (this was the ultimate problem in my case). There is a sketch of switching optimizers after this list.
2. When the output starts with nan values, or if nan values start appearing soon after training starts, the learning rate may be too high. If it is 0.01, try making it 0.001 (second sketch below).
3. The input or output values may be too close to one another, and you may want to try scaling them. A standard approach is to scale the input by subtracting the mean and dividing by the standard deviation (third sketch below).
4. Make sure you are using regression=True when using nolearn with a regression problem.
5. Try using a linear output instead of softmax. Other nonlinearities sometimes also help, but in my experience not often.
6. If all this fails, try to isolate whether the issue is with your network or with your data. If you feed in random values within the expected range and still get nan output, it's probably not specific to the dataset you are training on (last sketch below).
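For the first point: which optimizer is active is a Theano setting, not a nolearn one. A minimal sketch, assuming your Theano version allows changing the optimizer at runtime (it must happen before the net is compiled, i.e. before clf.fit):

import theano

# fast_compile skips many of the numerically-stabilizing graph
# rewrites, which is what can underflow to nan; fast_run applies them.
print(theano.config.optimizer)  # see what is currently active

# Equivalent to launching with THEANO_FLAGS='optimizer=fast_run', or
# to putting "optimizer = fast_run" under [global] in ~/.theanorc.
theano.config.optimizer = 'fast_run'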
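For the learning rate: in the question's code the rate lives in a Theano shared variable, and nolearn keeps the constructor arguments around as attributes, so a sketch of lowering it without rebuilding the net could look like this:

# Drop the rate by an order of magnitude, here from 0.03 to 0.003
# (reuses the float32 helper defined with the question's imports).
clf.update_learning_rate.set_value(float32(0.003))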
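For scaling, a minimal sketch against the question's arrays (raw MNIST pixels are 0-255; the statistics come from the training split only, so the test set stays unseen):

# Standardize to zero mean and unit standard deviation.
mean = trainX.mean()
std = trainX.std()
trainX = (trainX - mean) / std
testX = (testX - mean) / std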
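And for the last point, a sketch of the random-input check; the shapes match the question's network, and the sample count of 1000 is arbitrary:

import numpy as np

# Random inputs and labels in the same shape as the real data; scale
# fakeX to whatever range your real inputs use. If training on these
# still yields nan, the problem is in the network or training setup
# rather than in the dataset itself.
fakeX = np.random.rand(1000, 1, 28, 28).astype('float32')
fakeY = np.random.randint(0, 10, 1000).astype('int32')
clf.fit(fakeX, fakeY)

Hope that helps!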