Error using a Convolutional Neural Net with Lasagne in Python


I have used the framework provided by Daniel Nouri on his eponymous website. Here is the code I used. It looks fine; the only changes I made were to set output_nonlinearity=lasagne.nonlinearities.softmax and regression=False. Otherwise it looks pretty straightforward.

from lasagne import layers
import theano
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet
import lasagne
import numpy as np
from sklearn.cross_validation import train_test_split
from sklearn.datasets import fetch_mldata


def float32(k):
    # Helper from Daniel Nouri's tutorial: cast to float32 so the value
    # can be stored in a Theano shared variable
    return np.cast['float32'](k)


mnist = fetch_mldata('MNIST original')
X = np.asarray(mnist.data, dtype='float32')
y = np.asarray(mnist.target, dtype='int32')

(trainX, testX, trainY, testY) = train_test_split(X, y, test_size=0.3, random_state=42)
trainX = trainX.reshape(-1, 1, 28, 28)
testX = testX.reshape(-1, 1, 28, 28)

clf = NeuralNet(
    layers=[
        ('input', layers.InputLayer),
        ('conv1', layers.Conv2DLayer),
        ('pool1', layers.MaxPool2DLayer),
        ('dropout1', layers.DropoutLayer),
        ('conv2', layers.Conv2DLayer),
        ('pool2', layers.MaxPool2DLayer),
        ('dropout2', layers.DropoutLayer),
        ('hidden4', layers.DenseLayer),
        ('dropout4', layers.DropoutLayer),
        ('hidden5', layers.DenseLayer),
        ('output', layers.DenseLayer),
    ],
    input_shape=(None, 1, 28, 28),
    conv1_num_filters=20, conv1_filter_size=(3, 3), pool1_pool_size=(2, 2),
    dropout1_p=0.1,
    conv2_num_filters=50, conv2_filter_size=(3, 3), pool2_pool_size=(2, 2),
    dropout2_p=0.2,
    hidden4_num_units=500,
    dropout4_p=0.5,
    hidden5_num_units=500,
    output_num_units=10,  # one unit per digit class
    output_nonlinearity=lasagne.nonlinearities.softmax,
    update=nesterov_momentum,
    update_learning_rate=theano.shared(float32(0.03)),
    update_momentum=theano.shared(float32(0.9)),
    regression=False,  # classification, so use the integer class labels
    max_epochs=3000,
    verbose=1,
)

clf.fit(trainX, trainY)

However, on running it I get nan losses:

input               (None, 1, 28, 28)       produces     784 outputs
conv1               (None, 20, 26, 26)      produces   13520 outputs
pool1               (None, 20, 13, 13)      produces    3380 outputs
dropout1            (None, 20, 13, 13)      produces    3380 outputs
conv2               (None, 50, 11, 11)      produces    6050 outputs
pool2               (None, 50, 6, 6)        produces    1800 outputs
dropout2            (None, 50, 6, 6)        produces    1800 outputs
hidden4             (None, 500)             produces     500 outputs
dropout4            (None, 500)             produces     500 outputs
hidden5             (None, 500)             produces     500 outputs
output              (None, 10)              produces      10 outputs
epoch    train loss    valid loss    train/val    valid acc  dur
-------  ------------  ------------  -----------  -----------  ------
  1           nan           nan          nan      0.09923  16.18s
  2           nan           nan          nan      0.09923  16.45s

Thanks in advance.

1 Answer

Answered by Herman Schaaf:

I'm very late to the game, but hopefully someone finds this answer useful!

In my experience, there could be a number of things going wrong here. I'll write out my steps for debugging this kind of problem in nolearn/lasagne:

  1. Using Theano's fast_compile optimizer can lead to underflow issues, which result in the nan output (this was the ultimate problem in my case). See the optimizer sketch after this list for how to switch it off.

  2. When the output starts with nan values, or if nan values start appearing soon after training starts, the learning rate may be too high. If it is 0.01, try 0.001 (in the question's code, update_learning_rate=theano.shared(float32(0.001))).

  3. The input or output values may be too close to one another, and you may want to try scaling them. A standard approach is to scale the input by subtracting the mean and dividing by the standard deviation; see the standardization sketch after this list.

  4. Make sure you are using regression=True when using nolearn with a regression problem.

  5. Try using a linear output (output_nonlinearity=lasagne.nonlinearities.linear) instead of softmax. Other nonlinearities sometimes also help, but in my experience not often.

  6. If all this fails, try to isolate whether the issue is with your network or with your data. If you feed in random values within the expected range and still get nan output, it's probably not specific to the dataset you are training on; see the random-data sketch below.
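
To expand on step 1: a minimal sketch of moving off fast_compile, assuming you can set the Theano configuration before any functions are compiled. The flag name and values are standard Theano options:

# Option A: set the flag in the environment before launching Python:
#   THEANO_FLAGS=optimizer=fast_run python train.py

# Option B: set it in code, before the network is compiled
import theano
theano.config.optimizer = 'fast_run'  # the default optimizer; unlike
                                      # fast_compile it applies
                                      # numerical-stability rewrites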
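
To expand on step 3: a sketch of standardizing the input, reusing the trainX/testX arrays from the question and computing the statistics on the training split only:

import numpy as np

mean = trainX.mean()            # global pixel mean of the training set
std = trainX.std()              # global pixel standard deviation
trainX = (trainX - mean) / std
testX = (testX - mean) / std    # apply the training statistics to the test set

# For raw MNIST pixels in [0, 255], simply dividing by 255 also works:
# trainX /= 255.0
# testX /= 255.0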
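
To expand on step 6: a hypothetical smoke test that trains the same clf from the question on random data of the same shape and range. If the loss is still nan, the problem is in the network or training setup, not in the dataset:

import numpy as np

# Random inputs matching the real data's shape and (scaled) range,
# with random labels for the 10 digit classes
fakeX = np.random.uniform(-1.0, 1.0, size=(1000, 1, 28, 28)).astype('float32')
fakeY = np.random.randint(0, 10, size=1000).astype('int32')

clf.max_epochs = 5  # a few epochs are enough to see whether nan shows up
clf.fit(fakeX, fakeY)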

Hope that helps!