Output data from PyBrain neural network doesn't show diversity


I'm trying to develop a neural network using PyBrain, with the following specs:

1.) 3 Layers total

2.) 36 input neurons

3.) input neurons are linear

4.) hidden layer is sigmoid

5.) output layer is linear

6.) the number of hidden neurons is set by num_hidden_layer

7.) the number of output neurons is controlled by num_output

The code is:

import pybrain
import csv
from pybrain.datasets.supervised import SupervisedDataSet
from pybrain.supervised.trainers import BackpropTrainer
import sys


with open(sys.argv[1], 'r') as csvfile:
    reader = csv.reader(csvfile)
    header = reader.next()
    # 36 inputs (every column but the last), 1 target (the last column)
    dataset = SupervisedDataSet(len(header[:-1]), 1)
    for line in reader:
        # strip thousands separators and convert each field to a number
        temp = [float(e.replace(',', '')) for e in line]
        dataset.addSample(temp[:-1], temp[-1])

# hold out 25% of the samples for testing, train on the rest
tstdata, trndata = dataset.splitWithProportion(0.25)
n = pybrain.FeedForwardNetwork()

num_inputs = len(header[:-1])
num_hidden_layer = 5
num_output = 1
inLayer = pybrain.LinearLayer(num_inputs)
hiddenLayer = pybrain.SigmoidLayer(num_hidden_layer)
outLayer = pybrain.LinearLayer(num_output)

n.addInputModule(inLayer)
n.addModule(hiddenLayer)
n.addOutputModule(outLayer)

in_to_hidden = pybrain.FullConnection(inLayer, hiddenLayer)
hidden_to_out = pybrain.FullConnection(hiddenLayer, outLayer)
n.addConnection(in_to_hidden)
n.addConnection(hidden_to_out)

n.sortModules()

trainer = BackpropTrainer(n, dataset=trndata, momentum=0.1, verbose=True,
                          weightdecay=0.01, learningrate=0.3)

# train for 150 * 10 = 1500 epochs in total
for x in range(150):
    trainer.trainEpochs(10)

print n.activateOnDataset(tstdata)

Our input CSV has 37 columns of numerical data; the last column is the one we want to train our NN to predict.
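
For illustration only (these numbers and header names are invented placeholders, not our real data), a row looks roughly like this; some of our fields contain thousands separators, which is why the loop above strips commas:

f1,f2,f3,...,f36,target
"1,204.5",0.82,0.07,...,37.1,-9.2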

When we run our NN, the output is just: [[-9.43679663] [-9.43678922] [-9.43679759] [-9.43679592] [-9.43679396] [-9.43679395] [-9.43679737]]. This is the correct ballpark for what we expect (-8 to -10), but there is no variation across samples. If we read in only half the data, we get exactly the same output. Is this a problem with our input data (e.g. a bad distribution or range of values), or with our network?
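
One check that might separate those two possibilities is sketched below (a sketch only; it assumes the dataset, n, and hiddenLayer objects from the script above, plus numpy). If the raw columns are on wildly different scales, or the hidden sigmoids sit pinned at 0 or 1 for every sample, the inputs are the likely culprit:

import numpy as np

# Diagnostic sketch, not part of the original script: inspect the
# spread of the raw columns and of the target values.
inputs = np.array(dataset['input'])
targets = np.array(dataset['target'])
print 'per-column input range:', inputs.max(axis=0) - inputs.min(axis=0)
print 'target range:', targets.min(), targets.max()

for sample in inputs[:5]:
    n.activate(sample)
    # outputbuffer[0] holds the hidden activations from the last call;
    # values pinned near 0 or 1 mean the sigmoids are saturated
    print hiddenLayer.outputbuffer[0]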
