I'm trying to train a neural net to learn the function y = x1 + x2 + x3. The objective is to play around with Caffe in order to learn and understand it better. The data required are synthetically generated in Python and written out as an LMDB database.
Code for data generation:
import numpy as np
import lmdb
import caffe

# N samples per split, each of shape (K channels, H, W)
Ntrain = 100
Ntest = 20
K = 3
H = 1
W = 1

Xtrain = np.random.randint(0, 1000, size=(Ntrain, K, H, W))
Xtest = np.random.randint(0, 1000, size=(Ntest, K, H, W))

# Target is the sum of the three channel values
ytrain = Xtrain[:, 0, 0, 0] + Xtrain[:, 1, 0, 0] + Xtrain[:, 2, 0, 0]
ytest = Xtest[:, 0, 0, 0] + Xtest[:, 1, 0, 0] + Xtest[:, 2, 0, 0]

# Write the training set, one Datum per sample, keyed by a zero-padded index
env = lmdb.open('expt/expt_train')
for i in range(Ntrain):
    datum = caffe.proto.caffe_pb2.Datum()
    datum.channels = Xtrain.shape[1]
    datum.height = Xtrain.shape[2]
    datum.width = Xtrain.shape[3]
    datum.data = Xtrain[i].tobytes()
    datum.label = int(ytrain[i])
    str_id = '{:08}'.format(i)
    with env.begin(write=True) as txn:
        txn.put(str_id.encode('ascii'), datum.SerializeToString())

# Write the test set the same way
env = lmdb.open('expt/expt_test')
for i in range(Ntest):
    datum = caffe.proto.caffe_pb2.Datum()
    datum.channels = Xtest.shape[1]
    datum.height = Xtest.shape[2]
    datum.width = Xtest.shape[3]
    datum.data = Xtest[i].tobytes()
    datum.label = int(ytest[i])
    str_id = '{:08}'.format(i)
    with env.begin(write=True) as txn:
        txn.put(str_id.encode('ascii'), datum.SerializeToString())
Solver prototxt file:
net: "expt/expt.prototxt"
display: 1
max_iter: 200
test_iter: 20
test_interval: 100
base_lr: 0.000001
momentum: 0.9
# weight_decay: 0.0005
lr_policy: "inv"
# gamma: 0.5
# stepsize: 10
# power: 0.75
snapshot_prefix: "expt/expt"
snapshot_diff: true
solver_mode: CPU
solver_type: SGD
debug_info: true
Caffe model:
name: "expt"
layer {
  name: "Expt_Data_Train"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  data_param {
    source: "expt/expt_train"
    backend: LMDB
    batch_size: 1
  }
}
layer {
  name: "Expt_Data_Validate"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  data_param {
    source: "expt/expt_test"
    backend: LMDB
    batch_size: 1
  }
}
layer {
  name: "IP"
  type: "InnerProduct"
  bottom: "data"
  top: "ip"
  inner_product_param {
    num_output: 1
    weight_filler {
      type: "constant"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "Loss"
  type: "EuclideanLoss"
  bottom: "ip"
  bottom: "label"
  top: "loss"
}
The loss on the test data that I'm getting is 233,655. This is shocking, as the loss is three orders of magnitude greater than the numbers in the training and test data sets. Also, the function to be learned is a simple linear function. I can't seem to figure out what is wrong in the code. Any suggestions/inputs are much appreciated.
The loss is so large in this case because Caffe only accepts data (i.e. datum.data) in uint8 format and labels (datum.label) in int32 format. However, for the labels, the numpy.int64 format also seems to work.

I think datum.data is accepted only in uint8 format because Caffe was primarily developed for computer-vision tasks, where the inputs are images with RGB values in the [0, 255] range, and uint8 can capture this using the least amount of memory.
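To make the mismatch concrete, here is a small numpy-only illustration (not taken from the code above; the exact byte values assume a little-endian machine) of why storing an int64 array in datum.data goes wrong: each value serializes to 8 bytes, so reading the buffer back as uint8 gives numbers unrelated to the intended inputs.

import numpy as np

# Illustration only (assumes a little-endian machine): an int64 array
# serializes to 8 bytes per value, so its raw bytes make no sense as uint8.
x = np.array([1, 2, 515], dtype=np.int64)
print(len(x.tobytes()))                            # 24 bytes for just 3 values
print(np.frombuffer(x.tobytes(), dtype=np.uint8))
# [1 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 3 2 0 0 0 0 0 0]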
I made the following changes to the data generation code:
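Roughly, the change amounts to generating the inputs in the [0, 255] range as uint8 and summing the labels in a wider integer type so they don't wrap around; a minimal sketch of that (this is the assumed form of the fix, not the exact original edit) is:

# Sketch of the assumed fix; it replaces the corresponding lines in the
# data-generation script above (Ntrain, Ntest, K, H, W are unchanged).
Xtrain = np.random.randint(0, 256, size=(Ntrain, K, H, W)).astype(np.uint8)
Xtest = np.random.randint(0, 256, size=(Ntest, K, H, W)).astype(np.uint8)

# Sum in int64 so the labels don't overflow uint8 arithmetic.
ytrain = Xtrain[:, :, 0, 0].sum(axis=1, dtype=np.int64)
ytest = Xtest[:, :, 0, 0].sum(axis=1, dtype=np.int64)

# The LMDB-writing loops stay the same: datum.data = Xtrain[i].tobytes()
# now stores exactly K*H*W bytes per sample, and datum.label = int(ytrain[i])
# is unchanged.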
After playing around with the net parameters (learning rate, number of iterations, etc.), I'm getting an error on the order of 10^(-6), which I think is pretty good!