I'm trying to train a neural net to learn the function y = x1 + x2 + x3. The objective is to play around with Caffe in order to learn and understand it better. The data required are synthetically generated in Python and written out as an LMDB database.
Code for data generation:
import numpy as np
import lmdb
import caffe

# N samples per split, each of shape (K channels, H, W)
Ntrain = 100
Ntest = 20
K = 3
H = 1
W = 1

Xtrain = np.random.randint(0, 1000, size=(Ntrain, K, H, W))
Xtest = np.random.randint(0, 1000, size=(Ntest, K, H, W))

# Target is the sum of the three channel values
ytrain = Xtrain[:, 0, 0, 0] + Xtrain[:, 1, 0, 0] + Xtrain[:, 2, 0, 0]
ytest = Xtest[:, 0, 0, 0] + Xtest[:, 1, 0, 0] + Xtest[:, 2, 0, 0]

# Write the training set, one Datum per sample, keyed by a zero-padded index
env = lmdb.open('expt/expt_train')
for i in range(Ntrain):
    datum = caffe.proto.caffe_pb2.Datum()
    datum.channels = Xtrain.shape[1]
    datum.height = Xtrain.shape[2]
    datum.width = Xtrain.shape[3]
    datum.data = Xtrain[i].tobytes()
    datum.label = int(ytrain[i])
    str_id = '{:08}'.format(i)
    with env.begin(write=True) as txn:
        txn.put(str_id.encode('ascii'), datum.SerializeToString())

# Write the test set the same way
env = lmdb.open('expt/expt_test')
for i in range(Ntest):
    datum = caffe.proto.caffe_pb2.Datum()
    datum.channels = Xtest.shape[1]
    datum.height = Xtest.shape[2]
    datum.width = Xtest.shape[3]
    datum.data = Xtest[i].tobytes()
    datum.label = int(ytest[i])
    str_id = '{:08}'.format(i)
    with env.begin(write=True) as txn:
        txn.put(str_id.encode('ascii'), datum.SerializeToString())
Solver prototxt file:
net: "expt/expt.prototxt"
display: 1
max_iter: 200
test_iter: 20
test_interval: 100
base_lr: 0.000001
momentum: 0.9
# weight_decay: 0.0005
lr_policy: "inv"
# gamma: 0.5
# stepsize: 10
# power: 0.75
snapshot_prefix: "expt/expt"
snapshot_diff: true
solver_mode: CPU
solver_type: SGD
debug_info: true
Caffe model:
name: "expt"
layer {
  name: "Expt_Data_Train"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  data_param {
    source: "expt/expt_train"
    backend: LMDB
    batch_size: 1
  }
}
layer {
  name: "Expt_Data_Validate"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  data_param {
    source: "expt/expt_test"
    backend: LMDB
    batch_size: 1
  }
}
layer {
  name: "IP"
  type: "InnerProduct"
  bottom: "data"
  top: "ip"
  inner_product_param {
    num_output: 1
    weight_filler {
      type: "constant"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "Loss"
  type: "EuclideanLoss"
  bottom: "ip"
  bottom: "label"
  top: "loss"
}
The loss on the test data that I'm getting is 233,655. This is shocking, as the loss is three orders of magnitude greater than the numbers in the training and test data sets. Also, the function to be learned is a simple linear function. I can't seem to figure out what is wrong in the code. Any suggestions/inputs are much appreciated.
The loss is so large in this case because Caffe only accepts data (i.e. datum.data) in uint8 format and labels (datum.label) in int32 format. However, for the labels, the numpy.int64 format also seems to work.

I think datum.data is accepted only in uint8 format because Caffe was primarily developed for computer-vision tasks, where the inputs are images with RGB values in the [0, 255] range, and uint8 can capture this using the least amount of memory.
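To make the mismatch concrete, here is a small numpy-only illustration (not taken from the code above; the exact byte values assume a little-endian machine) of why storing an int64 array in datum.data goes wrong: each value serializes to 8 bytes, so reading the buffer back as uint8 gives numbers unrelated to the intended inputs.

import numpy as np

# Illustration only (assumes a little-endian machine): an int64 array
# serializes to 8 bytes per value, so its raw bytes make no sense as uint8.
x = np.array([1, 2, 515], dtype=np.int64)
print(len(x.tobytes()))                            # 24 bytes for just 3 values
print(np.frombuffer(x.tobytes(), dtype=np.uint8))
# [1 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 3 2 0 0 0 0 0 0]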
I made the following changes to the data generation code:
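Roughly, the change amounts to generating the inputs in the [0, 255] range as uint8 and summing the labels in a wider integer type so they don't wrap around; a minimal sketch of that (this is the assumed form of the fix, not the exact original edit) is:

# Sketch of the assumed fix; it replaces the corresponding lines in the
# data-generation script above (Ntrain, Ntest, K, H, W are unchanged).
Xtrain = np.random.randint(0, 256, size=(Ntrain, K, H, W)).astype(np.uint8)
Xtest = np.random.randint(0, 256, size=(Ntest, K, H, W)).astype(np.uint8)

# Sum in int64 so the labels don't overflow uint8 arithmetic.
ytrain = Xtrain[:, :, 0, 0].sum(axis=1, dtype=np.int64)
ytest = Xtest[:, :, 0, 0].sum(axis=1, dtype=np.int64)

# The LMDB-writing loops stay the same: datum.data = Xtrain[i].tobytes()
# now stores exactly K*H*W bytes per sample, and datum.label = int(ytrain[i])
# is unchanged.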
After playing around with the net parameters (learning rate, number of iterations, etc.), I'm getting an error on the order of 10^(-6), which I think is pretty good!