Using the BVLC reference AlexNet file, I have been training a CNN against a training set I created. To measure the progress of training, I have been using a rough method to approximate the accuracy against the training data: my batch size on the test net is 256, and I have ~4500 images, so I perform 17 calls to solver.test_nets[0].forward(), record the value of solver.test_nets[0].blobs['accuracy'].data (the accuracy of that forward pass) each time, and take the average across these calls.

My thought was that I was taking 17 random samples of 256 from my validation set and getting the accuracy of these random samplings, which I would expect to closely approximate the true accuracy against the entire set. However, I later went back and wrote a script that goes through each item in my LMDB so that I could generate a confusion matrix for my entire test set. I discovered that the true accuracy of my model was significantly lower than the estimated accuracy: for example, an expected accuracy of ~75% dropped to ~50% true accuracy. This is a far worse result than I was expecting.
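For concreteness, the measurement loop is roughly the following (a minimal pycaffe sketch, assuming solver is an already-constructed caffe.SGDSolver and that the test net exposes an 'accuracy' top blob, as in the BVLC examples):

    import numpy as np
    import caffe

    # Sketch of the estimation loop described above; assumes `solver`
    # was already created, e.g. solver = caffe.SGDSolver('solver.prototxt'),
    # and that the test net has batch size 256 with an 'accuracy' blob.
    n_batches = 17  # 17 * 256 = 4352 of the ~4500 validation images
    accuracies = []
    for _ in range(n_batches):
        solver.test_nets[0].forward()  # one batch of 256 through the test net
        accuracies.append(float(solver.test_nets[0].blobs['accuracy'].data))

    print('estimated accuracy: %.4f' % np.mean(accuracies))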
My assumptions match the answer given here.
Have I made an incorrect assumption somewhere? What could account for the difference? I had assumed that the forward() function gathered a random sample, but I'm no longer sure that is the case. I assumed this because blobs['accuracy'].data returned a different result (though usually within a small range) every time.
The forward() function from Caffe does not perform any random sampling; it only fetches the next batch according to your DataLayer. E.g., in your case, forward() will pass the next 256 images through your network. Performing this 17 times will sequentially pass 17 x 256 = 4352 images.

Check that the script that goes through your whole LMDB performs the same data pre-processing as during training.
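For what it's worth, here is a minimal sketch of an exact full-set evaluation using the test net itself rather than a separate LMDB script. It assumes the BVLC AlexNet blob names ('data', 'fc8', 'label'), that the DataLayer reads the LMDB sequentially and wraps around at the end, and the ~4500-image count from the question; the wrap-around images in the final batch are simply not counted:

    import numpy as np

    net = solver.test_nets[0]
    batch_size = net.blobs['data'].data.shape[0]  # 256 in this setup
    n_images = 4500                               # size of the validation LMDB (assumed)

    correct = 0
    seen = 0
    while seen < n_images:
        net.forward()
        # 'fc8' holds the class scores in the BVLC AlexNet definition
        preds = net.blobs['fc8'].data.argmax(axis=1)
        labels = net.blobs['label'].data.astype(int)
        take = min(batch_size, n_images - seen)   # skip wrap-around images in the last batch
        correct += int((preds[:take] == labels[:take]).sum())
        seen += take

    print('accuracy over all %d images: %.4f' % (n_images, correct / float(n_images)))

Because this reuses the test net's own DataLayer, the images go through exactly the same transform_param pre-processing as during the 17-batch estimate, which sidesteps the pre-processing mismatch suggested above.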