I trained a LeNet-gray-28x28 image classification model in TensorFlow using NVIDIA DIGITS, and it gives the results I expect. Now I need to classify some images outside of DIGITS, and I want to use the model I trained for that.
So I take the LeNet model definition used by DIGITS and create a class around it:
```python
import tensorflow as tf
import tensorflow.contrib.slim as slim
import tflearn
from tflearn.layers.core import input_data


class LeNetModel():
    def gray28(self, nclasses):
        x = input_data(shape=[None, 28, 28, 1])
        # scale (divide by MNIST std)
        # x = x * 0.0125
        with slim.arg_scope([slim.conv2d, slim.fully_connected],
                            weights_initializer=tf.contrib.layers.xavier_initializer(),
                            weights_regularizer=slim.l2_regularizer(0.0005)):
            model = slim.conv2d(x, 20, [5, 5], padding='VALID', scope='conv1')
            model = slim.max_pool2d(model, [2, 2], padding='VALID', scope='pool1')
            model = slim.conv2d(model, 50, [5, 5], padding='VALID', scope='conv2')
            model = slim.max_pool2d(model, [2, 2], padding='VALID', scope='pool2')
            model = slim.flatten(model)
            model = slim.fully_connected(model, 500, scope='fc1')
            model = slim.dropout(model, 0.5, is_training=False, scope='do1')
            model = slim.fully_connected(model, nclasses, activation_fn=None, scope='fc2')
        return tflearn.DNN(model)
```
I downloaded my model from DIGITS and instantiate it (in another file) with:
```python
self.ballmodel = LeNetModel().gray28(2)
self.ballmodel.load("src/perftrack/prototype/models/ball/snapshot_5.ckpt")
```
But when I run my script, I get these errors:
```
2017-11-26 14:55:50.330524: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key conv1/biases not found in checkpoint
2017-11-26 14:55:50.330948: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key Global_Step not found in checkpoint
2017-11-26 14:55:50.331270: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key is_training not found in checkpoint
2017-11-26 14:55:50.331564: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key conv2/weights not found in checkpoint
2017-11-26 14:55:50.332823: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key conv1/weights not found in checkpoint
2017-11-26 14:55:50.332891: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key conv2/biases not found in checkpoint
2017-11-26 14:55:50.333620: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key fc2/weights not found in checkpoint
2017-11-26 14:55:50.334021: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key fc1/weights not found in checkpoint
2017-11-26 14:55:50.334173: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key fc1/biases not found in checkpoint
2017-11-26 14:55:50.334431: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key fc2/biases not found in checkpoint
...
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key conv1/biases not found in checkpoint
[[Node: save_1/RestoreV2_1 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save_1/Const_0_0, save_1/RestoreV2_1/tensor_names, save_1/RestoreV2_1/shape_and_slices)]]
[[Node: save_1/RestoreV2_1/_19 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_38_save_1/RestoreV2_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
```
So I use the inspect_checkpoint.py script (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/inspect_checkpoint.py) to list the key names my checkpoint contains, and I get names like:
```
model/conv1/biases
model/conv2/weights
...
```
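The same listing can be produced from Python, which makes it easy to compare against the names the graph expects. A minimal sketch, assuming a TensorFlow release that exports `tf.train.list_variables` (2.x and the later 1.x releases); `checkpoint_keys` is a hypothetical helper name, not part of TensorFlow:

```python
import tensorflow as tf


def checkpoint_keys(ckpt_path):
    """Return the sorted variable names stored in a TensorFlow checkpoint.

    `ckpt_path` is the checkpoint prefix, e.g. ".../snapshot_5.ckpt",
    not one of the .index/.data-* files on disk.
    """
    # list_variables yields (name, shape) pairs; only the names matter here.
    return sorted(name for name, _shape in tf.train.list_variables(ckpt_path))


# Example with the path from the question:
# for key in checkpoint_keys("src/perftrack/prototype/models/ball/snapshot_5.ckpt"):
#     print(key)
```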
So I rewrite my network, adding the model/ prefix manually:
```python
model = slim.conv2d(x, 20, [5, 5], padding='VALID', scope='model/conv1')
model = slim.max_pool2d(model, [2, 2], padding='VALID', scope='model/pool1')
model = slim.conv2d(model, 50, [5, 5], padding='VALID', scope='model/conv2')
model = slim.max_pool2d(model, [2, 2], padding='VALID', scope='model/pool2')
model = slim.flatten(model)
model = slim.fully_connected(model, 500, scope='model/fc1')
model = slim.dropout(model, 0.5, is_training=False, scope='model/do1')
model = slim.fully_connected(model, nclasses, activation_fn=None, scope='model/fc2')
```
This fixes some of the missing-key warnings, but:
- I sense that this is not the right way to fix it.
- I can't fix two keys:
  - Global_Step (my checkpoint has a global_step key)
  - is_training (I don't know what it is)
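The two leftover keys are most likely not part of the LeNet layers at all, but bookkeeping variables that tflearn adds to the graph itself (a `Global_Step` counter for its trainer and an `is_training` flag for layers such as dropout). A checkpoint written by DIGITS has no reason to contain them, and `Global_Step` also differs in case from the checkpoint's `global_step`. Sorting the graph's variable names by how they match the checkpoint makes this visible. The plain-Python helper below, `diagnose_keys`, is a hypothetical illustration; the names in the example are abridged from the question, and the top-level `global_step` key is assumed from the text:

```python
def diagnose_keys(graph_names, checkpoint_keys, prefix="model/"):
    """Sort graph variable names by how they match checkpoint keys.

    Returns three lists:
      exact: names found in the checkpoint unchanged
      prefixed: names found only after prepending `prefix`
      missing: names found neither way
    """
    ckpt = set(checkpoint_keys)
    exact, prefixed, missing = [], [], []
    for name in graph_names:
        if name in ckpt:
            exact.append(name)
        elif prefix + name in ckpt:
            prefixed.append(name)
        else:
            missing.append(name)
    return exact, prefixed, missing


# Names from the warnings and from inspect_checkpoint (abridged).
graph = ["conv1/weights", "conv1/biases", "fc2/biases", "Global_Step", "is_training"]
ckpt = ["model/conv1/weights", "model/conv1/biases", "model/fc2/biases", "global_step"]
exact, prefixed, missing = diagnose_keys(graph, ckpt)
print(missing)  # ['Global_Step', 'is_training']
```

Every layer variable lands in `prefixed`, so a consistent `model/` prefix explains those warnings; only the two tflearn variables remain in `missing`.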
So my question is: how can I redefine these key names in my network to match the ones in my checkpoint?
Since my question mostly came from a poor understanding of TensorFlow, I went through the official documentation and found some answers.
First, I was combining contrib/slim and tflearn; even if that is possible, it is not really appropriate here. So I rewrote the network using only slim:
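A sketch of the same structure, a single graph-building function that returns the input placeholder together with the logits, follows with one substitution: it uses plain `tf.compat.v1` ops instead of slim, so it also runs on TensorFlow releases that no longer ship `tf.contrib`. The `weights`/`biases` variable names mimic slim's naming convention and the layer shapes follow the DIGITS LeNet definition in the question; treat both as assumptions to check against the inspect_checkpoint output.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF 1.x-style graph API (TF >= 1.13 and 2.x)


def _weights_and_biases(scope, shape):
    """Create `<scope>/weights` and `<scope>/biases`, the names slim uses."""
    with tf1.variable_scope(scope):
        w = tf1.get_variable("weights", shape,
                             initializer=tf1.glorot_uniform_initializer())
        b = tf1.get_variable("biases", shape[-1:],
                             initializer=tf1.zeros_initializer())
    return w, b


def gray28(nclasses):
    """LeNet for 28x28 grayscale input; returns (input placeholder, logits)."""
    x = tf1.placeholder(tf.float32, shape=[None, 28, 28, 1], name="x")
    # Wrapping all layers in "model" yields keys such as model/conv1/weights,
    # matching the checkpoint, without hard-coding the prefix in every scope.
    with tf1.variable_scope("model"):
        w, b = _weights_and_biases("conv1", [5, 5, 1, 20])
        net = tf.nn.relu(tf.nn.bias_add(
            tf1.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding="VALID"), b))
        net = tf1.nn.max_pool(net, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                              padding="VALID")  # -> 12x12x20
        w, b = _weights_and_biases("conv2", [5, 5, 20, 50])
        net = tf.nn.relu(tf.nn.bias_add(
            tf1.nn.conv2d(net, w, strides=[1, 1, 1, 1], padding="VALID"), b))
        net = tf1.nn.max_pool(net, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                              padding="VALID")  # -> 4x4x50
        net = tf.reshape(net, [-1, 4 * 4 * 50])  # flatten (NHWC order, like slim.flatten)
        w, b = _weights_and_biases("fc1", [4 * 4 * 50, 500])
        net = tf.nn.relu(tf.matmul(net, w) + b)
        # Dropout is the identity at inference time (is_training=False), so it is omitted.
        w, b = _weights_and_biases("fc2", [500, nclasses])
        logits = tf.matmul(net, w) + b  # no activation, as with activation_fn=None
    return x, logits
```

In the slim version, the same `with tf.variable_scope('model'):` line around the layer calls should produce the `model/` prefix as well, since slim creates its variables through variable scopes; that is a tidier alternative to writing `model/` in every `scope=` argument.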
I return the x placeholder and the model, and use them to load the pre-trained DIGITS model (checkpoint):
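One way to do the loading, sketched under the assumption that the checkpoint keys are the graph variable names, possibly with a `model/` prefix: `tf.train.Saver` accepts a dict mapping checkpoint keys to variables, so the prefix can be added at restore time, and any variable missing from the checkpoint (such as tflearn's `Global_Step` or `is_training`) can simply be left out of the dict. `restore_from_digits` is a hypothetical helper name:

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF 1.x-style graph API (TF >= 1.13 and 2.x)


def restore_from_digits(sess, ckpt_path, prefix="model/"):
    """Restore the default graph's variables from a DIGITS-style checkpoint.

    A variable named "conv1/weights" is read from "model/conv1/weights" if that
    key exists, otherwise from "conv1/weights". Variables found under neither
    key are skipped (left uninitialized) and their names are returned.
    """
    ckpt_keys = {name for name, _shape in tf1.train.list_variables(ckpt_path)}
    var_list, skipped = {}, []
    for var in tf1.global_variables():
        name = var.name.split(":")[0]  # "conv1/weights:0" -> "conv1/weights"
        if prefix + name in ckpt_keys:
            var_list[prefix + name] = var
        elif name in ckpt_keys:
            var_list[name] = var
        else:
            skipped.append(name)
    # Saver raises ValueError if var_list is empty (nothing matched).
    tf1.train.Saver(var_list=var_list).restore(sess, ckpt_path)
    return skipped


# Hypothetical usage; `build_network` stands for the function that returns the
# x placeholder and the logits, and `images` for an array of shape [N, 28, 28, 1]:
# with tf.Graph().as_default():
#     x, logits = build_network(2)
#     with tf1.Session() as sess:
#         restore_from_digits(sess, "src/perftrack/prototype/models/ball/snapshot_5.ckpt")
#         predictions = sess.run(tf.argmax(logits, axis=1), feed_dict={x: images})
```

Because the function also accepts names that already match, it works whether or not the network is wrapped in a `model` variable scope. Skipped variables stay uninitialized, so they must be initialized separately if the graph actually uses them.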
And it works!