Define TensorFlow network key names according to an existing checkpoint


I trained a LeNet gray-28x28 image classification model in TensorFlow using NVIDIA DIGITS, and it gives me the results I expect. Now I have to classify some images outside of DIGITS, and I want to reuse the model I trained.

So I took the LeNet model definition used by DIGITS and created a class to use it:

import tensorflow as tf
import tensorflow.contrib.slim as slim
import tflearn
from tflearn.layers.core import input_data


class LeNetModel():

    def gray28(self, nclasses):
        x = input_data(shape=[None, 28, 28, 1])
        # scale (divide by MNIST std)
        # x = x * 0.0125
        with slim.arg_scope([slim.conv2d, slim.fully_connected],
                            weights_initializer=tf.contrib.layers.xavier_initializer(),
                            weights_regularizer=slim.l2_regularizer(0.0005)):
            model = slim.conv2d(x, 20, [5, 5], padding='VALID', scope='conv1')
            model = slim.max_pool2d(model, [2, 2], padding='VALID', scope='pool1')
            model = slim.conv2d(model, 50, [5, 5], padding='VALID', scope='conv2')
            model = slim.max_pool2d(model, [2, 2], padding='VALID', scope='pool2')
            model = slim.flatten(model)
            model = slim.fully_connected(model, 500, scope='fc1')
            model = slim.dropout(model, 0.5, is_training=False, scope='do1')
            model = slim.fully_connected(model, nclasses, activation_fn=None, scope='fc2')

            return tflearn.DNN(model)

I downloaded my model from DIGITS and instantiated it (in another file) with:

self.ballmodel = LeNetModel().gray28(2)
self.ballmodel.load("src/perftrack/prototype/models/ball/snapshot_5.ckpt")

But when I launch my script, I get these exceptions:

2017-11-26 14:55:50.330524: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key conv1/biases not found in checkpoint
2017-11-26 14:55:50.330948: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key Global_Step not found in checkpoint
2017-11-26 14:55:50.331270: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key is_training not found in checkpoint
2017-11-26 14:55:50.331564: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key conv2/weights not found in checkpoint
2017-11-26 14:55:50.332823: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key conv1/weights not found in checkpoint
2017-11-26 14:55:50.332891: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key conv2/biases not found in checkpoint
2017-11-26 14:55:50.333620: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key fc2/weights not found in checkpoint
2017-11-26 14:55:50.334021: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key fc1/weights not found in checkpoint
2017-11-26 14:55:50.334173: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key fc1/biases not found in checkpoint
2017-11-26 14:55:50.334431: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key fc2/biases not found in checkpoint
...
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key conv1/biases not found in checkpoint
         [[Node: save_1/RestoreV2_1 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save_1/Const_0_0, save_1/RestoreV2_1/tensor_names, save_1/RestoreV2_1/shape_and_slices)]]
         [[Node: save_1/RestoreV2_1/_19 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_38_save_1/RestoreV2_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

So I used the https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/inspect_checkpoint.py script to inspect the key names my checkpoint contains, and I got entries like:

model/conv1/biases
model/conv2/weights
...
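For reference, the same listing can also be produced directly from Python with tf.train.NewCheckpointReader; this is just a minimal sketch, using the same checkpoint path as in my loading snippet above:

import tensorflow as tf

# Print every variable name stored in the checkpoint, together with its shape
reader = tf.train.NewCheckpointReader("src/perftrack/prototype/models/ball/snapshot_5.ckpt")
for name, shape in sorted(reader.get_variable_to_shape_map().items()):
    print(name, shape)  # e.g. model/conv1/biases [20]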

So I rewrote my network, adding the model/ prefix manually:

            model = slim.conv2d(x, 20, [5, 5], padding='VALID', scope='model/conv1')
            model = slim.max_pool2d(model, [2, 2], padding='VALID', scope='model/pool1')
            model = slim.conv2d(model, 50, [5, 5], padding='VALID', scope='model/conv2')
            model = slim.max_pool2d(model, [2, 2], padding='VALID', scope='model/pool2')
            model = slim.flatten(model)
            model = slim.fully_connected(model, 500, scope='model/fc1')
            model = slim.dropout(model, 0.5, is_training=False, scope='model/do1')
            model = slim.fully_connected(model, nclasses, activation_fn=None, scope='model/fc2')

This fixes some of the missing-key warnings, but:

  • I sense that this is not the right way to do it
  • I still can't fix two keys:
    1. Global_Step (I have a global_step key in my checkpoint)
    2. is_training (I don't know what it is)

So my question is: how can I redefine these key names in my network so that they match the ones I find in my checkpoint?

1 Answer

Answered by Damien Picard (best answer):

Since my question was mostly due to my poor understanding of TensorFlow, I took a tour through the official documentation and found some answers.

First, I was combining contrib/slim and contrib/tflearn; even though that is possible, it is not really appropriate here. So I rewrote the network using only slim:

import tensorflow as tf
import tensorflow.contrib.slim as slim


class LeNetModel():

    def gray28(self, nclasses):
        # x = input_data(shape=[None, 28, 28, 1])
        x = tf.placeholder(tf.float32, shape=[1, 28, 28], name="x")
        rs = tf.reshape(x, shape=[-1, 28, 28, 1])
        # scale (divide by MNIST std)
        # x = x * 0.0125
        with slim.arg_scope([slim.conv2d, slim.fully_connected],
                            weights_initializer=tf.contrib.layers.xavier_initializer(),
                            weights_regularizer=slim.l2_regularizer(0.0005)):
            model = slim.conv2d(rs, 20, [5, 5], padding='VALID', scope='conv1')
            model = slim.max_pool2d(model, [2, 2], padding='VALID', scope='pool1')
            model = slim.conv2d(model, 50, [5, 5], padding='VALID', scope='conv2')
            model = slim.max_pool2d(model, [2, 2], padding='VALID', scope='pool2')
            model = slim.flatten(model)
            model = slim.fully_connected(model, 500, scope='fc1')
            model = slim.dropout(model, 0.5, is_training=False, scope='do1')  # keep dropout disabled at inference time
            model = slim.fully_connected(model, nclasses, activation_fn=None, scope='fc2')

            return x, model

I return the x placeholder and the model, and I use them to load the DIGITS pre-trained model (checkpoint):

import tensorflow as tf
import tensorflow.contrib.slim as slim
import cv2
from models.lenet import LeNetModel

# Helper function to load/resize images
def image(path):
    img = cv2.imread(path, 0)
    return cv2.resize(img, dsize=(28,28))

# Define a function that adds the model/ prefix to all variables:
def name_in_checkpoint(var):
    return 'model/' + var.op.name
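# e.g. the graph variable "conv1/weights" is looked up as "model/conv1/weights" in the checkpoint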

# Instantiate the model
x, model = LeNetModel().gray28(2)

# Define the variables to restore :
# Exclude the "is_training" that I don't care about
variables_to_restore = slim.get_variables_to_restore(exclude=["is_training"])
# Rename the other variables with the function name_in_checkpoint
variables_to_restore = {name_in_checkpoint(var):var for var in variables_to_restore}

# Create a Saver to restore the checkpoint, given the variables
restorer = tf.train.Saver(variables_to_restore)
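# Note: when tf.train.Saver is given a dict, each key is the name under which the
# variable is stored in the checkpoint file and each value is the graph variable
# it is restored into, which is exactly the renaming needed here.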

# Launch a session to restore the checkpoint and try to infer some images:
with tf.Session() as sess:
    # Restore variables from disk.
    restorer.restore(sess, "src/prototype/models/snapshot_5.ckpt")
    print("Model restored.")
    print(sess.run(model, feed_dict={x:[image("/home/damien/Vidéos/1/positives/img/1-img143.jpg")]}))
    print(sess.run(model, feed_dict={x:[image("/home/damien/Vidéos/0/positives/img/1-img1.jpg")]}))

And it works!
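Note that because fc2 is built with activation_fn=None, the values printed above are raw logits. A minimal sketch of turning them into class probabilities and a predicted label, reusing the x, model, restorer and image() helper defined above:

# Convert the raw fc2 logits into class probabilities and a predicted class index
probabilities = tf.nn.softmax(model)
prediction = tf.argmax(probabilities, axis=1)

with tf.Session() as sess:
    restorer.restore(sess, "src/prototype/models/snapshot_5.ckpt")
    probs, label = sess.run([probabilities, prediction],
                            feed_dict={x: [image("/home/damien/Vidéos/1/positives/img/1-img143.jpg")]})
    print(probs, label)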