Convert Tensorflow model to Caffe model


I would like to convert a TensorFlow model to a Caffe model.

I searched on Google, but I could only find converters from Caffe to TensorFlow, not the other way around.

Does anyone have an idea on how to do it?

Thanks, Evi


Jayant Agrawal

As suggested in the comment by @Patwie, you have to do it manually by copying the weights layer by layer. For example, to copy the first conv layer's weights from a TensorFlow checkpoint into a caffemodel, you would do something like the following:

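import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.train.import_meta_graph)
import caffe

sess = tf.Session()
new_saver = tf.train.import_meta_graph("/path/to/checkpoint.meta")
new_saver.restore(sess, "/path/to/checkpoint")  # restore() returns None

# All variables of the restored graph, in graph order
all_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)

conv1 = all_vars[0]  # first conv kernel: [height, width, in_channels, out_channels]
bias1 = all_vars[1]  # first conv bias: [out_channels]

conv_w1, bias_1 = sess.run([conv1, bias1])

net = caffe.Net('path/to/conv.prototxt', caffe.TEST)

# Caffe expects conv weights as [out_channels, in_channels, height, width]
net.params['conv_1'][0].data[...] = conv_w1.transpose((3, 2, 0, 1))
net.params['conv_1'][1].data[...] = bias_1

...

net.save('modelfromtf.caffemodel')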

Note1: This code has NOT been tested. I am not sure it will work, but I think it should. Also, it only covers one conv layer. In practice, you first have to analyse your TensorFlow checkpoint to find out which layer's weights are at which index (print all_vars) and then copy each layer's weights individually.

Note2: Some of this can be automated by iterating over the initial conv layers, since they generally follow a set pattern (conv1 -> bn1 -> relu1 -> conv2 -> bn2 -> relu2 ...).
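A minimal sketch of that automation, assuming the checkpoint values (e.g. from sess.run(all_vars)) come in the order kernel, bias, kernel, bias, ... and that you supply the matching Caffe layer names yourself. The helper names and the ordering are hypothetical; check them against your own print(all_vars) output. Batch-norm variables are not handled here.

```python
import numpy as np


def tf_conv_kernel_to_caffe(kernel):
    """Reorder a TensorFlow conv kernel [H, W, in, out] to Caffe's [out, in, H, W]."""
    return kernel.transpose((3, 2, 0, 1))


def map_conv_layers(tf_values, caffe_names):
    """Pair (kernel, bias) values, in checkpoint order, with Caffe layer names.

    Assumes tf_values alternates kernel, bias, kernel, bias, ...
    (a hypothetical ordering; verify it for your checkpoint).
    """
    if len(tf_values) != 2 * len(caffe_names):
        raise ValueError("expected exactly one kernel and one bias per Caffe layer")
    converted = {}
    for i, name in enumerate(caffe_names):
        kernel, bias = tf_values[2 * i], tf_values[2 * i + 1]
        converted[name] = [tf_conv_kernel_to_caffe(kernel), bias]
    return converted


# Dummy values standing in for sess.run(all_vars)
values = [np.zeros((3, 3, 3, 8)), np.zeros(8), np.zeros((3, 3, 8, 16)), np.zeros(16)]
converted = map_conv_layers(values, ["conv_1", "conv_2"])
print(converted["conv_2"][0].shape)  # (16, 8, 3, 3)
```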

Note3: TensorFlow may further split each layer's parameters into separate variables, i.e. separate indices in all_vars. For example, the weights and biases of a conv layer are separate, as shown above, and a batch normalisation layer has separate gamma, beta, moving mean and moving variance variables.
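On the Caffe side, batch normalisation is usually split into two layers: a BatchNorm layer whose blobs are [mean, variance, scale_factor] (Caffe divides the stored statistics by scale_factor, so 1 is used here) and a following Scale layer with bias_term: true that holds [gamma, beta]. The sketch below, with hypothetical helper names, rearranges the TensorFlow variables into that form and checks the result against the TensorFlow formula. It assumes the eps in your prototxt matches the epsilon used in TensorFlow.

```python
import numpy as np


def tf_batchnorm_to_caffe(gamma, beta, moving_mean, moving_variance):
    """Split TensorFlow batch-norm parameters into Caffe BatchNorm and Scale blobs."""
    batchnorm_blobs = [
        np.asarray(moving_mean, dtype=np.float32),
        np.asarray(moving_variance, dtype=np.float32),
        np.array([1.0], dtype=np.float32),  # scale_factor: statistics stored as-is
    ]
    scale_blobs = [
        np.asarray(gamma, dtype=np.float32),
        np.asarray(beta, dtype=np.float32),
    ]
    return batchnorm_blobs, scale_blobs


def caffe_bn_scale_forward(x, batchnorm_blobs, scale_blobs, eps):
    """Inference-time BatchNorm + Scale as Caffe computes it (channels on the last axis)."""
    factor = batchnorm_blobs[2][0]
    mean = batchnorm_blobs[0] / factor
    var = batchnorm_blobs[1] / factor
    gamma, beta = scale_blobs
    return (x - mean) / np.sqrt(var + eps) * gamma + beta


# Compare with TensorFlow's formula on random data
rng = np.random.default_rng(1)
gamma, beta = rng.random(4) + 0.5, rng.random(4)
mean, var = rng.random(4), rng.random(4) + 0.1
eps = 1e-3
bn_blobs, scale_blobs = tf_batchnorm_to_caffe(gamma, beta, mean, var)
x = rng.standard_normal((2, 4))
tf_out = gamma * (x - mean) / np.sqrt(var + eps) + beta
caffe_out = caffe_bn_scale_forward(x, bn_blobs, scale_blobs, eps)
```

Each list is then copied blob by blob into net.params['bn1'] and net.params['scale1'] (hypothetical layer names from your prototxt).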

sandeep.ganage

You can use MMdnn, a utility developed by Microsoft. MMdnn is a comprehensive, cross-framework tool to convert, visualize and diagnose deep learning (DL) models.

Fatality

I had the same problem and found a solution. The code is on GitHub (https://github.com/lFatality/tensorflow2caffe), and I've also documented it in a series of YouTube videos.


Part 1 covers building the VGG-19 architecture in Caffe and in tflearn (a higher-level API for TensorFlow; with some changes to the code, native TensorFlow should also work).


Part 2 describes exporting the weights and biases of the TensorFlow model into a numpy file. In tflearn you can get the weights of a layer like this:

import tflearn

# model is the tflearn.DNN built from your architecture; layer_name is e.g. 'Conv_2D'
# Get the parameters (variables) of a certain layer
conv2d_vars = tflearn.variables.get_layer_variables_by_name(layer_name)
# Get the weights out of the parameters
weights = model.get_weights(conv2d_vars[0])
# Get the biases out of the parameters
biases = model.get_weights(conv2d_vars[1])

For a convolutional layer, layer_name is Conv_2D; fully-connected layers are called FullyConnected. If you use more than one layer of a given type, an increasing integer preceded by an underscore is appended (e.g. the 2nd conv layer is called Conv_2D_1). I found these names in the TensorBoard graph. If you name the layers in your architecture definition, layer_name will probably be the name you defined instead.
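Following the naming rule described above, the default names for all layers of one type can be generated rather than typed by hand. This small sketch assumes the layers were left unnamed in the architecture definition; VGG-19 has 16 conv layers and 3 fully-connected layers.

```python
def tflearn_default_names(base, count):
    """Default tflearn layer names as described above: base, base_1, base_2, ..."""
    return [base if i == 0 else f"{base}_{i}" for i in range(count)]


conv_names = tflearn_default_names("Conv_2D", 16)
fc_names = tflearn_default_names("FullyConnected", 3)
print(fc_names)  # ['FullyConnected', 'FullyConnected_1', 'FullyConnected_2']
```

Each name can then be passed to tflearn.variables.get_layer_variables_by_name in a loop.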

In native TensorFlow the export needs different code, but the parameters should have the same format, so the subsequent steps should still apply.
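One possible layout for the exported numpy file, assumed here rather than taken from the repository, is one row per layer in network order, holding [weights, biases]. With VGG-19 that puts the 16 conv layers at indices 0 to 15, so data_file[16][0] is the weight array of the first fully-connected layer. The helper names are hypothetical, and the round trip below uses small dummy arrays.

```python
import os
import tempfile

import numpy as np


def save_params(path, layer_params):
    """Save a list of (weights, biases) pairs, one entry per layer, in network order."""
    packed = np.empty((len(layer_params), 2), dtype=object)
    for i, (weights, biases) in enumerate(layer_params):
        packed[i, 0] = weights
        packed[i, 1] = biases
    np.save(path, packed)  # object arrays are pickled


def load_params(path):
    """Load the file written by save_params: entry [i][0] is weights, [i][1] biases."""
    return np.load(path, allow_pickle=True)


# Round trip with two dummy layers
path = os.path.join(tempfile.mkdtemp(), "vgg19_params.npy")
save_params(path, [(np.ones((3, 3, 3, 64)), np.zeros(64)),
                   (np.ones((16, 8)), np.zeros(8))])
data_file = load_params(path)
print(data_file[1][0].shape)  # (16, 8)
```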


Part 3 covers the actual conversion. The critical step is converting the weights when you create the caffemodel (the biases can be carried over unchanged). TensorFlow and Caffe store filters in different layouts: TensorFlow uses [height, width, depth, number of filters] (TensorFlow docs, at the bottom), while Caffe uses [number of filters, depth, height, width] (Caffe docs, chapter 'Blob storage and communication'). To convert between them you can use the transpose function, for example weights_of_first_conv_layer.transpose((3,2,0,1)). The sequence 3,2,0,1 is obtained by numbering the axes of the TensorFlow layout (0 = height, 1 = width, 2 = depth, 3 = number of filters) and then writing those numbers in the order of the Caffe layout.
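A quick numpy check of that transpose on a small dummy kernel: after transpose((3, 2, 0, 1)), the weight at TensorFlow position [h, w, c, n] is found at Caffe position [n, c, h, w].

```python
import numpy as np

# Dummy TensorFlow kernel: [height, width, depth, number of filters]
w_tf = np.arange(3 * 3 * 2 * 4, dtype=np.float32).reshape((3, 3, 2, 4))

# Caffe layout: [number of filters, depth, height, width]
w_caffe = w_tf.transpose((3, 2, 0, 1))

# The same weight sits at permuted indices in the two layouts
i_h, i_w, i_c, i_n = 1, 2, 1, 3
same_weight = w_caffe[i_n, i_c, i_h, i_w] == w_tf[i_h, i_w, i_c, i_n]
print(w_tf.shape, "->", w_caffe.shape)  # (3, 3, 2, 4) -> (4, 2, 3, 3)
```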
If you want to connect a (4D) tensor output to a fully-connected layer, things get a little tricky. With VGG-19 and an input size of 112x112, it looks like this:

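# data_file holds the exported parameters; entry 16 is the first fc layer, [0] its weights
fc1_weights = data_file[16][0].reshape((4, 4, 512, 4096))  # [height, width, depth, units]
fc1_weights = fc1_weights.transpose((3, 2, 0, 1))          # -> [units, depth, height, width]
fc1_weights = fc1_weights.reshape((4096, 8192))            # -> [outputs, inputs] for Caffe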

When you export the parameters at the connection between the tensor and the fully-connected layer, TensorFlow gives you an array of shape [entries in the tensor, units in the fc-layer] (here: [8192, 4096], since 4*4*512 = 8192). You have to find out the shape of your output tensor and reshape the array so that it fits the TensorFlow filter format (see above, with the number of filters being the number of units in the fc-layer). Then you apply the same transpose as before and reshape the array back to two dimensions. This works because TensorFlow stores fc-layer weights as [number of inputs, number of outputs], while Caffe stores them the other way around.
If you connect two fc-layers to each other, you don't need the process described above, but you still have to account for the different fc-layer format by transposing the weights (fc_layer_weights.transpose((1,0))).
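Both fc cases can be wrapped in one helper. This is a sketch with a hypothetical function name; conv_shape is the (height, width, depth) of the tensor feeding the layer, e.g. (4, 4, 512) for the VGG-19 example, or None for fc -> fc. The check at the end feeds the same random activation through both layouts (TensorFlow flattens [H, W, C], Caffe flattens [C, H, W]) and compares the outputs.

```python
import numpy as np


def tf_fc_to_caffe(weights, conv_shape=None):
    """Convert TensorFlow fc weights [inputs, outputs] to Caffe's [outputs, inputs]."""
    n_in, n_out = weights.shape
    if conv_shape is None:
        # fc -> fc: only the [inputs, outputs] vs [outputs, inputs] order differs
        return weights.transpose((1, 0))
    height, width, depth = conv_shape
    if height * width * depth != n_in:
        raise ValueError("conv_shape does not match the number of fc inputs")
    w4 = weights.reshape((height, width, depth, n_out))  # TensorFlow filter layout
    w4 = w4.transpose((3, 2, 0, 1))                      # Caffe layout
    return w4.reshape((n_out, n_in))


# Check on random data: both layouts should give the same fc output
rng = np.random.default_rng(0)
act_hwc = rng.standard_normal((2, 3, 4))     # TensorFlow activation [H, W, C]
w_tf = rng.standard_normal((2 * 3 * 4, 5))   # TensorFlow fc weights [inputs, outputs]
y_tf = act_hwc.reshape(-1) @ w_tf
act_chw = act_hwc.transpose((2, 0, 1))       # Caffe activation [C, H, W]
y_caffe = tf_fc_to_caffe(w_tf, conv_shape=(2, 3, 4)) @ act_chw.reshape(-1)
print(np.allclose(y_tf, y_caffe))  # True
```

For the example above this would be called as tf_fc_to_caffe(data_file[16][0], conv_shape=(4, 4, 512)).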

You can then set the parameters of the network using

net.params['layer_name_in_prototxt'][0].data[...] = weights
net.params['layer_name_in_prototxt'][1].data[...] = biases
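Assigning every layer in a loop with a shape check catches layout mistakes early, since numpy broadcasting could otherwise accept some wrong shapes. This sketch (hypothetical helper name) relies only on the net.params[name][i].data pattern used above; the SimpleNamespace object is a stand-in so it can be tried without Caffe installed.

```python
from types import SimpleNamespace

import numpy as np


def assign_params(net, converted):
    """Copy converted arrays into a pycaffe-style net after checking shapes.

    converted maps a layer name from the prototxt to its arrays in blob order,
    e.g. {"conv_1": [weights, biases]}.
    """
    for layer_name, arrays in converted.items():
        blobs = net.params[layer_name]
        if len(blobs) != len(arrays):
            raise ValueError(f"{layer_name}: expected {len(blobs)} blobs, got {len(arrays)}")
        for blob, array in zip(blobs, arrays):
            if blob.data.shape != array.shape:
                raise ValueError(
                    f"{layer_name}: shape {array.shape} does not match blob {blob.data.shape}"
                )
            blob.data[...] = array


# Stand-in for a pycaffe Net, only for trying the function without Caffe
fake_net = SimpleNamespace(params={
    "conv_1": [SimpleNamespace(data=np.zeros((4, 2, 3, 3))),
               SimpleNamespace(data=np.zeros(4))],
})
assign_params(fake_net, {"conv_1": [np.ones((4, 2, 3, 3)), np.ones(4)]})
```

With a real caffe.Net you would call assign_params(net, converted) and then net.save(...).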

This was a quick overview. If you want the full code, it's in my GitHub repository. I hope it helps. :)


Cheers,
Fatality