TensorFlow fails with "Unable to get element from the feed as bytes." when attempting to restore a checkpoint

I am using TensorFlow r0.12.

I use google-cloud-ml locally to run two different training jobs. In the first job, I find good initial values for my variables and store them in a V2 checkpoint.

When I try to restore my variables to use them in the second job:

import tensorflow as tf

sess = tf.Session()
new_saver = tf.train.import_meta_graph('../variables_pred/model.ckpt-10151.meta', clear_devices=True)
new_saver.restore(sess, tf.train.latest_checkpoint('../variables_pred/'))
all_vars = tf.trainable_variables()
for v in all_vars:
    print(v.name)

I get the following error message:

tensorflow.python.framework.errors_impl.InternalError: Unable to get element from the feed as bytes.

The checkpoint is created in the first job with these lines:

import os

# Write the graph definition to export.meta, then save only the variable values.
saver = tf.train.Saver()
saver.export_meta_graph(filename=os.path.join(output_dir, 'export.meta'))
saver.save(sess, os.path.join(output_dir, 'export'), write_meta_graph=False)
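With the V2 format, saver.save writes the variable values as <prefix>.index plus one or more <prefix>.data-*-of-* shards, and by default it also records the latest prefix in a state file named "checkpoint" in the same folder; export.meta comes from the separate export_meta_graph call. A minimal sketch, assuming those default names and the export prefix used above, for listing which of these files are missing from a folder before the second job tries to restore from it; missing_checkpoint_files is a hypothetical helper, not a TensorFlow API.

```python
import glob
import os


def missing_checkpoint_files(ckpt_dir, prefix="export"):
    """Return the names of expected checkpoint files absent from ckpt_dir.

    Assumes the V2 layout: <prefix>.index, at least one <prefix>.data-*-of-*
    shard, a separately exported <prefix>.meta, and the "checkpoint" state
    file that tf.train.latest_checkpoint reads.
    """
    missing = []
    for name in (prefix + ".index", prefix + ".meta", "checkpoint"):
        if not os.path.isfile(os.path.join(ckpt_dir, name)):
            missing.append(name)
    # Data shards are named like export.data-00000-of-00001.
    if not glob.glob(os.path.join(ckpt_dir, prefix + ".data-*-of-*")):
        missing.append(prefix + ".data-*-of-*")
    return missing
```

For the files the question restores from, the call would be missing_checkpoint_files('../variables_pred/', prefix='model.ckpt-10151'); an empty list means all expected files are present.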

According to this answer, it could be caused by a missing metadata file, but I am loading the metadata file.

PS: I use the argument clear_devices=True because the device specifications generated by a launch on google-cloud-ml are quite intricate, and I don't necessarily need the same device placement.

There are 3 answers

Thibaut Loiseleur (best answer)

The error was caused by the file named "checkpoint" having been inadvertently left out.

After putting this file back in the appropriate folder, the checkpoint loads correctly.
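The question's restore call passes the result of tf.train.latest_checkpoint straight to new_saver.restore. That function locates the newest checkpoint by reading the "checkpoint" state file and returns None when it cannot find one, so a missing file plausibly explains the error: the None ends up being fed where a filename string is expected. A minimal pure-Python sketch, assuming V2 checkpoints named <prefix>-<global step> (for example model.ckpt-10151), of a fallback that picks the newest prefix by scanning the .index files; find_checkpoint_prefix is a hypothetical helper, not part of TensorFlow.

```python
import glob
import os
import re


def find_checkpoint_prefix(ckpt_dir):
    """Hypothetical fallback for when the "checkpoint" state file is missing.

    Scans ckpt_dir for V2 index files (<prefix>.index) and returns the prefix
    with the highest trailing global step (e.g. .../model.ckpt-10151), or None
    if no index file exists. Prefixes without a step number rank lowest.
    """
    candidates = []
    for index_path in glob.glob(os.path.join(ckpt_dir, "*.index")):
        prefix = index_path[: -len(".index")]
        match = re.search(r"-(\d+)$", prefix)
        step = int(match.group(1)) if match else -1
        candidates.append((step, prefix))
    if not candidates:
        return None
    # Highest step wins; ties fall back to lexicographic order of the prefix.
    return max(candidates)[1]
```

For example, ckpt = tf.train.latest_checkpoint('../variables_pred/') or find_checkpoint_prefix('../variables_pred/'), followed by an explicit check that ckpt is not None before calling new_saver.restore(sess, ckpt), turns the opaque feed error into a clear failure. Putting the "checkpoint" file back remains the cleaner fix.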

Sorry for having omitted this key point.

Jeremy Lewi

I think the problem could be that you set write_meta_graph=False when you save the model. As a result, I don't think you are actually saving the graph, so when you try to restore there is no graph to restore. Try setting write_meta_graph=True.

bob sherlock

The error can also be caused by inadvertent mistakes in the file named "checkpoint".

For example, if the folder containing the models has been moved, the value of "model_checkpoint_path:" in "checkpoint" still points to the old path.
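When the folder has moved, the stale paths can be rewritten instead of regenerating the checkpoint. A minimal sketch, assuming the "checkpoint" file holds one quoted path per line in the model_checkpoint_path: "..." and all_model_checkpoint_paths: "..." form that TensorFlow writes; relocate_checkpoint_state and fix_checkpoint_file are hypothetical helpers, not TensorFlow APIs.

```python
import os
import re

# Matches lines such as: model_checkpoint_path: "/old/dir/model.ckpt-10151"
_PATH_LINE = re.compile(
    r'^(\s*(?:model_checkpoint_path|all_model_checkpoint_paths):\s*")([^"]*)(".*)$'
)


def relocate_checkpoint_state(text, new_dir):
    """Point every path in the text of a "checkpoint" state file at new_dir.

    Only the directory part changes; the file prefix (e.g. model.ckpt-10151)
    is kept. Lines that are not path entries are copied unchanged.
    """
    out = []
    for line in text.splitlines():
        m = _PATH_LINE.match(line)
        if m:
            new_path = os.path.join(new_dir, os.path.basename(m.group(2)))
            line = m.group(1) + new_path + m.group(3)
        out.append(line)
    return "\n".join(out) + "\n"


def fix_checkpoint_file(ckpt_dir):
    """Rewrite ckpt_dir/checkpoint in place so its paths refer to ckpt_dir."""
    path = os.path.join(ckpt_dir, "checkpoint")
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(relocate_checkpoint_state(text, os.path.abspath(ckpt_dir)))
```

Calling fix_checkpoint_file('../variables_pred/') once after moving the folder should let tf.train.latest_checkpoint resolve the new location; keeping a backup copy of the original file before rewriting it is a sensible precaution.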