Why doesn't the TF-TRT converter work for my model?


I wanted to convert my trained model with TF-TRT for better inference performance. I used the NVIDIA TensorFlow Docker image and had no problem running the test code.

The test code is from here: https://github.com/jhson989/tf-to-trt

Docker image tag: nvcr.io/nvidia/tensorflow:23.12-tf2-py3

But when I tried to convert my own trained model, it didn't work:

import tensorflow as tf
from tensorflow import keras
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# The trained model is .h5 format
h5_model_path = 'model/path/h5/model_name'
h5_model = keras.models.load_model(h5_model_path, compile=False)

# Need to convert .h5 to saved_model format for using TF-TRT
saved_model_path = 'model/path/saved_model/model_name'
tf.saved_model.save(h5_model, saved_model_path)

# Make a Converter
conversion_param = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP16)
converter = trt.TrtGraphConverterV2(input_saved_model_dir=saved_model_path, conversion_params=conversion_param)

# Error occurs from here
converter.convert()

And this error occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/training/py_checkpoint_reader.py", line 92, in NewCheckpointReader
    return CheckpointReader(compat.as_bytes(filepattern))
RuntimeError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /model/path/saved_model/model_name/variables/variables

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/saved_model/load.py", line 1031, in load_partial
    loader = Loader(object_graph_proto, saved_model_proto, export_dir,
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/saved_model/load.py", line 226, in __init__
    self._restore_checkpoint()
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/saved_model/load.py", line 561, in _restore_checkpoint
    load_status = saver.restore(variables_path, self._checkpoint_options)
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/checkpoint/checkpoint.py", line 1415, in restore
    reader = py_checkpoint_reader.NewCheckpointReader(save_path)
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/training/py_checkpoint_reader.py", line 96, in NewCheckpointReader
    error_translator(e)
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/training/py_checkpoint_reader.py", line 31, in error_translator
    raise errors_impl.NotFoundError(None, None, error_message)
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /model/path/saved_model/model_name/variables/variables

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/model/code/convert_model.py", line 106, in eval
    converter.convert()
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/compiler/tensorrt/trt_convert.py", line 1453, in convert
    self._saved_model = load.load(self._input_saved_model_dir,
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/saved_model/load.py", line 900, in load
    result = load_partial(export_dir, None, tags, options)["root"]
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/saved_model/load.py", line 1034, in load_partial
    raise FileNotFoundError(
FileNotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /model/path/saved_model/model_name/variables/variables
 You may be trying to load on a different device from the computational device. Consider setting the `experimental_io_device` option in `tf.saved_model.LoadOptions` to the io_device such as '/job:localhost'.

I already confirmed that the SavedModel version of my model has the same directory structure as the test code's model, specifically a '/model/path/saved_model/model_name/variables' directory containing variables.data-00000-of-00001 and variables.index.
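
As a sanity check: converter.convert() calls tf.saved_model.load() internally (see the traceback above), so loading the SavedModel directly should reproduce the error on its own. A minimal sketch, using the same placeholder path as in the code above:

import os
import tensorflow as tf

saved_model_path = 'model/path/saved_model/model_name'

# What is actually on disk under variables/
print(os.listdir(os.path.join(saved_model_path, 'variables')))

# This is the call that fails inside converter.convert()
loaded = tf.saved_model.load(saved_model_path)
print(list(loaded.signatures))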

1 answer

Answered by Carpriccio_jh:

I solved this problem by changing the model's name.

I had used a model name that contains the model's results (e.g. val_acc, val_loss, ...) as the SavedModel directory name.

I don't know exactly what happened.

But when I changed the model's name to f'save_{idx}' (or something similarly simple), it worked.
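
A minimal sketch of the working version, reusing the code from the question; the save_{idx} directory name and the idx value are just placeholders for a "simple" name without metric values in it:

import tensorflow as tf
from tensorflow import keras
from tensorflow.python.compiler.tensorrt import trt_convert as trt

h5_model_path = 'model/path/h5/model_name'
h5_model = keras.models.load_model(h5_model_path, compile=False)

# Save under a simple directory name instead of one built from val_acc/val_loss
idx = 0
saved_model_path = f'model/path/saved_model/save_{idx}'
tf.saved_model.save(h5_model, saved_model_path)

# Same converter setup as in the question, now pointing at the renamed directory
conversion_param = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP16)
converter = trt.TrtGraphConverterV2(input_saved_model_dir=saved_model_path,
                                    conversion_params=conversion_param)
converter.convert()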