Many people have faced this issue, but it always seems to have been caused by a mistake in the command-line arguments.

This is the command I'm running:

!python "/content/drive/My Drive/Tensorflow/models/research/object_detection/model_main_tf2.py" --model_dir="/content/drive/My Drive/Tensorflow/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8" --pipeline_config_path="/content/drive/My Drive/Tensorflow/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/pipeline.config"

There doesn't seem to be any mistake in it.

This is the stack trace:

Traceback (most recent call last):
  File "/content/drive/My Drive/Tensorflow/models/research/object_detection/model_main_tf2.py", line 113, in <module>
    tf.compat.v1.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/content/drive/My Drive/Tensorflow/models/research/object_detection/model_main_tf2.py", line 110, in main
    record_summaries=FLAGS.record_summaries)
  File "/usr/local/lib/python3.6/dist-packages/object_detection/model_lib_v2.py", line 630, in train_loop
    manager.save()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_management.py", line 819, in save
    self._record_state()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_management.py", line 728, in _record_state
    save_relative_paths=True)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_management.py", line 248, in update_checkpoint_state_internal
    text_format.MessageToString(ckpt))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/lib/io/file_io.py", line 570, in atomic_write_string_to_file
    rename(temp_pathname, filename, overwrite)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/lib/io/file_io.py", line 529, in rename
    rename_v2(oldname, newname, overwrite)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/lib/io/file_io.py", line 546, in rename_v2
    compat.as_bytes(src), compat.as_bytes(dst), overwrite)

Error message:

tensorflow.python.framework.errors_impl.FailedPreconditionError: /content/drive/My Drive/Tensorflow/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/checkpoint.tmp91048f3bf67645619be6603094546de1; Is a directory

The error is raised from _pywrap_file_io.RenameFile(), where _pywrap_file_io is imported from tensorflow.python. I tried to look into the source code to find the problem, but I couldn't find it anywhere.

Could the problem have arisen because I'm running this on Colab?

TensorFlow version: 2.3, Python version: 3.6

Can someone please help me with this?

There is 1 answer

Harish Babu (best answer)

The problem was that the program was trying to create a file named "checkpoint" in the model directory, but the downloaded model already contained a folder with the same name, so the write failed with the "Is a directory" error. There are two ways to overcome this issue:

  1. Create a new folder and set its path as the argument for --model_dir
  2. Check whether the model directory contains a folder named 'checkpoint'. If it does, rename the folder. In my case, I changed it to 'checkpoint0'.
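
For the second option, the check and rename can be done from a notebook cell. The following is only a minimal sketch: the helper name `free_checkpoint_name` and its `new_name` parameter are made up for illustration, and it assumes the clashing folder sits directly inside the directory passed as --model_dir.

```python
import os


def free_checkpoint_name(model_dir, new_name="checkpoint0"):
    """Rename a 'checkpoint' folder inside model_dir, if one exists.

    Hypothetical helper, not part of TensorFlow or the Object Detection API.
    Returns the new folder path if a rename happened, otherwise None.
    """
    clash = os.path.join(model_dir, "checkpoint")
    if not os.path.isdir(clash):
        # Nothing is blocking the name, so there is nothing to do.
        return None

    target = os.path.join(model_dir, new_name)
    if os.path.exists(target):
        # Refuse to overwrite anything that already uses the new name.
        raise FileExistsError(f"{target} already exists; pick another name")

    os.rename(clash, target)
    return target
```

In the setup from the question, this would be called with the same path that is passed to --model_dir. If pipeline.config refers to files inside the renamed folder (for example through `fine_tune_checkpoint`), that path would need to be updated as well.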