This is a follow-up question from this GitHub issue. To cut a long story short, I tried to use the TensorFlow Object Detection API with my own dataset. Training ran fine until it suddenly crashed with the following error messages:

...
INFO:tensorflow:global step 10635: loss = 0.3392 (0.822 sec/step)
INFO:tensorflow:global step 10636: loss = 0.3529 (0.823 sec/step)
INFO:tensorflow:global step 10637: loss = 0.3305 (0.831 sec/step)
2017-09-14 20:02:02.545415: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\framework\op_kernel.cc:1192] Invalid argument: Shape mismatch in tuple component 16. Expected [1,?,?,3], got [1,240,127,4]
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Shape mismatch in tuple component 16. Expected [1,?,?,3], got [1,240,127,4]
         [[Node: batch/padding_fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_STRING, DT_INT32, DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_BOOL, DT_INT32, DT_BOOL, DT_INT32, DT_FLOAT, DT_INT32, DT_STRING, DT_INT32, DT_STRING, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/padding_fifo_queue, Reshape_2, Shape_5, SparseToDense_1, Shape_2, Merge_1, Shape, Merge_2, Shape_3, SparseToDense_5, Shape_8, SparseToDense_3, Shape_6, Cast_1, Shape_1, Cast_2, Shape_7, ExpandDims_5, Shape_4, Reshape_5, Shape_10, Reshape_6, Shape_9)]]
INFO:tensorflow:global step 10638: loss = 0.3599 (0.858 sec/step)
INFO:tensorflow:Finished training! Saving model to disk.
Traceback (most recent call last):
  File "train.py", line 198, in <module>
    tf.app.run()
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "train.py", line 194, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\trainer.py", line 296, in train
    saver=saver)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\contrib\slim\python\slim\learning.py", line 767, in train
    sv.stop(threads, close_summary_writer=True)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\training\supervisor.py", line 792, in stop
    stop_grace_period_secs=self._stop_grace_secs)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\training\coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\six.py", line 686, in reraise
    raise value
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\training\queue_runner_impl.py", line 238, in _run
    enqueue_callable()
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\client\session.py", line 1235, in _single_operation_run
    target_list_as_strings, status, None)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\Lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape mismatch in tuple component 16. Expected [1,?,?,3], got [1,240,127,4]
         [[Node: batch/padding_fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_STRING, DT_INT32, DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_BOOL, DT_INT32, DT_BOOL, DT_INT32, DT_FLOAT, DT_INT32, DT_STRING, DT_INT32, DT_STRING, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/padding_fifo_queue, Reshape_2, Shape_5, SparseToDense_1, Shape_2, Merge_1, Shape, Merge_2, Shape_3, SparseToDense_5, Shape_8, SparseToDense_3, Shape_6, Cast_1, Shape_1, Cast_2, Shape_7, ExpandDims_5, Shape_4, Reshape_5, Shape_10, Reshape_6, Shape_9)]]

G:\Tensorflow_section\models-master\object_detection>

At first I thought there might be some inconsistent images in my dataset, i.e. PNGs saved with a .jpg extension and vice versa, so I scanned all the images in the dataset and fixed them. I used the following method for this:

using System.IO; // Stream, SeekOrigin

// Identifies an image by its leading magic bytes, regardless of file extension.
// Returns "jpg", "bmp", "gif" or "png", or "unknown" if no signature matches.
private string CheckImagetype(Stream stream)
{
    const string jpg = "FFD8";
    const string bmp = "424D";
    const string gif = "474946";
    const string png = "89504E470D0A1A0A";
    string sig = "";

    stream.Seek(0, SeekOrigin.Begin);
    for (int i = 0; i < 8; i++)
    {
        int b = stream.ReadByte();
        if (b < 0)
        {
            break; // End of stream reached before any signature matched.
        }

        // Append the byte as two hex digits and compare with each signature.
        sig += b.ToString("X2");
        if (sig == jpg) return "jpg";
        if (sig == bmp) return "bmp";
        if (sig == gif) return "gif";
        if (sig == png) return "png";
    }
    return "unknown";
}

I then used EmguCV to retrieve each image's depth and number of channels, to rule out any further issues caused by a wrong depth. Then I annotated the images again, created a new TFRecord, and started a new training session.
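The channel check described above can also be sketched in Python with only the standard library. The last dimension in the error ("got [1,240,127,4]" against an expected 3) is the channel count, which may point to images stored with an alpha channel. The sketch below reads the color type from the PNG header and reports files that do not store exactly three channels. The names png_channel_count and find_non_rgb_pngs are hypothetical, and this is not the EmguCV check I actually ran.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Stored channels per PNG color type, as defined by the PNG specification:
# 0 = grayscale, 2 = RGB, 3 = palette index, 4 = gray + alpha, 6 = RGBA.
PNG_CHANNELS = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}


def png_channel_count(header):
    """Return the stored channel count of a PNG, or None if it is not a PNG.

    `header` must hold at least the first 26 bytes of the file.
    """
    if len(header) < 26 or header[:8] != PNG_SIGNATURE:
        return None
    # Bytes 8-11: IHDR chunk length, bytes 12-15: chunk type "IHDR",
    # bytes 16-23: width and height, byte 24: bit depth, byte 25: color type.
    if header[12:16] != b"IHDR":
        return None
    return PNG_CHANNELS.get(header[25])


def find_non_rgb_pngs(paths):
    """Yield (path, channels) for PNG files whose stored channel count is not 3."""
    for path in paths:
        with open(path, "rb") as f:
            channels = png_channel_count(f.read(26))
        if channels is not None and channels != 3:
            yield path, channels
```

Because the check looks at file contents rather than the extension, it would also flag a PNG that was saved with a .jpg name. Palette images (color type 3) are reported as one stored channel, although decoders usually expand them to RGB.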

This is what I got again:

INFO:tensorflow:global step 1286: loss = 0.3639 (0.721 sec/step)
INFO:tensorflow:global step 1287: loss = 0.3752 (0.735 sec/step)
INFO:tensorflow:global step 1288: loss = 0.5850 (0.720 sec/step)
2017-09-16 00:11:15.037646: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\framework\op_kernel.cc:1192] Invalid argument: Shape mismatch in tuple component 16. Expected [1,?,?,3], got [1,150,178,4]
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Shape mismatch in tuple component 16. Expected [1,?,?,3], got [1,150,178,4]
         [[Node: batch/padding_fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_STRING, DT_INT32, DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_BOOL, DT_INT32, DT_BOOL, DT_INT32, DT_FLOAT, DT_INT32, DT_STRING, DT_INT32, DT_STRING, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/padding_fifo_queue, Reshape_2, Shape, SparseToDense, Shape_1, Merge_1, Shape_10, Merge_2, Shape_2, SparseToDense_5, Shape_8, SparseToDense_2, Shape_7, Cast_1, Shape_6, Cast_2, Shape_4, ExpandDims_5, Shape_3, Reshape_5, Shape_5, Reshape_6, Shape_9)]]
INFO:tensorflow:global step 1289: loss = 0.4018 (0.781 sec/step)
INFO:tensorflow:Finished training! Saving model to disk.
Traceback (most recent call last):
  File "train.py", line 198, in <module>
    tf.app.run()
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "train.py", line 194, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\trainer.py", line 296, in train
    saver=saver)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\contrib\slim\python\slim\learning.py", line 767, in train
    sv.stop(threads, close_summary_writer=True)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\training\supervisor.py", line 792, in stop
    stop_grace_period_secs=self._stop_grace_secs)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\training\coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\six.py", line 686, in reraise
    raise value
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\training\queue_runner_impl.py", line 238, in _run
    enqueue_callable()
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\client\session.py", line 1235, in _single_operation_run
    target_list_as_strings, status, None)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\Lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape mismatch in tuple component 16. Expected [1,?,?,3], got [1,150,178,4]
         [[Node: batch/padding_fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_STRING, DT_INT32, DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_BOOL, DT_INT32, DT_BOOL, DT_INT32, DT_FLOAT, DT_INT32, DT_STRING, DT_INT32, DT_STRING, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/padding_fifo_queue, Reshape_2, Shape, SparseToDense, Shape_1, Merge_1, Shape_10, Merge_2, Shape_2, SparseToDense_5, Shape_8, SparseToDense_2, Shape_7, Cast_1, Shape_6, Cast_2, Shape_4, ExpandDims_5, Shape_3, Reshape_5, Shape_5, Reshape_6, Shape_9)]]

G:\Tensorflow_section\models-master\object_detection>

I used a random subset of my images (10K instead of 300K) and got the same error again:

INFO:tensorflow:global step 2316: loss = 0.6428 (2.192 sec/step)
INFO:tensorflow:Recording summary at step 2316.
INFO:tensorflow:global step 2317: loss = 0.4036 (1.444 sec/step)
INFO:tensorflow:global step 2318: loss = 0.4111 (1.343 sec/step)
INFO:tensorflow:global step 2319: loss = 0.3914 (1.351 sec/step)
INFO:tensorflow:global step 2320: loss = 0.3794 (1.376 sec/step)
INFO:tensorflow:global step 2321: loss = 0.4056 (1.340 sec/step)
2017-09-16 20:03:42.148318: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\framework\op_kernel.cc:1192] Invalid argument: Shape mismatch in tuple component 16. Expected [1,?,?,3], got [1,182,322,4]
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Shape mismatch in tuple component 16. Expected [1,?,?,3], got [1,182,322,4]
         [[Node: batch/padding_fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_STRING, DT_INT32, DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_BOOL, DT_INT32, DT_BOOL, DT_INT32, DT_FLOAT, DT_INT32, DT_STRING, DT_INT32, DT_STRING, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/padding_fifo_queue, Reshape_2, Shape_1, SparseToDense_2, Shape_7, Merge_1, Shape_2, Merge_2, Shape_8, SparseToDense, Shape_6, SparseToDense_5, Shape_10, Cast_1, Shape_4, Cast_2, Shape_9, ExpandDims_5, Shape_5, Reshape_5, Shape, Reshape_6, Shape_3)]]
INFO:tensorflow:global step 2322: loss = 0.4787 (1.391 sec/step)
INFO:tensorflow:Finished training! Saving model to disk.
Traceback (most recent call last):
  File "train.py", line 198, in <module>
    tf.app.run()
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "train.py", line 194, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\trainer.py", line 296, in train
    saver=saver)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\contrib\slim\python\slim\learning.py", line 767, in train
    sv.stop(threads, close_summary_writer=True)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\training\supervisor.py", line 792, in stop
    stop_grace_period_secs=self._stop_grace_secs)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\training\coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\six.py", line 686, in reraise
    raise value
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\training\queue_runner_impl.py", line 238, in _run
    enqueue_callable()
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\client\session.py", line 1235, in _single_operation_run
    target_list_as_strings, status, None)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\Lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Users\Master\Anaconda3\envs\anaconda35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape mismatch in tuple component 16. Expected [1,?,?,3], got [1,182,322,4]
         [[Node: batch/padding_fifo_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_STRING, DT_INT32, DT_FLOAT, DT_INT32, DT_FLOAT, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_INT64, DT_INT32, DT_BOOL, DT_INT32, DT_BOOL, DT_INT32, DT_FLOAT, DT_INT32, DT_STRING, DT_INT32, DT_STRING, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/padding_fifo_queue, Reshape_2, Shape_1, SparseToDense_2, Shape_7, Merge_1, Shape_2, Merge_2, Shape_8, SparseToDense, Shape_6, SparseToDense_5, Shape_10, Cast_1, Shape_4, Cast_2, Shape_9, ExpandDims_5, Shape_5, Reshape_5, Shape, Reshape_6, Shape_3)]]

G:\Tensorflow_section\models-master\object_detection>

The catch is that none of the images in my dataset have the shapes reported in the error messages.

Here is some complementary information:

  • OS Platform and Distribution: Windows 10 x64 1703, Build 15063.540
  • TensorFlow installed from (source or binary): binary (installed with pip)
  • TensorFlow version (use command below): 1.3.0
  • Python version: 3.5.3
  • CUDA/cuDNN version: CUDA 8.0 / cuDNN v6.0
  • GPU model and memory: GTX 1080, 8 GB

There is 1 answer

Answer by Hossein (best answer)

TL;DR:
Use JPEGs only.

Longer explanation:
It seems that only JPEG images are supported when creating TFRecords, and this is not mentioned anywhere in the documentation.

Also, when you use other image types, no warning is issued and no exception is thrown, so people like me lose an immense amount of time debugging something that could have been spotted and fixed easily in the first place. Anyway, converting all images to JPEG solved this weird issue.
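For anyone who wants to script the conversion, here is a minimal sketch, assuming the Pillow library is installed (Pillow is my assumption here, not something the Object Detection API requires). The function names jpeg_path_for and convert_to_rgb_jpeg are hypothetical. It re-encodes one image as a 3-channel JPEG next to the original file.

```python
import os


def jpeg_path_for(path):
    """Return `path` with its extension replaced by .jpg."""
    root, _ = os.path.splitext(path)
    return root + ".jpg"


def convert_to_rgb_jpeg(src_path, quality=95):
    """Re-encode one image as a 3-channel JPEG and return the new path."""
    # Imported here so the path helper above works without Pillow installed.
    from PIL import Image

    dst_path = jpeg_path_for(src_path)
    with Image.open(src_path) as img:
        # convert("RGB") drops any alpha channel and expands grayscale or
        # palette images to three channels; the result is held in memory.
        rgb = img.convert("RGB")
    # Save after the source file is closed, in case dst_path equals src_path.
    rgb.save(dst_path, format="JPEG", quality=quality)
    return dst_path
```

After converting, the annotations and the TFRecord have to be regenerated so that they point at the new files. Note that dropping the alpha channel discards transparency, so images that rely on it may look different afterwards.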