Why am I getting a concat error at the end of one epoch of training?


I'm relatively new to Keras, and I'm trying to get some example code from the Keras documentation running in a Jupyter notebook. This is the example I'm working with:

Keras Computer Vision Example

I copied the code into my notebook. However, when I train the model, it runs for one epoch and then, at the end of that epoch, fails with the error shown below.

I'm not sure how to go about debugging this, considering all my code is from the example.

`Epoch 1/3
1463/1463 [==============================] - ETA: 0s - loss: 22.8407 - box_loss: 2.6877 - class_loss: 20.1530
---------------------------------------------------------------------------
UnknownError                              Traceback (most recent call last)
<ipython-input-17-8e8737ecac83> in <cell line: 1>()
----> 1 yolo.fit(
      2     train_ds,
      3     validation_data=val_ds,
      4     epochs=3,
      5     callbacks=[EvaluateCOCOMetricsCallback(val_ds, "model.h5")],

2 frames
/usr/local/lib/python3.10/dist-packages/keras_cv/src/metrics/object_detection/box_coco_metrics.py in result_fn(self, force)
    208 
    209         def result_fn(self, force=False):
--> 210             py_func_result = tf.py_function(
    211                 self.result_on_host_cpu, inp=[force], Tout=obj.dtype
    212             )

UnknownError: {{function_node __wrapped__EagerPyFunc_Tin_1_Tout_1_device_/job:localhost/replica:0/task:0/device:CPU:0}} InvalidArgumentError: {{function_node __wrapped__ConcatV2_N_365_device_/job:localhost/replica:0/task:0/device:CPU:0}} ConcatOp : Dimension 1 in both shapes must be equal: shape[0] = [4,13,4] vs. shape[1] = [4,14,4] [Op:ConcatV2] name: concat
Traceback (most recent call last):

  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
    return func(device, token, args)

  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/ops/script_ops.py", line 146, in __call__
    outputs = self._call(device, args)

  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/ops/script_ops.py", line 153, in _call
    ret = self._func(*args)

  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/usr/local/lib/python3.10/dist-packages/keras_cv/src/metrics/object_detection/box_coco_metrics.py", line 205, in result_on_host_cpu
    return tf.constant(obj_result(force), obj.dtype)

  File "/usr/local/lib/python3.10/dist-packages/keras_cv/src/metrics/object_detection/box_coco_metrics.py", line 256, in result
    self._cached_result = self._compute_result()

  File "/usr/local/lib/python3.10/dist-packages/keras_cv/src/metrics/object_detection/box_coco_metrics.py", line 264, in _compute_result
    _box_concat(self.ground_truths),

  File "/usr/local/lib/python3.10/dist-packages/keras_cv/src/metrics/object_detection/box_coco_metrics.py", line 44, in _box_concat
    result[key] = tf.concat([b[key] for b in boxes], axis=0)

  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None

  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/ops.py", line 5883, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access

tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__ConcatV2_N_365_device_/job:localhost/replica:0/task:0/device:CPU:0}} ConcatOp : Dimension 1 in both shapes must be equal: shape[0] = [4,13,4] vs. shape[1] = [4,14,4] [Op:ConcatV2] name: concat`

I expected the model to train for three epochs. I tried adjusting the training dataset so that its size was divisible by the batch size, but that didn't help.

There is 1 answer

Ufuk_Uzun On BEST ANSWER

I had the same problem and, after some searching, found that EvaluateCOCOMetricsCallback() is what triggers this particular error. As recommended in the GitHub comment linked below, I switched to keras_cv.callbacks.PyCOCOCallback(), which fixed it for me.
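The swap might look roughly like the sketch below. The helper name `make_coco_callback` is made up for illustration, and the `"xyxy"` default is only an assumption: pass whatever bounding box format your dataset actually uses. As far as I know, PyCOCOCallback also needs the pycocotools package installed.

```python
def make_coco_callback(val_ds, bounding_box_format="xyxy"):
    """Build KerasCV's PyCOCOCallback for the given validation dataset.

    Assumption: "xyxy" is a placeholder default; use the format your
    bounding boxes are actually stored in.
    """
    # Imported lazily so the helper can be defined without KerasCV installed.
    import keras_cv

    return keras_cv.callbacks.PyCOCOCallback(
        val_ds, bounding_box_format=bounding_box_format
    )


# Usage, with `yolo`, `train_ds` and `val_ds` defined as in the question:
# yolo.fit(
#     train_ds,
#     validation_data=val_ds,
#     epochs=3,
#     callbacks=[make_coco_callback(val_ds)],
# )
```

This replaces the EvaluateCOCOMetricsCallback(val_ds, "model.h5") entry in the callbacks list. I have not checked whether it saves the model the way the original callback's "model.h5" argument suggests, so you may need to handle checkpointing separately.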

https://github.com/keras-team/keras-cv/issues/1994#issuecomment-1665896238
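As for why the error shows up at all: the traceback ends in _box_concat, which calls tf.concat(..., axis=0) on the ground-truth boxes collected from every batch. Concatenating along axis 0 requires all other dimensions to match, and the message reports [4,13,4] vs. [4,14,4]. My reading (an interpretation, not something the traceback states) is that these are [batch size, boxes per image, 4 coordinates], so two batches ended up with a different number of boxes per image. The plain-Python sketch below only mimics that shape rule; `concat_shape` and `pad_to_common` are hypothetical helpers, not KerasCV or TensorFlow functions, and the padding step just illustrates what would make the shapes compatible.

```python
def concat_shape(shapes, axis=0):
    """Shape that results from concatenating tensors with `shapes` along `axis`.

    Raises ValueError when a dimension other than `axis` differs, which is
    the condition the ConcatOp message in the traceback reports.
    """
    first = list(shapes[0])
    total = 0
    for i, shape in enumerate(shapes):
        if len(shape) != len(first):
            raise ValueError(f"rank mismatch at input {i}: {list(shape)}")
        for dim, (a, b) in enumerate(zip(first, shape)):
            if dim != axis and a != b:
                raise ValueError(
                    f"Dimension {dim} must be equal: "
                    f"shape[0] = {first} vs. shape[{i}] = {list(shape)}"
                )
        total += shape[axis]
    result = first[:]
    result[axis] = total
    return result


def pad_to_common(shapes, pad_dim=1):
    """Shapes after padding `pad_dim` of every input to the largest size seen."""
    target = max(shape[pad_dim] for shape in shapes)
    return [
        [target if d == pad_dim else n for d, n in enumerate(shape)]
        for shape in shapes
    ]


# The two shapes named in the error message.
batches = [[4, 13, 4], [4, 14, 4]]

try:
    concat_shape(batches, axis=0)
except ValueError as err:
    print("concat fails:", err)

# Once the box dimension is padded to a common size, the concat is valid.
print(concat_shape(pad_to_common(batches), axis=0))
```

This would also explain why making the dataset size divisible by the batch size did not help: the mismatch is in dimension 1, not in the batch dimension.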