I'm relatively new to Keras, and I'm trying to get some example code from the Keras documentation running in a Jupyter notebook. This is the example I'm working with:
I copied the code into my notebook, but when I train the model it runs for only one epoch. At the end of that epoch, I get the error shown below.
I'm not sure how to debug this, since all of my code comes from the example.
`Epoch 1/3
1463/1463 [==============================] - ETA: 0s - loss: 22.8407 - box_loss: 2.6877 - class_loss: 20.1530
---------------------------------------------------------------------------
UnknownError Traceback (most recent call last)
<ipython-input-17-8e8737ecac83> in <cell line: 1>()
----> 1 yolo.fit(
2 train_ds,
3 validation_data=val_ds,
4 epochs=3,
5 callbacks=[EvaluateCOCOMetricsCallback(val_ds, "model.h5")],
2 frames
/usr/local/lib/python3.10/dist-packages/keras_cv/src/metrics/object_detection/box_coco_metrics.py in result_fn(self, force)
208
209 def result_fn(self, force=False):
--> 210 py_func_result = tf.py_function(
211 self.result_on_host_cpu, inp=[force], Tout=obj.dtype
212 )
UnknownError: {{function_node __wrapped__EagerPyFunc_Tin_1_Tout_1_device_/job:localhost/replica:0/task:0/device:CPU:0}} InvalidArgumentError: {{function_node __wrapped__ConcatV2_N_365_device_/job:localhost/replica:0/task:0/device:CPU:0}} ConcatOp : Dimension 1 in both shapes must be equal: shape[0] = [4,13,4] vs. shape[1] = [4,14,4] [Op:ConcatV2] name: concat
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
return func(device, token, args)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/ops/script_ops.py", line 146, in __call__
outputs = self._call(device, args)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/ops/script_ops.py", line 153, in _call
ret = self._func(*args)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/keras_cv/src/metrics/object_detection/box_coco_metrics.py", line 205, in result_on_host_cpu
return tf.constant(obj_result(force), obj.dtype)
File "/usr/local/lib/python3.10/dist-packages/keras_cv/src/metrics/object_detection/box_coco_metrics.py", line 256, in result
self._cached_result = self._compute_result()
File "/usr/local/lib/python3.10/dist-packages/keras_cv/src/metrics/object_detection/box_coco_metrics.py", line 264, in _compute_result
_box_concat(self.ground_truths),
File "/usr/local/lib/python3.10/dist-packages/keras_cv/src/metrics/object_detection/box_coco_metrics.py", line 44, in _box_concat
result[key] = tf.concat([b[key] for b in boxes], axis=0)
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/ops.py", line 5883, in raise_from_not_ok_status
raise core._status_to_exception(e) from None # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__ConcatV2_N_365_device_/job:localhost/replica:0/task:0/device:CPU:0}} ConcatOp : Dimension 1 in both shapes must be equal: shape[0] = [4,13,4] vs. shape[1] = [4,14,4] [Op:ConcatV2] name: concat`
I expected the model to train for three epochs. I tried adjusting the training dataset so that its size is divisible by the batch size, but that didn't help.
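I had the same problem. After some searching, I found that EvaluateCOCOMetricsCallback() is the cause of this particular error: the traceback ends in box_coco_metrics.py, where _box_concat fails to concatenate ground-truth box tensors whose second dimension differs ([4,13,4] vs. [4,14,4]). As recommended in the link below, I switched to keras_cv.callbacks.PyCOCOCallback() and that fixed it for me.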
https://github.com/keras-team/keras-cv/issues/1994#issuecomment-1665896238
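For reference, here is a minimal sketch of what the swap could look like. It assumes `yolo`, `train_ds` and `val_ds` are the objects from the question, and that the boxes are in `"xyxy"` format (an assumption on my part, so use whatever format your model was built with). The helper names `make_coco_callback` and `fit_with_pycoco` are hypothetical and only exist to keep the example tidy. As far as I know, `PyCOCOCallback` also needs `pycocotools` to be installed.

```python
def make_coco_callback(val_ds, bounding_box_format="xyxy"):
    """Build KerasCV's PyCOCOCallback for the given validation dataset.

    `bounding_box_format` defaults to "xyxy" here as an assumption; it should
    match the format the model and dataset actually use.
    """
    # Imported inside the function so the helper can be defined even before
    # keras_cv is installed in the environment.
    import keras_cv

    return keras_cv.callbacks.PyCOCOCallback(
        val_ds, bounding_box_format=bounding_box_format
    )


def fit_with_pycoco(model, train_ds, val_ds, epochs=3, bounding_box_format="xyxy"):
    """Same fit call as in the question, with the custom callback swapped out."""
    return model.fit(
        train_ds,
        validation_data=val_ds,
        epochs=epochs,
        callbacks=[make_coco_callback(val_ds, bounding_box_format)],
    )
```

In the notebook this would replace the failing cell with something like `fit_with_pycoco(yolo, train_ds, val_ds)`. Note that this sketch only computes the COCO metrics; it does not pass a "model.h5" path anywhere, so if you relied on the original callback for that file you would have to handle saving separately.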