I have developed v1 of a (2+1)D ResNet that takes per-frame pixel data as input and predicts the bounding box coordinates of up to 8 objects in that video. The shape of my current input is:
(batch_size, n_frames, height, width, channels)
And my output is of shape:
(n_frames, 32)
I am using Intersection over Union (IoU) as the loss and am seeing relatively poor results. I thought I could improve them by increasing the number of features fed to the model (the dataset is quite small, but it will grow in the future). The features I have extracted from my videos are:
- edges
- motion vectors
- color histograms
- optical flow
- textures
How do I utilise these features to get better predictions from my model?
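For reference, the IoU loss I'm using works along these lines (a minimal sketch, not my exact code; `iou_loss` is an illustrative name and boxes are assumed to be in `(x_min, y_min, x_max, y_max)` format):

```python
import tensorflow as tf

def iou_loss(y_true, y_pred):
    """1 - mean IoU over boxes in (x_min, y_min, x_max, y_max) format.

    y_true / y_pred: float tensors of shape (..., 4).
    """
    # Intersection rectangle corners
    x1 = tf.maximum(y_true[..., 0], y_pred[..., 0])
    y1 = tf.maximum(y_true[..., 1], y_pred[..., 1])
    x2 = tf.minimum(y_true[..., 2], y_pred[..., 2])
    y2 = tf.minimum(y_true[..., 3], y_pred[..., 3])
    # Clamp to zero so disjoint boxes contribute no intersection
    inter = tf.maximum(x2 - x1, 0.0) * tf.maximum(y2 - y1, 0.0)
    area_true = (y_true[..., 2] - y_true[..., 0]) * (y_true[..., 3] - y_true[..., 1])
    area_pred = (y_pred[..., 2] - y_pred[..., 0]) * (y_pred[..., 3] - y_pred[..., 1])
    union = area_true + area_pred - inter
    iou = inter / tf.maximum(union, 1e-7)
    return 1.0 - tf.reduce_mean(iou)
```

Identical boxes give a loss of 0 and fully disjoint boxes a loss of 1.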
My first step was to get my pixel data, a single feature, and the labels into a list. Then I created training, validation and test splits, which were turned into datasets using a frame generator class.
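A minimal sketch of how a two-stream `tf.data` pipeline like this can be set up (random arrays stand in for real videos; `frame_generator` and all shapes here are illustrative, not my actual generator class — note that Keras multi-input models expect each dataset element as `((frames, edges), labels)`):

```python
import numpy as np
import tensorflow as tf

HEIGHT, WIDTH, N_FRAMES = 224, 224, 8

def frame_generator(n_videos=4):
    """Hypothetical stand-in for a frame generator class: yields one
    video's RGB frames, edge maps, and per-frame box labels."""
    for _ in range(n_videos):
        frames = np.random.rand(N_FRAMES, HEIGHT, WIDTH, 3).astype("float32")
        edges = np.random.rand(N_FRAMES, HEIGHT, WIDTH, 1).astype("float32")
        labels = np.random.rand(N_FRAMES, 32).astype("float32")
        # Inputs grouped into a tuple, labels separate
        yield (frames, edges), labels

train_ds = tf.data.Dataset.from_generator(
    frame_generator,
    output_signature=(
        (
            tf.TensorSpec(shape=(None, HEIGHT, WIDTH, 3), dtype=tf.float32),
            tf.TensorSpec(shape=(None, HEIGHT, WIDTH, 1), dtype=tf.float32),
        ),
        tf.TensorSpec(shape=(None, 32), dtype=tf.float32),
    ),
).batch(2)
```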
I then created the following architecture:
```python
input_shape = (None, None, HEIGHT, WIDTH, 4)
frames_input = layers.Input(shape=(None, HEIGHT, WIDTH, 3))
edges_input = layers.Input(shape=(None, HEIGHT, WIDTH, 1))
merged_input = layers.concatenate([frames_input, edges_input], axis=-1)

# Reshape input tensor to include time dimension of varying length
x = layers.Reshape((-1, HEIGHT, WIDTH, 4))(merged_input)

x = Conv2Plus1D(filters=FILTERS, kernel_size=KERNAL_SIZE, padding='same')(x)
x = layers.BatchNormalization()(x)
x = layers.ReLU()(x)
x = ResizeVideo(HEIGHT // 2, WIDTH // 2)(x)

# Block 1
x = add_residual_block(x, 16, (3, 3, 3))
x = ResizeVideo(HEIGHT // 4, WIDTH // 4)(x)

# Block 2
x = add_residual_block(x, 32, (3, 3, 3))
x = ResizeVideo(HEIGHT // 8, WIDTH // 8)(x)

# Block 3
x = add_residual_block(x, 64, (3, 3, 3))
x = ResizeVideo(HEIGHT // 16, WIDTH // 16)(x)

# Block 4
x = add_residual_block(x, 128, (3, 3, 3))

# Apply TimeDistributed dense layer to output bounding box coordinates for each frame
x = TimeDistributed(layers.GlobalAveragePooling2D())(x)  # Convert spatial dimensions to single dimension
x = TimeDistributed(layers.Dense(32))(x)

BoundingBoxV2_model = keras.Model([frames_input, edges_input], x)
```
And built the model like so:
```python
sampled_frames, sampled_edges, sampled_labels = next(iter(train_ds))
BoundingBoxV2_model.build([sampled_frames, sampled_edges])
```
When I try to fit my model I get the following error:
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[149], line 1
----> 1 history = BoundingBoxV2_model.fit(x = train_ds,
      2                                   epochs = EPOCHS,
      3                                   validation_data = val_ds)

File c:\Users\Rpiku\miniconda3\envs\rally_stream\lib\site-packages\keras\utils\traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67     filtered_tb = _process_traceback_frames(e.__traceback__)
     68     # To get the full stack trace, call:
     69     # `tf.debugging.disable_traceback_filtering()`
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

File ~\AppData\Local\Temp\__autograph_generated_file3rk3lb1s.py:15, in outer_factory.<locals>.inner_factory.<locals>.tf__train_function(iterator)
     13 try:
     14     do_return = True
---> 15     retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
     16 except:
     17     do_return = False

ValueError: in user code:

    File "c:\Users\Rpiku\miniconda3\envs\rally_stream\lib\site-packages\keras\engine\training.py", line 1160, in train_function  *
    ...
    File "c:\Users\Rpiku\miniconda3\envs\rally_stream\lib\site-packages\keras\engine\input_spec.py", line 216, in assert_input_compatibility
        raise ValueError(

    ValueError: Layer "model_8" expects 2 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, None, 224, 224, 3) dtype=float32>]
```