Global operator along a single dimension in Keras?


Let's say I have a dataset comprising greyscale videos. The length and spatial size of each video can vary, so I represent the data with three unknown dimensions via the following shape:

Batch size   time   y      x      channels
None         None   None   None   1

I want to extract features (say 16 of them) from the temporal dimension while keeping the same spatial dimensions, which would give me the following output shape:

Batch size   y      x      filters
None         None   None   16

Notably, the shape of the data has been reduced by one dimension. In my head, I should be able to accomplish this with a Conv3D operator (feature generation) followed by some aggregating operation (global/average pooling or some linear operator) over the time dimension only. The resulting shape would be (None, 1, None, None, 16) which I believe I could reduce to (None, None, None, 16) using this answer.

My problem is that I cannot figure out how to apply any of Keras's global operators along a single dimension. Since the size of the time dimension is unknown, I cannot specify the window size (unknown, 1, 1) for a MaxPooling or convolutional layer that would span the entire time dimension. On the other hand, the GlobalMaxPooling layer doesn't accept an argument for specifying which dimensions to operate on.

Do I have to implement some complicated custom layer for this, or does a solution already exist? I've taken a look at the Reshape layer with MaxPooling1D, but I run into the same problem of not knowing the size of the x and y dimensions to reassemble the spatial structure after the pooling operation.
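For intuition, the operation I'm after has the same semantics as NumPy's mean over a single axis (a toy sketch with made-up shapes, not the actual Keras layer I'm looking for):

```python
import numpy as np

# Toy feature volume: 2 clips, 7 frames, 4x5 pixels, 16 channels
feats = np.random.rand(2, 7, 4, 5, 16)

# "Global" average pooling over the time axis only (axis=1)
pooled = feats.mean(axis=1, keepdims=True)  # shape (2, 1, 4, 5, 16)

# Squeeze away the singleton time axis
squeezed = np.squeeze(pooled, axis=1)       # shape (2, 4, 5, 16)
```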


There is 1 answer

Answer from Mr. For Example:

I believe you could do something like the following if you don't want to use a custom layer:

inp = Input(shape=(time, y, x, 1))  # batch size is implicit in Keras
# Process data w.r.t. time and spatial info, with as many layers as you want
h = Conv3D(filters_num_0, kernel_size_0, strides=1, padding="same", activation=activation_0)(inp) # (Batch_size, time, y, x, filters_num_0)
h = Conv3D(filters_num_1, kernel_size_1, strides=1, padding="same", activation=activation_1)(h) # (Batch_size, time, y, x, filters_num_1)
# Aggregate channel info
h = Conv3D(1, kernel_size_2, strides=1, padding="same", activation=activation_2)(h) # (Batch_size, time, y, x, 1)
# Move time into the channel dimension (Reshape excludes the batch axis;
# this requires time, y and x to be known at graph-build time)
h = Reshape(target_shape=(y, x, time))(h)
# Aggregate time info
out = Conv2D(16, kernel_size_3, strides=1, padding="same", activation=activation_3)(h) # (Batch_size, y, x, 16)

Using a custom layer to do global average pooling across time:

class GlobalAvgPoolAcrossTime(layers.Layer):
    def __init__(self, **kwargs):
        super(GlobalAvgPoolAcrossTime, self).__init__(**kwargs)

    # (Batch_size, time, y, x, channels) -> (Batch_size, 1, y, x, channels)
    def call(self, inputs):
        return keras.backend.mean(inputs, axis=1, keepdims=True)

inp = Input(shape=(time, y, x, 1))  # batch size is implicit in Keras
# Process data w.r.t. time and spatial info, with as many layers as you want
h = Conv3D(filters_num_0, kernel_size_0, strides=1, padding="same", activation=activation_0)(inp) # (Batch_size, time, y, x, filters_num_0)
h = Conv3D(filters_num_1, kernel_size_1, strides=1, padding="same", activation=activation_1)(h) # (Batch_size, time, y, x, filters_num_1)
h = GlobalAvgPoolAcrossTime()(h) # (Batch_size, 1, y, x, filters_num_1)
# Drop the singleton time dimension (Reshape excludes the batch axis;
# this requires y and x to be known at graph-build time)
h = Reshape(target_shape=(y, x, filters_num_1))(h)
out = Conv2D(16, kernel_size_2, strides=1, padding="same", activation=activation_2)(h) # (Batch_size, y, x, 16)
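A condensed, runnable variant of the custom-layer approach (a sketch assuming `tensorflow.keras` and small placeholder filter counts and kernel sizes): if you drop `keepdims` in the pooling layer, the singleton time axis disappears directly, so no Reshape is needed and y and x can stay unknown, which matches the question's fully variable shapes:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

class GlobalAvgPoolAcrossTime(layers.Layer):
    # (Batch_size, time, y, x, channels) -> (Batch_size, y, x, channels)
    def call(self, inputs):
        return tf.reduce_mean(inputs, axis=1)  # no keepdims: time axis is dropped

inp = layers.Input(shape=(None, None, None, 1))  # time, y, x all unknown
h = layers.Conv3D(8, 3, padding="same", activation="relu")(inp)
h = GlobalAvgPoolAcrossTime()(h)                 # (Batch_size, y, x, 8)
out = layers.Conv2D(16, 3, padding="same", activation="relu")(h)
model = Model(inp, out)
print(model.output_shape)  # expected: (None, None, None, 16)
```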