How to get the current ragged tensor dimension in CTCLoss calculation?


I'm adapting the following script from the Keras documentation (https://keras.io/examples/audio/ctc_asr/); mainly, I changed the first Conv2D layer to a Conv1D to fit my data.

My dataset consists of a list of numeric arrays (of variable length), each paired with a string of characters (of variable length) that should be predicted from the array. This is similar to a speech-recognition (voice-to-sentences) task.

Since every row of my dataset has variable length, I thought the best way to implement this was with ragged tensors (as an alternative to padding either column with zeros or whitespace):

# My data
ints_feature = tf.ragged.constant(new_df.Numbers.tolist(), dtype=tf.float32) # Lists of numbers
strings_feature = tf.ragged.constant(new_df.Strings.tolist()) # Strings

# Create a dataset from the two features
dataset = tf.data.Dataset.from_tensor_slices((ints_feature, strings_feature))

An example could be:

import tensorflow as tf

# Generate sample data
data_numbers = [[1, 2, 3], [4, 5, 6, 7], [8, 9], [5,6,2,7], [1,9,3,4,5]]
data_strings = ['apple', 'orange','banana', 'grape', 'kiwi']

# Convert the data to ragged tensors
ragged_numbers = tf.ragged.constant(data_numbers, dtype=tf.float32)
ragged_strings = tf.ragged.constant(data_strings)

# Create a dataset from the ragged tensors and labels
dataset = tf.data.Dataset.from_tensor_slices((ragged_numbers, ragged_strings))

# Print the dataset
for numbers, strings in dataset:
    print("Numbers:", numbers.numpy(), "Strings:", strings.numpy())
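
Incidentally, the question in the title (reading the current dimension of a ragged tensor) seems to have a direct answer via `tf.RaggedTensor.row_lengths` and `bounding_shape`; a minimal sketch on the toy data above:

```python
import tensorflow as tf

# Same toy numeric data as above
ragged_numbers = tf.ragged.constant(
    [[1, 2, 3], [4, 5, 6, 7], [8, 9], [5, 6, 2, 7], [1, 9, 3, 4, 5]],
    dtype=tf.float32,
)

# row_lengths() returns the length of every row; indexing tf.shape
# at a ragged dimension fails because that dimension is not uniform.
lengths = ragged_numbers.row_lengths()
print(lengths.numpy())  # [3 4 2 4 5]

# bounding_shape() returns the smallest dense shape that contains
# the ragged tensor (here, 5 rows of at most 5 elements).
print(ragged_numbers.bounding_shape().numpy())  # [5 5]
```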

I structured the model similarly to the Keras example, but I'm having trouble with the CTCLoss function:

def CTCLoss(y_true, y_pred):
    
    print(tf.cast(tf.shape(y_pred)[0], dtype="int64"))
    print(tf.cast(tf.shape(y_true)[0], dtype="int64"))
    print(tf.shape(y_pred))
    print(tf.shape(y_true))
    print(y_pred)
    print(y_true)

    # Compute the training-time loss value
    batch_len = tf.cast(tf.shape(y_true)[0], dtype="int64")
    input_length = tf.cast(tf.shape(y_pred)[1], dtype="int64")
    label_length = tf.cast(tf.shape(y_true)[1], dtype="int64") # TO FIX!!

    input_length = input_length * tf.ones(shape=(batch_len, 1), dtype="int64")
    label_length = label_length * tf.ones(shape=(batch_len, 1), dtype="int64")

    loss = keras.backend.ctc_batch_cost(y_true, y_pred, input_length, label_length)
    return loss

I added the first six prints to get some insight into what's happening, and I get the following:

Tensor("CTCLoss/Cast:0", shape=(), dtype=int64)
Tensor("CTCLoss/Cast_1:0", shape=(), dtype=int64)
Tensor("CTCLoss/Shape_2:0", shape=(3,), dtype=int32)
<DynamicRaggedShape lengths=[None, None] num_row_partitions=1>
Tensor("DeepSpeech_2/dense/Softmax:0", shape=(None, 1, 6), dtype=float32)
tf.RaggedTensor(values=Tensor("RaggedFromVariant/RaggedTensorFromVariant:1", shape=(None,), dtype=int64), row_splits=Tensor("RaggedFromVariant/RaggedTensorFromVariant:0", shape=(None,), dtype=int64))

When training the model I get this error:

ValueError: in user code:

    File "/.../python3.9/site-packages/keras/src/engine/training.py", line 1377, in train_function  *
        return step_function(self, iterator)
    File "/.../1562801909.py", line 14, in CTCLoss  *
        label_length = tf.cast(tf.shape(y_true)[1], dtype="int64") # TO FIX!!

    ValueError: Index 1 is not uniform

I'm guessing the error is associated with the different lengths of y_pred and y_true, but I can't figure out how to arrange the shapes and dimensions (mainly because I'm using ragged tensors). I would gladly appreciate some help, or perhaps a suggestion for another approach to this problem.
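
For reference, one alternative I'm aware of is `tf.nn.ctc_loss`, which accepts the labels as a `SparseTensor`, so the per-example label lengths never have to be read from `tf.shape` at all. This is only a sketch, not tested against the full model; the blank index (0 here) and the function name are my own assumptions:

```python
import tensorflow as tf

def ctc_loss_ragged(y_true, y_pred):
    """CTC loss for ragged labels, sketched with tf.nn.ctc_loss.

    Assumes y_true is a tf.RaggedTensor of integer label ids and
    y_pred is a dense (batch, time, classes) softmax output, with
    class 0 reserved for the CTC blank (an assumption).
    """
    batch_size = tf.shape(y_pred)[0]
    time_steps = tf.shape(y_pred)[1]

    # Every example uses the full model output length.
    logit_length = tf.fill([batch_size], time_steps)

    # A ragged tensor converts directly to the sparse form that
    # tf.nn.ctc_loss accepts, so no uniform label dimension is needed.
    labels_sparse = tf.cast(y_true.to_sparse(), tf.int32)

    loss = tf.nn.ctc_loss(
        labels=labels_sparse,
        logits=tf.math.log(y_pred + 1e-7),  # log-probs are valid logits
        label_length=None,                  # taken from the sparse labels
        logit_length=logit_length,
        logits_time_major=False,
        blank_index=0,
    )
    return tf.reduce_mean(loss)
```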

I tried trimming the number arrays and the strings to the shortest one in the dataset so that every row has the same length; this works perfectly with regular tensors.
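
Rather than trimming to the shortest row (which throws data away), the ragged tensors can also be padded out to the longest row with `to_tensor`; a minimal sketch:

```python
import tensorflow as tf

# Toy numeric data, as in the example above
ragged_numbers = tf.ragged.constant(
    [[1, 2, 3], [4, 5, 6, 7], [8, 9]], dtype=tf.float32
)

# to_tensor() pads every row with default_value up to the longest row,
# producing a regular dense tensor.
dense = ragged_numbers.to_tensor(default_value=0.0)
print(dense.numpy())
# [[1. 2. 3. 0.]
#  [4. 5. 6. 7.]
#  [8. 9. 0. 0.]]
```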

