I'm adapting the following script from the Keras documentation (https://keras.io/examples/audio/ctc_asr/). (Mainly, I changed the first Conv2D to a Conv1D to fit my data.)
My dataset consists of a list of numeric arrays (of variable length) and strings of characters (of variable length) that should be predicted from the arrays. This is similar to a speech-recognition (audio-to-sentence) task.
Since every row of my dataset has variable length, I thought the best way to implement this was to use ragged tensors (as an alternative to padding either column with "0" or whitespace):
# My data
ints_feature = tf.ragged.constant(new_df.Numbers.tolist(), dtype=tf.float32) # Lists of numbers
strings_feature = tf.ragged.constant(new_df.Strings.tolist()) # Strings
# Create a dataset from the two features
dataset = tf.data.Dataset.from_tensor_slices((ints_feature, strings_feature))
An example could be:
import tensorflow as tf
# Generate sample data
data_numbers = [[1, 2, 3], [4, 5, 6, 7], [8, 9], [5,6,2,7], [1,9,3,4,5]]
data_strings = ['apple', 'orange','banana', 'grape', 'kiwi']
# Convert the data to ragged tensors
ragged_numbers = tf.ragged.constant(data_numbers, dtype=tf.float32)
ragged_strings = tf.ragged.constant(data_strings)
# Create a dataset from the ragged tensors and labels
dataset = tf.data.Dataset.from_tensor_slices((ragged_numbers, ragged_strings))
# Print the dataset
for numbers, strings in dataset:
    print("Numbers:", numbers.numpy(), "Strings:", strings.numpy())
I structured the model similarly to the Keras example, but I'm having trouble with the CTCLoss function:
def CTCLoss(y_true, y_pred):
    print(tf.cast(tf.shape(y_pred)[0], dtype="int64"))
    print(tf.cast(tf.shape(y_true)[0], dtype="int64"))
    print(tf.shape(y_pred))
    print(tf.shape(y_true))
    print(y_pred)
    print(y_true)
    # Compute the training-time loss value
    batch_len = tf.cast(tf.shape(y_true)[0], dtype="int64")
    input_length = tf.cast(tf.shape(y_pred)[1], dtype="int64")
    label_length = tf.cast(tf.shape(y_true)[1], dtype="int64")  # RESOLVER!!!
    input_length = input_length * tf.ones(shape=(batch_len, 1), dtype="int64")
    label_length = label_length * tf.ones(shape=(batch_len, 1), dtype="int64")
    loss = keras.backend.ctc_batch_cost(y_true, y_pred, input_length, label_length)
    return loss
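From the docs I understand that a ragged tensor already carries its per-row lengths via row_lengths(), so I suspect the fix involves something like this sketch (toy labels, reshaped to the (batch, 1) shape that ctc_batch_cost expects), though I'm not sure how to wire it into the loss:

```python
import tensorflow as tf

# Sketch: per-row lengths of a ragged label tensor, instead of
# tf.shape(y_true)[1], which assumes a uniform second dimension.
labels = tf.ragged.constant([[1, 2, 3], [4, 5]], dtype=tf.int64)
label_length = tf.reshape(labels.row_lengths(), (-1, 1))  # shape (batch, 1)
print(label_length.numpy())  # [[3], [2]]
```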
I added the first 6 prints to get some insight into what's happening, and I get the following:
Tensor("CTCLoss/Cast:0", shape=(), dtype=int64)
Tensor("CTCLoss/Cast_1:0", shape=(), dtype=int64)
Tensor("CTCLoss/Shape_2:0", shape=(3,), dtype=int32)
<DynamicRaggedShape lengths=[None, None] num_row_partitions=1>
Tensor("DeepSpeech_2/dense/Softmax:0", shape=(None, 1, 6), dtype=float32)
tf.RaggedTensor(values=Tensor("RaggedFromVariant/RaggedTensorFromVariant:1", shape=(None,), dtype=int64), row_splits=Tensor("RaggedFromVariant/RaggedTensorFromVariant:0", shape=(None,), dtype=int64))
When training the model I get this error:
ValueError: in user code:
File "/.../python3.9/site-packages/keras/src/engine/training.py", line 1377, in train_function *
return step_function(self, iterator)
File "/.../1562801909.py", line 14, in CTCLoss *
label_length = tf.cast(tf.shape(y_true)[1], dtype="int64") #RESOLVER!!!
ValueError: Index 1 is not uniform
I'm guessing the error is associated with the different lengths of y_pred and y_true, but I can't figure out how to arrange the shapes and dimensions (mainly because I'm using ragged tensors). I would greatly appreciate some help, or perhaps a suggestion for another approach to this problem.
I also tried trimming the number arrays and the strings to the shortest one in the dataset so every row has the same length; this works perfectly with regular tensors.
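For reference, the zero-padding alternative I mentioned earlier would look roughly like this (just a sketch with toy data, assuming 0 is reserved for padding so the true lengths can still be recovered afterwards):

```python
import tensorflow as tf

ragged_numbers = tf.ragged.constant(
    [[1, 2, 3], [4, 5, 6, 7], [8, 9]], dtype=tf.float32
)
# Pad every row with zeros up to the longest one -> a regular dense tensor
padded = ragged_numbers.to_tensor(default_value=0.0)  # shape (3, 4)
# The original lengths can still be recovered, assuming 0 only marks padding
lengths = tf.math.count_nonzero(padded, axis=1, keepdims=True)
print(padded.shape)     # (3, 4)
print(lengths.numpy())  # [[3], [4], [2]]
```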