I haven't used neural networks for many years, so excuse my ignorance. I was wondering what the most appropriate way is to train an LSTM model on my dataset. I have 3 attributes as follows:
Attribute 1: small int e.g., [123, 321, ...]
Attribute 2: text sequence ['cgtaatta', 'ggcctaaat', ... ]
Attribute 3: text sequence ['ttga', 'gattcgtt', ... ]
Class label: binary [0, 1, ...]
The length of attributes 2 and 3 varies from sample to sample, so I do not want to treat them as single words but rather as character sequences (that's why I want to use RNN/LSTM models).
Is it possible to have more than one (sequence) input to an LSTM model (are there examples)? Or should I concatenate them into one, e.g., input 1: ["123 cgtaatta ttga", 0]?
You don't need to concatenate the inputs into one string yourself; that part can be done inside the model with the
tf.keras.layers.Concatenate()
layer, which merges multiple input branches without affecting the batch size. (Note that tf.keras.layers.Flatten() only reshapes a single input into one dimension; it does not merge separate inputs.) Read more here: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Concatenate
And here: https://www.tensorflow.org/tutorials/structured_data/time_series#multi-step_dense
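As a rough illustration, here is a minimal sketch of a multi-input model built with the Keras functional API: each sequence attribute gets its own embedding + LSTM branch, and the numeric attribute joins at the Concatenate layer. All layer sizes, the vocabulary size, and the padding lengths below are illustrative assumptions, not values from your question:

```python
import tensorflow as tf

# Assumed sizes (tune these for your data).
VOCAB_SIZE = 5            # a, c, g, t + padding index 0
MAX_LEN_2, MAX_LEN_3 = 20, 20

# Branch for attribute 2: embed each character, then run an LSTM.
seq2_in = tf.keras.Input(shape=(MAX_LEN_2,), name="attr2")
x2 = tf.keras.layers.Embedding(VOCAB_SIZE, 8, mask_zero=True)(seq2_in)
x2 = tf.keras.layers.LSTM(32)(x2)

# Branch for attribute 3: same idea, separate weights.
seq3_in = tf.keras.Input(shape=(MAX_LEN_3,), name="attr3")
x3 = tf.keras.layers.Embedding(VOCAB_SIZE, 8, mask_zero=True)(seq3_in)
x3 = tf.keras.layers.LSTM(32)(x3)

# Branch for attribute 1: a single numeric feature.
num_in = tf.keras.Input(shape=(1,), name="attr1")

# Merge the three branches and classify.
merged = tf.keras.layers.Concatenate()([x2, x3, num_in])
out = tf.keras.layers.Dense(1, activation="sigmoid")(merged)

model = tf.keras.Model(inputs=[seq2_in, seq3_in, num_in], outputs=out)
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```

You would then train with something like model.fit([attr2_array, attr3_array, attr1_array], labels, ...), passing one array per named input.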
I'm not sure about the most appropriate way, since I wandered here looking for my own answers, but I do know you need to encode the text numerically (e.g., map each character to an integer ID) before feeding it to the model.
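If it helps, here is one possible sketch of that encoding, assuming a hypothetical char_to_id mapping for the four nucleotide characters, with 0 reserved as the padding index (the padding length is also an assumption):

```python
import tensorflow as tf

# Hypothetical mapping: each nucleotide character -> integer ID,
# with 0 reserved for padding.
char_to_id = {"a": 1, "c": 2, "g": 3, "t": 4}

def encode(seqs, max_len):
    # Convert each string to a list of integer IDs, then zero-pad
    # every sequence to the same length.
    ids = [[char_to_id[ch] for ch in s] for s in seqs]
    return tf.keras.preprocessing.sequence.pad_sequences(
        ids, maxlen=max_len, padding="post")

attr2 = encode(["cgtaatta", "ggcctaaat"], max_len=20)
print(attr2)  # shape (2, 20): integer IDs, zero-padded at the end
```

With mask_zero=True in the Embedding layer, the zero padding is masked out so the LSTM ignores it.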
Hope this helps.