RNN input and output shape


I’m trying to build an RNN with tf.keras to generate text. Let’s say I have 100 poems from Shakespeare with a max length of 50 words, and I’m using 10k English words as my vocabulary. My input shape would then be [100, 50, 10k] (after padding every training sample to 50 time steps). Suppose one training sample is “I love Shakespeare”; its target would have shape [1, 2, 10k], obtained by shifting one word to the right. So my final output shape would be [100, 49, 10k].

My model is shown below (batch size = 1). When I fit the model, the accuracy is very high (about 0.99) even in the first epoch. What am I doing wrong?
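The shift described above can be sketched in NumPy. This is only an illustration under assumptions: the helper name `make_xy`, the toy ids, and the use of id 0 for padding are hypothetical and are not taken from the asker's `DataPrepare`. It uses one common convention in which the input drops the last step and the target drops the first, so both end up with the same number of time steps.

```python
import numpy as np

def make_xy(token_ids, vocab_size):
    """One-hot encode padded id sequences and split them into input/target.

    token_ids: int array of shape (m, T), already padded (assumed pad id 0).
    Returns X and Y, both of shape (m, T - 1, vocab_size).
    """
    token_ids = np.asarray(token_ids)
    one_hot = np.eye(vocab_size, dtype=np.float32)[token_ids]  # (m, T, vocab)
    X = one_hot[:, :-1, :]  # steps 0 .. T-2 are the inputs
    Y = one_hot[:, 1:, :]   # steps 1 .. T-1 are the next-word targets
    return X, Y

# Hypothetical toy batch: 2 sequences, 5 steps, vocabulary of 6 ids.
ids = np.array([[1, 2, 3, 0, 0],
                [4, 5, 0, 0, 0]])
X, Y = make_xy(ids, vocab_size=6)
print(X.shape, Y.shape)  # (2, 4, 6) (2, 4, 6)
```

An LSTM with `return_sequences=True` emits one prediction per input step, so X and Y generally need the same number of time steps for the loss to line up.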

My model is as follows, with X shape = [113, 149, 10K] and Y shape = [113, 148, 10K]:

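import numpy as np
from tensorflow import keras as tfk

def CreatingModel(X, densor, lstm, n_a):
    # X is the one-hot training array of shape (m, T, n_x);
    # only T and n_x are needed to define the input layer.
    m, T, n_x = np.shape(X)
    inputs = tfk.layers.Input(shape=(T, n_x))
    z1 = lstm(inputs)   # (None, T, n_a), since return_sequences=True
    out = densor(z1)    # (None, T, n_x), softmax over the vocabulary
    RNN = tfk.Model(inputs=inputs, outputs=out)
    return RNN

# DataPrepare is my own helper (not shown here)
X, text, vocab, T, Y = DataPrepare('dat.txt', 'vocab.txt')
m, T, n_x = np.shape(X)
n_a = 30  # number of LSTM units
densor = tfk.layers.Dense(n_x, activation='softmax')
lstm = tfk.layers.LSTM(n_a, return_sequences=True)
RNN = CreatingModel(X, densor, lstm, n_a)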

Model summary:
Model: "model_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_6 (InputLayer)        [(None, 149, 1712)]       0         
                                                                 
 lstm_5 (LSTM)               (None, 149, 30)           209160    
                                                                 
 dense_5 (Dense)             (None, 149, 1712)         53072     
                                                                 
=================================================================
Total params: 262,232
Trainable params: 262,232
Non-trainable params: 0
_________________________________________________________________
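As a sanity check on the summary above, the parameter counts can be reproduced by hand. This is a small sketch using the standard formulas for an LSTM layer and a dense layer; the helper names are hypothetical. The numbers match only when the feature dimension is taken as 1712, the value shown in the summary.

```python
def lstm_params(n_x, n_a):
    # Four gates, each with input weights (n_x * n_a),
    # recurrent weights (n_a * n_a) and a bias vector (n_a).
    return 4 * (n_a * (n_x + n_a) + n_a)

def dense_params(n_in, n_out):
    # Weight matrix plus one bias per output unit.
    return n_in * n_out + n_out

n_x, n_a = 1712, 30  # values read off the model summary
lstm_count = lstm_params(n_x, n_a)
dense_count = dense_params(n_a, n_x)
print(lstm_count, dense_count, lstm_count + dense_count)  # 209160 53072 262232
```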

There is 1 answer

Debi Prasad On

It is possible to get high accuracy in the very first epoch because of the complexity of your model; you may be overfitting the current dataset. You could try sigmoid as the activation function instead of softmax.
Looking at your model, though, it does not have multiple hidden layers, so the architecture itself should not cause overfitting. Check your data; there may be a lot of redundancy in your dataset.
You can test how data-hungry your model is by training it on less data and noting its performance (similar to what is done with SGD).
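One way to check the data for redundancy, as suggested above, is to measure how much accuracy a trivial predictor would already get. The sketch below is an assumption-laden illustration, not the asker's code: the helper `majority_baseline` and the toy ids are hypothetical, and padding is assumed to be encoded as token id 0. If most target steps are padding, this baseline will be high before the model has learned anything.

```python
import numpy as np

def majority_baseline(Y):
    """Accuracy of always predicting the most frequent target token.

    Y: one-hot targets of shape (m, T, vocab_size).
    Returns (accuracy, token_id).
    """
    labels = np.argmax(Y, axis=-1).ravel()  # one-hot -> integer ids
    counts = np.bincount(labels, minlength=Y.shape[-1])
    return counts.max() / labels.size, int(counts.argmax())

# Hypothetical toy targets: 3 sequences, 5 steps, vocabulary of 4, id 0 = padding.
ids = np.array([[1, 2, 0, 0, 0],
                [3, 0, 0, 0, 0],
                [2, 1, 3, 0, 0]])
Y_toy = np.eye(4)[ids]
acc, tok = majority_baseline(Y_toy)
print(acc, tok)  # 0.6 0
```

Running the same function on the real Y gives a reference point: a reported accuracy is only informative to the extent that it exceeds this baseline.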