How to deal with different sizes of sentences when giving them as input to a Neural Network?


I am giving a sentence as input to a tree-structured neural network, where the leaf nodes are the word vectors of the words in the sentence.

That tree will be a binarized constituency parse tree (see the binary vs. n-ary branching section).

I am trying to develop a semantic representation of the sentence.

The problem is that since each sentence has a different parse tree and a different length, each sentence corresponds to a different neural network. Because of this ever-changing structure, I can't train it.
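To make the setup concrete, here is a minimal NumPy sketch of the per-sentence computation I have in mind (the dimensions and the tanh composition are illustrative assumptions, not a fixed design). Only the tree shape changes from sentence to sentence; the composition weights are meant to be shared across all nodes:

```python
import numpy as np

d = 50                                  # word-vector dimensionality (assumption)
W = 0.01 * np.random.randn(d, 2 * d)    # composition matrix, shared by all nodes
b = np.zeros(d)                         # composition bias, shared by all nodes

def compose(left, right):
    """Combine two child representations into one parent representation."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

def encode(node, word_vectors):
    """Recursively encode a binarized parse tree.

    A leaf is a word index; an internal node is a (left, right) pair.
    """
    if isinstance(node, int):           # leaf: look up the word vector
        return word_vectors[node]
    left, right = node
    return compose(encode(left, word_vectors),
                   encode(right, word_vectors))

# Tree for "the cat sat", binarized as ((the cat) sat):
vectors = np.random.randn(3, d)         # one random vector per word
sentence_vector = encode(((0, 1), 2), vectors)
```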

But this paper develops a tree-structured neural network using constituency- and dependency-based parse trees:

1) Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks by Kai Sheng Tai, Richard Socher, and Christopher Manning

And this paper uses CNNs to extract a semantic representation:

2) Recurrent Continuous Translation Models by Nal Kalchbrenner and Phil Blunsom. The paper's figure of the Convolutional Sentence Model (CSM) gives a rough idea.

Possible Solution:
I can map the sentence vector to a fixed number of neurons and then use those neurons to create a tree structure. For example, if the sentence length is 10 and the maximum sentence length is 20, I can create a fixed-dimensionality layer of 20 neurons and then (in this particular case) map the first word to the first 2 neurons, the 2nd word to the 3rd and 4th neurons, and so on. The mapping can be computed dynamically from the sentence length.

The weight matrix from the sentence layer to the fixed-dimensionality layer will be fixed (all weights kept at 1), with no biases.

But I think there will be some problems with this representation. For example, if the sentence "I had a lovely icecream and a pastry for dessert." is mapped to the fixed-dimensionality layer, it becomes "I I had had a a lovely lovely icecream icecream and and a a pastry pastry for for dessert dessert.". That means shorter sentences would have a more pronounced effect on the neural network than longer ones, and this bias towards shorter sentences should also produce duplicate words in the output (of a sequence generator). Could someone correct me if I am wrong?
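To illustrate the mapping concretely, here is a rough sketch (it assumes the sentence length divides the maximum length evenly; the general case would need uneven repetition counts):

```python
def stretch_to_fixed_length(words, max_len):
    """Duplicate each word so the sentence fills exactly max_len slots.

    Assumes len(words) divides max_len evenly; the general case would
    need uneven repetition counts per word.
    """
    repeats = max_len // len(words)
    return [w for w in words for _ in range(repeats)]

sentence = "I had a lovely icecream and a pastry for dessert.".split()
print(stretch_to_fixed_length(sentence, 20))
# ['I', 'I', 'had', 'had', 'a', 'a', 'lovely', 'lovely', ...]
```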

I would welcome more solutions, especially ones that preserve the relationships between the words in a sentence.

I will be implementing this using Theano and Python. I am considering a layer-based approach, using theano.scan to iterate over the layers and finally form the sentence representation.
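As a rough starting point, this is the theano.scan pattern I mean, simplified here to folding in the words left to right (a linear chain) rather than iterating over tree layers; the dimensions and the tanh step are illustrative assumptions:

```python
import numpy as np
import theano
import theano.tensor as T

d = 50                                           # illustrative dimensionality
W_h = theano.shared(0.01 * np.random.randn(d, d), name='W_h')
W_x = theano.shared(0.01 * np.random.randn(d, d), name='W_x')

words = T.matrix('words')                        # (sentence_length, d)

def step(x_t, h_prev):
    # Fold the next word vector into the running sentence representation.
    return T.tanh(T.dot(h_prev, W_h) + T.dot(x_t, W_x))

hs, _ = theano.scan(fn=step,
                    sequences=words,
                    outputs_info=T.zeros((d,)))  # initial state

sentence_repr = hs[-1]                           # final state = sentence vector
encode = theano.function([words], sentence_repr)
```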


1 Answer

Answer by o1lo01ol1o:

The papers you cited both use recurrent architectures to deal with variable-length data, but from the graphs you posted and described, I'm not sure whether you want to use recurrent structures or not. If you don't, you'll need to pad the input to some fixed-length vector based on your dataset (this is likely to produce a poor model). If you do, I'd look at the RNN implementations in the Theano-based frameworks (Lasagne, Keras, Blocks).
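For the padding route, a minimal sketch in plain NumPy (a real setup would usually also return a mask so the network can ignore the padded positions):

```python
import numpy as np

def pad_sentence(word_vectors, max_len):
    """Zero-pad a (length, d) array of word vectors to (max_len, d)."""
    length, d = word_vectors.shape
    padded = np.zeros((max_len, d), dtype=word_vectors.dtype)
    padded[:length] = word_vectors
    return padded

# Three sentences of different lengths, padded to a common length of 20:
batch = [np.random.randn(n, 50) for n in (7, 12, 20)]
fixed = np.stack([pad_sentence(s, 20) for s in batch])   # shape (3, 20, 50)
```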