I want a many-to-one setup for an LSTM in CNTK, i.e. each word in a sentence is an input and a single label per sentence is the output, so many inputs map to one output. The example provided in the CNTK GitHub repository, however, is many-to-many. I am having trouble understanding how the input format has to change for my application: in the example, each word in a sentence has its own label, whereas I want one label for the whole sentence.
Would it be correct to assign the sentence label to every word in that sentence? Or is there a better alternative?
This page shows how to take the outputs of an LSTM and compute a learnable convex combination of them (also known as attention), which reduces the per-word outputs to a single vector per sentence.
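To make the idea of a convex combination concrete, here is a minimal sketch in plain Python rather than CNTK. It is not the code from the linked page: the function name `attention_pool`, the scoring vector `w`, and the toy numbers are assumptions made up for illustration, and in a real model `w` would be learned along with the LSTM weights.

```python
import math

def attention_pool(outputs, w):
    """Collapse per-word LSTM outputs into one sentence vector.

    outputs: list of T vectors (one per word), each of length H.
    w: scoring vector of length H (hypothetical; learnable in a real model).
    """
    # One scalar score per time step: dot product of the output with w.
    scores = [sum(h_j * w_j for h_j, w_j in zip(h, w)) for h in outputs]
    # Softmax over time steps; subtracting the max keeps exp() stable.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # positive and summing to 1
    # Weighted sum of the outputs: a convex combination.
    dim = len(outputs[0])
    pooled = [sum(a * h[j] for a, h in zip(weights, outputs)) for j in range(dim)]
    return pooled, weights

# Toy stand-in for the LSTM outputs of a 4-word sentence (H = 3).
outputs = [[0.1, 0.4, -0.2], [0.3, -0.1, 0.5], [0.0, 0.2, 0.1], [0.6, 0.1, -0.3]]
w = [0.5, -0.25, 1.0]
sentence_vector, weights = attention_pool(outputs, w)
```

The resulting `sentence_vector` is what you would feed to the classifier, so the model needs only one label per sentence instead of one per word.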
Update: As for the input format, there are several options. If you use the built-in reader, you can put the label on the first element of the sequence, as in this example. If you feed the data from Python, this other thread is relevant.
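For the built-in reader, a rough sketch of what "label on the first element" could look like is below. It assumes the CNTK text format, where every line starts with a sequence ID followed by `|name values` streams. The stream names `x` and `y`, the helper `to_ctf`, and the word IDs are hypothetical, so check them against the linked example before relying on this.

```python
def to_ctf(sentences, labels, num_classes):
    """Build text-format lines with one label per sentence.

    sentences: list of sentences, each a list of integer word IDs.
    labels: one integer class index per sentence.
    """
    lines = []
    for seq_id, (word_ids, label) in enumerate(zip(sentences, labels)):
        # Dense one-hot encoding of the sentence label.
        one_hot = " ".join("1" if k == label else "0" for k in range(num_classes))
        for pos, word_id in enumerate(word_ids):
            # Sparse one-hot word feature written as index:value.
            line = f"{seq_id} |x {word_id}:1"
            if pos == 0:
                # Only the first word of the sequence carries the label.
                line += f" |y {one_hot}"
            lines.append(line)
    return lines

ctf_lines = to_ctf([[3, 7, 1], [4, 2]], [1, 0], num_classes=2)
print("\n".join(ctf_lines))
```

Lines that share a sequence ID form one sentence, so the features stream has one entry per word while the labels stream has a single entry per sentence.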