How does a transition-based dependency parser decide which operation to do next in its configuration stage?
I understand that the model uses a previously trained part-of-speech (POS) tagger during its configuration stage. But if most of the words are new (unseen during training), how would the parser decide which operation to perform?
Asked by Akash
I'd like to flesh out @Quantum's answer into a more detailed one:
Before 2014, most parsers depended on a manually designed set of feature templates, and such methods have two drawbacks: 1) the templates require a lot of expertise to design and are usually incomplete; 2) feature extraction consumes most of the runtime when each configuration is scored. After Chen and Manning published their paper, A Fast and Accurate Dependency Parser using Neural Networks (2014), neural networks became the standard way to score transitions, and most state-of-the-art parsers now rely on them.
Let's see how Chen and Manning did the job.
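As illustrated in the architecture diagram of the paper, the output of the network is a probability distribution produced by a softmax function, so choosing the next operation reduces to a classification problem over features of the current configuration. The features come in three groups: 1) words: the top 3 words on the stack and on the buffer; the first and second leftmost/rightmost children of the top two words on the stack; and, for those same two words, the leftmost child of the leftmost child and the rightmost child of the rightmost child (18 words in total); 2) the POS tags of these 18 words; 3) the arc labels of the 12 children and grandchildren. Only the first group depends on the words themselves, so the POS-tag and arc-label features still carry information when many words are rare or unseen.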
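To make the feature list concrete, here is a minimal Python sketch that extracts the 18 word, 18 POS and 12 label features from one configuration. The `Config` class, the `<NULL>` padding token and the helper names `lc`/`rc`/`extract_features` are hypothetical, chosen for illustration; token 0 is assumed to be ROOT, and this particular ordering of the child features is just one possible choice.

```python
from dataclasses import dataclass, field

NULL = "<NULL>"  # hypothetical padding token for positions that do not exist


@dataclass
class Config:
    stack: list                                  # token indices; stack[-1] is the top
    buffer: list                                 # token indices; buffer[0] is the next word
    heads: dict = field(default_factory=dict)    # dependent index -> head index
    labels: dict = field(default_factory=dict)   # dependent index -> arc label


def lc(c, h, k):
    """k-th leftmost child of h (among children left of h), or None."""
    if h is None:
        return None
    kids = sorted(d for d, hd in c.heads.items() if hd == h and d < h)
    return kids[k - 1] if len(kids) >= k else None


def rc(c, h, k):
    """k-th rightmost child of h (among children right of h), or None."""
    if h is None:
        return None
    kids = sorted((d for d, hd in c.heads.items() if hd == h and d > h), reverse=True)
    return kids[k - 1] if len(kids) >= k else None


def extract_features(c, words, tags):
    """Return 18 word, 18 POS and 12 label features for configuration c."""
    s = [c.stack[-i] if len(c.stack) >= i else None for i in (1, 2, 3)]
    b = [c.buffer[i] if len(c.buffer) > i else None for i in (0, 1, 2)]
    # Children and grandchildren of the top two stack items (6 each).
    deps = []
    for h in s[:2]:
        deps += [lc(c, h, 1), rc(c, h, 1), lc(c, h, 2), rc(c, h, 2),
                 lc(c, lc(c, h, 1), 1), rc(c, rc(c, h, 1), 1)]
    positions = s + b + deps
    word_feats = [NULL if i is None else words[i] for i in positions]
    pos_feats = [NULL if i is None else tags[i] for i in positions]
    label_feats = [NULL if i is None else c.labels.get(i, NULL) for i in deps]
    return word_feats, pos_feats, label_feats


# "She eats fish": index 0 is ROOT, and "She" is already attached to "eats".
words = ["<ROOT>", "She", "eats", "fish"]
tags = ["<ROOT>", "PRP", "VBZ", "NN"]
cfg = Config(stack=[0, 2], buffer=[3], heads={1: 2}, labels={1: "nsubj"})
w, p, l = extract_features(cfg, words, tags)
```

Missing positions (an empty third stack slot, a word with no children) are padded rather than dropped, so the network always sees fixed-length inputs.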
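Each feature is mapped to an embedding, and the embeddings of each group are concatenated into the input vectors x^w, x^t and x^l. A hidden layer with a cube activation function, h = (W1^w x^w + W1^t x^t + W1^l x^l + b1)^3, is followed by an output matrix W2 that produces the logits, and a softmax over W2 h gives the distribution over transitions at the top of the network: SHIFT, LEFT-ARC and RIGHT-ARC. In the labeled setting each arc transition is paired with a dependency label, giving 2N_l + 1 classes for N_l labels.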
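The cube-activation hidden layer and softmax output can be sketched with NumPy as below. This is a toy under stated assumptions: the weights are random and untrained, and the vocabularies, sizes, label set and the `<UNK>`/`<NULL>`/`<ROOT>` tokens are hypothetical. Mapping unseen words to a shared `<UNK>` embedding is also an assumption here, used to show that a decision is still produced for a new word. The last step greedily picks the most probable transition that is legal in the current configuration.

```python
import numpy as np

# Toy sizes and vocabularies, all hypothetical; real parsers use far larger ones.
N_WORD, N_POS, N_LABEL = 18, 18, 12   # features per group, as in the feature list
D, H = 8, 16                          # embedding size and hidden units (toy values)
LABELS = ["nsubj", "dobj", "root"]    # hypothetical label set, N_l = 3
TRANSITIONS = (["SHIFT"] + [f"LEFT-ARC({x})" for x in LABELS]
               + [f"RIGHT-ARC({x})" for x in LABELS])   # 2 * N_l + 1 classes

SPECIAL = ["<UNK>", "<NULL>", "<ROOT>"]
word_vocab = {t: i for i, t in enumerate(SPECIAL + ["She", "eats", "fish"])}
pos_vocab = {t: i for i, t in enumerate(SPECIAL + ["PRP", "VBZ", "NN"])}
lab_vocab = {t: i for i, t in enumerate(SPECIAL + LABELS)}

rng = np.random.default_rng(0)


def init(*shape):
    """Untrained random weights, for illustration only."""
    return rng.normal(0.0, 0.1, shape)


E_w, E_t, E_l = init(len(word_vocab), D), init(len(pos_vocab), D), init(len(lab_vocab), D)
W1_w, W1_t, W1_l = init(H, N_WORD * D), init(H, N_POS * D), init(H, N_LABEL * D)
b1 = np.zeros(H)
W2 = init(len(TRANSITIONS), H)


def embed(E, vocab, tokens):
    """Look up each token (unseen ones -> <UNK>) and concatenate the vectors."""
    ids = [vocab.get(t, vocab["<UNK>"]) for t in tokens]
    return E[ids].reshape(-1)


def transition_probs(words, tags, labels):
    x_w = embed(E_w, word_vocab, words)
    x_t = embed(E_t, pos_vocab, tags)
    x_l = embed(E_l, lab_vocab, labels)
    h = (W1_w @ x_w + W1_t @ x_t + W1_l @ x_l + b1) ** 3   # cube activation
    logits = W2 @ h
    e = np.exp(logits - logits.max())                       # numerically stable softmax
    return e / e.sum()


def choose(probs, is_legal):
    """Greedy decision: the most probable transition that is legal here."""
    legal = [i for i, t in enumerate(TRANSITIONS) if is_legal(t)]
    return TRANSITIONS[max(legal, key=lambda i: probs[i])]


# Stack: [ROOT, "blorfs"], buffer: ["fish"]; "blorfs" is unseen, so it maps to <UNK>.
words = ["blorfs", "<ROOT>", "<NULL>", "fish"] + ["<NULL>"] * 14
tags = ["VBZ", "<ROOT>", "<NULL>", "NN"] + ["<NULL>"] * 14
labels = ["<NULL>"] * 12
probs = transition_probs(words, tags, labels)
# LEFT-ARC would make ROOT a dependent, so only SHIFT and RIGHT-ARC are legal.
move = choose(probs, lambda t: t == "SHIFT" or t.startswith("RIGHT-ARC"))
```

In a real parser all of these weights, including the embeddings, are learned from a treebank, and the chosen transition is applied to the configuration before the features are extracted again for the next step.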
HTH :)
References: 1) Danqi Chen and Christopher D. Manning, "A Fast and Accurate Dependency Parser using Neural Networks", EMNLP 2014; 2) CMU Neural Nets for NLP 2017 (12): Transition-based Dependency Parsing