Training Hidden Markov Models without Tagged Corpus Data

Question

1.2k views Asked by Claudiu At 16 December 2009 at 19:01

For a linguistics course we implemented part-of-speech (POS) tagging using a hidden Markov model (HMM), where the hidden variables were the parts of speech. We trained the system on some tagged data, then tested it and compared our results with the gold data.

Would it have been possible to train the HMM without the tagged training set?

There are 2 answers

Matt Baker On 16 December 2009 at 19:28

It has been a couple of years since I took NLP, but I believe that without tagged data the HMM could still help estimate the symbol emission and state transition probabilities of n-grams (i.e. the odds of "world" occurring after "hello"), but not parts of speech. It needs the tagged corpus to learn how the parts of speech interrelate.
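To make the "world after hello" example concrete, here is a minimal sketch of estimating that probability from untagged text by counting adjacent word pairs. The function name `bigram_prob` and the toy token list are made up for illustration; this is a plain bigram estimate, not an HMM.

```python
from collections import Counter

def bigram_prob(tokens, prev, word):
    """Maximum-likelihood estimate of P(word | prev) from raw, untagged tokens."""
    # Count every adjacent (previous word, next word) pair in the text.
    pair_counts = Counter(zip(tokens, tokens[1:]))
    # Number of times `prev` is followed by any word at all.
    prev_count = sum(c for (p, _), c in pair_counts.items() if p == prev)
    return pair_counts[(prev, word)] / prev_count if prev_count else 0.0

# Hypothetical toy corpus; no POS tags are needed for this estimate.
tokens = "hello world hello there hello world".split()
p = bigram_prob(tokens, "hello", "world")
print(p)
```

Here "hello" is followed by another word three times, twice by "world", so the estimate is 2/3.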

If I'm way off on this let me know in the comments!

**bayer** · Accepted Answer · 2009-12-18T00:46:15+00:00

In theory you can do that, using the Baum-Welch algorithm. It is described very well in Rabiner's HMM tutorial.

However, having applied HMMs to part-of-speech tagging, I found the error you get with the standard form is not very satisfying. Baum-Welch is a form of expectation maximization, which is only guaranteed to converge to a local maximum of the likelihood, so the result depends on how the model is initialized. Rule-based approaches beat HMMs hands down, iirc.
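For illustration, a minimal pure-Python sketch of Baum-Welch on untagged sentences might look like the following. Everything here is an assumption made for the example: the function names (`forward_backward`, `baum_welch_step`, `train`), the toy corpus, the choice of 3 hidden states, and the seeded random initialization. Note that the learned states are anonymous clusters; nothing in the training attaches POS labels to them.

```python
import math
import random

def normalize(row):
    s = sum(row)
    # Fall back to a uniform row if a state received no probability mass.
    return [x / s for x in row] if s > 0 else [1.0 / len(row)] * len(row)

def forward_backward(obs, pi, A, B):
    """Scaled forward and backward passes for one observation sequence."""
    N, T = len(pi), len(obs)
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            row = [pi[i] * B[i][obs[0]] for i in range(N)]
        else:
            row = [sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                   for j in range(N)]
        scale.append(sum(row))
        alpha.append([a / scale[t] for a in row])
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(N)) / scale[t + 1]
    return alpha, beta, scale

def baum_welch_step(seqs, pi, A, B):
    """One EM iteration; also returns the log-likelihood under the OLD parameters."""
    N, M = len(pi), len(B[0])
    pi_num = [0.0] * N
    A_num = [[0.0] * N for _ in range(N)]
    B_num = [[0.0] * M for _ in range(N)]
    loglik = 0.0
    for obs in seqs:
        alpha, beta, scale = forward_backward(obs, pi, A, B)
        loglik += sum(math.log(c) for c in scale)
        for t in range(len(obs)):
            for i in range(N):
                gamma = alpha[t][i] * beta[t][i]  # P(state i at time t | obs)
                if t == 0:
                    pi_num[i] += gamma
                B_num[i][obs[t]] += gamma
                if t < len(obs) - 1:
                    for j in range(N):  # expected i -> j transition counts
                        A_num[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                        * beta[t + 1][j] / scale[t + 1])
    return (normalize(pi_num), [normalize(r) for r in A_num],
            [normalize(r) for r in B_num], loglik)

def train(seqs, n_states, n_symbols, iters=20, seed=0):
    rng = random.Random(seed)
    def rand_row(n):
        return normalize([rng.random() + 0.1 for _ in range(n)])
    pi = rand_row(n_states)
    A = [rand_row(n_states) for _ in range(n_states)]
    B = [rand_row(n_symbols) for _ in range(n_states)]
    history = []
    for _ in range(iters):
        pi, A, B, loglik = baum_welch_step(seqs, pi, A, B)
        history.append(loglik)
    return pi, A, B, history

# Hypothetical toy corpus: untagged sentences, words mapped to integer ids.
sentences = ["the dog barks", "the cat sleeps", "a dog sleeps", "a cat barks"]
vocab = sorted({w for s in sentences for w in s.split()})
ids = {w: k for k, w in enumerate(vocab)}
seqs = [[ids[w] for w in s.split()] for s in sentences]
pi, A, B, history = train(seqs, n_states=3, n_symbols=len(vocab))
print(history[0], history[-1])
```

The log-likelihood in `history` should never decrease from one iteration to the next, which is the EM guarantee. Changing `seed` can lead to a different final value, which is the local-maximum behaviour described above.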

I believe the Natural Language Toolkit (NLTK) for Python has an HMM implementation for that exact purpose.
