For a linguistics course we implemented Part of Speech (POS) tagging using a hidden markov model, where the hidden variables were the parts of speech. We trained the system on some tagged data, and then tested it and compared our results with the gold data.
Would it have been possible to train the HMM without the tagged training set?
In theory you can do that. In that case you would use the Baum-Welch-Algorithm. It is described very well in Rabiner's HMM Tutorial.
However, having applied HMMs to part of speech, the error you get with the standard form will not be so satisfying. It is a form of expectation maximization which only converges to local maxima. Rule based approaches beat HMMs hands down, iirc.
I believe the natural language toolkit NLTK for python has an HMM implementation for that exact purpose.