I need to develop a classifier that, given an instance of a word (for example 'hard') in context, determines which of its senses is the intended one. The data comes from a file in XML format distributed with Python's NLTK. I have found that Weka is suitable for this; however, I am lost on the steps required to do it.
I am assuming the following steps: first, determine the relevant features to be used by the classifier, for example the 1 or 2 words before the target word 'hard'. Can this be done in Weka itself, or does it need separate code, for example in Java? If in Java, is there an example? I have no idea how to do it in Weka.
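To make the feature step concrete, here is a minimal sketch in plain Java of what I imagine it could look like. It does not use the Weka API and it does not parse the NLTK XML file: it assumes each sentence has already been tokenized and the position of 'hard' is known. The class name `ContextFeatures`, the padding token `<s>`, the attribute names `prev1`/`prev2`, the relation name, and the two example sentences with their sense labels are all made up for illustration. The output is meant to follow Weka's ARFF text format with nominal attributes; the quoting rule (single quotes, backslash escapes) is my understanding of that format and should be checked against the ARFF documentation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ContextFeatures {

    // Hypothetical placeholder for positions before the start of the sentence.
    public static final String PAD = "<s>";

    /** Returns the `window` lowercased tokens before index `target`; element 0 is the nearest word. */
    public static String[] wordsBefore(String[] tokens, int target, int window) {
        String[] features = new String[window];
        for (int k = 1; k <= window; k++) {
            int i = target - k;
            features[k - 1] = (i >= 0) ? tokens[i].toLowerCase() : PAD;
        }
        return features;
    }

    /** Wraps a value in single quotes, escaping backslashes and single quotes. */
    public static String quote(String s) {
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    /** Formats a set of values as a nominal specification, e.g. {'a','was'}. */
    public static String nominalSpec(Set<String> values) {
        List<String> quoted = new ArrayList<>();
        for (String v : values) {
            quoted.add(quote(v));
        }
        return "{" + String.join(",", quoted) + "}";
    }

    /** Each row holds `window` context words followed by the sense label. */
    public static String toArff(List<String[]> rows, int window) {
        // Collect the values observed in each column so every attribute can be declared nominal.
        List<Set<String>> values = new ArrayList<>();
        for (int c = 0; c <= window; c++) {
            values.add(new TreeSet<>());
        }
        for (String[] row : rows) {
            for (int c = 0; c <= window; c++) {
                values.get(c).add(row[c]);
            }
        }

        StringBuilder sb = new StringBuilder("@relation hard_senses\n\n");
        for (int c = 0; c < window; c++) {
            sb.append("@attribute prev").append(c + 1).append(' ')
              .append(nominalSpec(values.get(c))).append('\n');
        }
        sb.append("@attribute sense ").append(nominalSpec(values.get(window)))
          .append("\n\n@data\n");
        for (String[] row : rows) {
            List<String> cells = new ArrayList<>();
            for (String cell : row) {
                cells.add(quote(cell));
            }
            sb.append(String.join(",", cells)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int window = 2;
        // Made-up example data: tokenized sentences, index of 'hard', and a sense label.
        String[][] sentences = {
            {"It", "was", "a", "hard", "exam"},
            {"The", "bread", "went", "hard", "overnight"}
        };
        int[] targets = {3, 3};
        String[] senses = {"HARD1", "HARD2"};

        List<String[]> rows = new ArrayList<>();
        for (int n = 0; n < sentences.length; n++) {
            String[] context = wordsBefore(sentences[n], targets[n], window);
            String[] row = new String[window + 1];
            System.arraycopy(context, 0, row, 0, window);
            row[window] = senses[n];
            rows.add(row);
        }
        System.out.print(toArff(rows, window));
    }
}
```

If this is the right idea, the printed text could be saved with an .arff extension and opened in Weka, so the features would be computed outside Weka and Weka would only see one row per occurrence of 'hard'.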
Then do I just use Weka to train and test on the file and get the results?