Classification using Mallet and MaxEntropy

I want to preprocess documents (WSDL files) using Mallet in Eclipse. I want to generate feature vectors and perform classification using Mallet and MaxEntropy. I am new to Mallet. Can anyone guide me on this?

Thanks

Answer by David Mimno

If you're referring to Web Services Description Language, I don't know of any specific workflows or packages designed for those documents. I suspect that you might want to create a set of features that combines text (from web service descriptions) and more "categorical" features, like URLs or URL patterns.
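As a rough illustration of mixing categorical features into the text, here is a small Java sketch that turns a service URL into feature tokens (host, a two-label "domain", and path segments) which could be appended to a document's words. The class name `UrlFeatures`, the `urlhost_`/`urldomain_`/`urlpath_` prefixes, and the two-label domain heuristic are all hypothetical choices for this example, not anything Mallet defines.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical helper: converts a service URL into categorical "tokens"
 * that can be appended to the text of a Mallet instance.
 */
final class UrlFeatures {
    private UrlFeatures() {}

    static List<String> fromUrl(String url) {
        List<String> features = new ArrayList<>();
        URI uri = URI.create(url.trim());

        // Host feature, lower-cased; getHost() excludes any port number.
        String host = uri.getHost();
        if (host != null) {
            host = host.toLowerCase(Locale.ROOT);
            features.add("urlhost_" + host);
            // Crude heuristic: treat the last two labels as the "domain" (e.g. example.com).
            String[] labels = host.split("\\.");
            if (labels.length >= 2) {
                features.add("urldomain_" + labels[labels.length - 2] + "." + labels[labels.length - 1]);
            }
        }

        // One feature per non-empty path segment; the query string is ignored.
        String path = uri.getPath();
        if (path != null) {
            for (String segment : path.split("/")) {
                if (!segment.isEmpty()) {
                    features.add("urlpath_" + segment.toLowerCase(Locale.ROOT));
                }
            }
        }
        return features;
    }
}
```

For example, `UrlFeatures.fromUrl("http://WWW.Example.com/services/Weather?wsdl")` yields `urlhost_www.example.com`, `urldomain_example.com`, `urlpath_services`, and `urlpath_weather`. Mallet's default tokenizer is letter-oriented, so feature strings containing digits or unusual characters may get split at import time unless you supply a custom token pattern.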

The way I would approach this problem is to create a separate package that reads WSDL files and writes out a file in a format that Mallet expects. This adapter could be written in whatever language you are most comfortable with. It would read all the files, get a parsed XML tree for each, extract text and certain other features, and output a file in Mallet's preferred tab-delimited, one-doc-per-line format.
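A minimal sketch of such an adapter, written in Java (the language Mallet itself uses) with only the JDK's DOM parser. It assumes WSDL 1.1 (namespace `http://schemas.xmlsoap.org/wsdl/`), treats the contents of `<documentation>` elements plus the distinct operation names as the document text, and splits camelCase operation names into words. `WsdlToMallet`, `extractText`, and `toMalletLine` are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical adapter: WSDL 1.1 XML in, one Mallet-style text line out. */
final class WsdlToMallet {
    private WsdlToMallet() {}

    private static final String WSDL_NS = "http://schemas.xmlsoap.org/wsdl/";

    /** Extracts <documentation> text plus distinct operation names, as one line of words. */
    static String extractText(String wsdlXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
            doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(wsdlXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse WSDL", e);
        }

        List<String> pieces = new ArrayList<>();
        NodeList docs = doc.getElementsByTagNameNS(WSDL_NS, "documentation");
        for (int i = 0; i < docs.getLength(); i++) {
            pieces.add(docs.item(i).getTextContent());
        }

        // Operations appear in both <portType> and <binding>, so de-duplicate names.
        Set<String> opNames = new LinkedHashSet<>();
        NodeList ops = doc.getElementsByTagNameNS(WSDL_NS, "operation");
        for (int i = 0; i < ops.getLength(); i++) {
            opNames.add(((Element) ops.item(i)).getAttribute("name"));
        }
        for (String name : opNames) {
            // Split camelCase, e.g. "getWeather" -> "get Weather".
            pieces.add(name.replaceAll("([a-z])([A-Z])", "$1 $2"));
        }

        // Collapse all whitespace (tabs, newlines) so the record stays on one line.
        return String.join(" ", pieces).replaceAll("\\s+", " ").trim();
    }

    /** Formats one instance as name TAB label TAB text. */
    static String toMalletLine(String name, String label, String text) {
        if (name.matches(".*\\s.*") || label.matches(".*\\s.*")) {
            throw new IllegalArgumentException("name and label must not contain whitespace");
        }
        return name + "\t" + label + "\t" + text.replaceAll("\\s+", " ").trim();
    }
}
```

Writing one such line per WSDL file gives a training file that Mallet's command-line tools can import (`bin/mallet import-file`) and then train on with `bin/mallet train-classifier --trainer MaxEnt`; check the exact options against the documentation for your Mallet version. The name and label columns are kept free of whitespace because the import step separates the columns on whitespace.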