I want to preprocess documents (WSDL files) using Mallet in Eclipse, generate feature vectors from them, and then run classification with Mallet's maximum entropy (MaxEnt) classifier. I am new to Mallet. Can anyone guide me in this regard?
Thanks
If you're referring to Web Services Description Language, I don't know of any specific workflows or packages designed for those documents. I suspect that you might want to create a set of features that combines text (from web service descriptions) and more "categorical" features, like URLs or URL patterns.
The way I would approach this problem is to create a separate package that reads WSDL files and writes out a file in a format that Mallet expects. This adapter could be written in whatever language you are most comfortable with. It would read all the files, get a parsed XML tree for each, extract text and certain other features, and output a file in Mallet's preferred tab-delimited, one-doc-per-line format.
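As a rough illustration of such an adapter, here is a minimal Python sketch. It assumes WSDL 1.1-style element names (`documentation`, `operation`, `portType`, `service`, `address`), and the choice of features, the `kind_value` token naming, and the function names `wsdl_to_features`, `to_mallet_line`, and `convert` are all hypothetical, picked only for illustration. The class labels have to come from you; nothing in a WSDL file supplies them.

```python
import re
import xml.etree.ElementTree as ET

# Element names whose "name" attribute becomes a categorical feature.
# This list is an assumption; adjust it to whatever matters for your classes.
NAMED_ELEMENTS = ("operation", "message", "portType", "service")


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def wsdl_to_features(xml_source):
    """Return a list of feature tokens for one WSDL document (str or bytes)."""
    root = ET.fromstring(xml_source)
    tokens = []
    for elem in root.iter():
        if not isinstance(elem.tag, str):
            continue  # skip anything that is not a regular element
        name = local_name(elem.tag)
        if name == "documentation":
            # Free text: lowercase it and keep alphabetic words only.
            text = "".join(elem.itertext()).lower()
            tokens.extend(re.findall(r"[a-z]+", text))
        elif name in NAMED_ELEMENTS and elem.get("name"):
            # Categorical feature, e.g. operation_GetForecast.
            tokens.append(name + "_" + re.sub(r"\s+", "_", elem.get("name")))
        elif name == "address" and elem.get("location"):
            # Endpoint URL kept as a single token.
            tokens.append("url_" + re.sub(r"\s+", "_", elem.get("location")))
    return tokens


def to_mallet_line(doc_name, label, tokens):
    """Format one document as: name <TAB> label <TAB> space-separated data."""
    return "\t".join([doc_name, label, " ".join(tokens)])


def convert(labeled_paths, out_path):
    """labeled_paths: iterable of (wsdl_path, label) pairs that you supply."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path, label in labeled_paths:
            with open(path, "rb") as f:
                tokens = wsdl_to_features(f.read())
            out.write(to_mallet_line(path, label, tokens) + "\n")
```

The resulting file should then be importable with `bin/mallet import-file --input wsdl.txt --output wsdl.mallet` and usable with `bin/mallet train-classifier --input wsdl.mallet --trainer MaxEnt`. Note that Mallet tokenizes the data column itself, so tokens containing digits or symbols may be split or dropped unless you adjust the tokenization (for example with `--token-regex '\S+'`); check this against the options of your Mallet version.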