I'm new to text categorization and I want to do it with WEKA. Do I have to build a supervised training set like the ARFF file below? I have to do that manually, right? And after that, what do I have to do? Use a Naive Bayes classifier to predict the category of the test set?
@relation test
@attribute text String
@attribute politics {yes,no}
@attribute religion {yes,no}
@attribute another_category {yes,no}
@data
"this is a text about politics",yes,no,no
"this text is about religion",no,yes,no
"this text mixes everything",yes,yes,yes
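Once you have loaded your ARFF file, you could apply the StringToWordVector filter to turn the text attribute into a word list, i.e. one numeric attribute per word. From there, you could use a classifier (such as Naive Bayes) to predict your classes. Note that a standard WEKA classifier predicts a single class attribute, so with several yes/no categories you would set one of them as the class, train a classifier, and repeat for each category. You should also filter out the other category attributes (for example with the Remove filter) so they are not used as inputs to the classifier, since those labels will not be known for new texts.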
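To make the two steps concrete without depending on WEKA's own classes, here is a minimal plain-Java sketch. It is not the WEKA API, and the class and method names (`BinaryTextNB`, `tokenize`, `train`, `predict`) are hypothetical. It splits text into word counts, roughly what StringToWordVector produces when its option to output word counts is enabled, and trains a multinomial Naive Bayes for a single yes/no category. Training it on the text plus one category column mirrors removing the other category attributes before training. The tokenizer and the add-one smoothing are assumptions chosen for simplicity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, NOT the WEKA API: a multinomial Naive Bayes over
// word counts for a single yes/no category (e.g. "politics").
final class BinaryTextNB {
    // Per class label (true = "yes", false = "no"): word -> count.
    private final Map<Boolean, Map<String, Integer>> wordCounts = new HashMap<>();
    // Per class label: total number of word tokens seen.
    private final Map<Boolean, Integer> totalWords = new HashMap<>();
    // Per class label: number of training documents.
    private final Map<Boolean, Integer> docCounts = new HashMap<>();
    // Every word seen in training, across both labels.
    private final Set<String> vocab = new HashSet<>();
    private int nDocs = 0;

    // Bag-of-words step (assumed tokenizer): lowercase, split on anything
    // that is not a letter or digit, drop empty tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Add one labeled document to the counts.
    void train(String text, boolean label) {
        nDocs++;
        docCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String w : tokenize(text)) {
            counts.merge(w, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocab.add(w);
        }
    }

    // log P(label) + sum over words of log P(word | label), with add-one
    // (Laplace) smoothing on both the prior and the word probabilities.
    double logScore(String text, boolean label) {
        double score = Math.log((docCounts.getOrDefault(label, 0) + 1.0) / (nDocs + 2.0));
        Map<String, Integer> counts = wordCounts.getOrDefault(label, Map.of());
        double denom = totalWords.getOrDefault(label, 0) + vocab.size();
        for (String w : tokenize(text)) {
            if (vocab.contains(w)) { // words never seen in training are ignored
                score += Math.log((counts.getOrDefault(w, 0) + 1.0) / denom);
            }
        }
        return score;
    }

    // Predict "yes" only when it scores strictly higher than "no".
    boolean predict(String text) {
        return logScore(text, true) > logScore(text, false);
    }
}
```

For the three example rows, you would create one `BinaryTextNB` per category (politics, religion, another_category) and call `train(text, label)` with that category's yes/no value for each row; a new text then gets a separate yes/no prediction per category. In WEKA itself, NaiveBayesMultinomial is the Naive Bayes variant aimed at word-count vectors like these.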
Hope this helps!