I want to use Weka for text classification of Persian text, but I have a problem.
The tokenizer, stoplist, and stemmer for Persian are different from those for English, so I need to use my own. Weka's interface lets me supply my own stoplist, but there is no way to change the stemmer or the tokenizer.
Is there any way to change them without modifying Weka's source code? I am new to Java and don't know how to modify Weka's source code.
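To make concrete what a Persian-specific tokenizer and stoplist have to do differently, here is a minimal standalone sketch. The class name `PersianTokenizerSketch` is hypothetical and it is not Weka code. It assumes a few common Persian preprocessing choices: it maps Arabic yeh (U+064A) and kaf (U+0643) to their Persian forms, splits on whitespace plus Latin and Persian punctuation (، ؛ ؟ « »), keeps the zero-width non-joiner (U+200C) inside words, and drops a tiny illustrative stoplist that a real setup would load from a file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper, NOT part of Weka: a minimal Persian tokenizer with a stoplist. */
public class PersianTokenizerSketch {

    // Tiny illustrative stoplist; a real one would be much larger and read from a file.
    private static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList(
            "و", "در", "به", "از", "که", "این", "را", "با"));

    /** Map Arabic code points often found in Persian text to their Persian forms. */
    public static String normalize(String text) {
        return text
                .replace('\u064A', '\u06CC')   // Arabic yeh -> Persian yeh
                .replace('\u0643', '\u06A9');  // Arabic kaf -> Persian keheh
    }

    /**
     * Split on whitespace and punctuation. \s and \p{Punct} only cover ASCII here,
     * so the zero-width non-joiner (U+200C) stays inside words such as "می‌روم".
     */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : normalize(text).split("[\\s\\p{Punct}\u060C\u061B\u061F\u00AB\u00BB]+")) {
            // split() can yield an empty first element when the text starts with a delimiter.
            if (!t.isEmpty() && !STOPWORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("درس را در خانه خواندم، و رفتم."));
    }
}
```

Splitting on whitespace alone, as an English-oriented tokenizer effectively does, would leave the Persian comma attached to the preceding word and would not unify the two spellings of yeh and kaf.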
I found my answer! It's impossible to do this without modifying Weka's source code, so I was forced to modify it. I had a lot of trouble doing that because I am new to Java, so here are brief steps for modifying Weka's code, to help others:

1. Set the Java environment variables as described here: http://www.ntu.edu.sg/home/ehchua/programming/howto/Environment_Variables.html
2. Install Apache Ant, which can be downloaded here: http://ant.apache.org/bindownload.cgi
3. Watch this video to find out how to modify Weka's code: http://www.youtube.com/watch?v=buCpG7uV_v4
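As an example of the kind of class you might add to Weka's source once the build works, here is a hypothetical light Persian stemmer. `PersianLightStemmer` is an illustrative name, not something in Weka or in the video. It strips only a few common suffixes (plural ها/های and comparative تر/ترین), removes a zero-width non-joiner left in front of them, and refuses to leave a stem shorter than three characters. Its `stem(String)` method has the same signature as the one declared by Weka's `weka.core.stemmers.Stemmer` interface. To use it inside Weka you would presumably add `implements Stemmer` (in recent Weka versions that interface also requires `getRevision()`; check your version's source) and rebuild with ant.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical light Persian stemmer, NOT Weka's code and not a complete stemmer.
 * stem(String) mirrors the signature of weka.core.stemmers.Stemmer#stem.
 */
public class PersianLightStemmer {

    private static final char ZWNJ = '\u200C';   // zero-width non-joiner

    // Longer suffixes first, so "ترین" is tried before "تر" and "های" before "ها".
    private static final List<String> SUFFIXES = Arrays.asList("ترین", "های", "ها", "تر");

    // Guard against over-stemming short words such as "تنها" or "دختر".
    private static final int MIN_STEM_LENGTH = 3;

    public String stem(String word) {
        for (String suffix : SUFFIXES) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= MIN_STEM_LENGTH) {
                String stem = word.substring(0, word.length() - suffix.length());
                // "کتاب‌ها" is written with a ZWNJ before the suffix; drop it too.
                if (stem.charAt(stem.length() - 1) == ZWNJ) {
                    stem = stem.substring(0, stem.length() - 1);
                }
                return stem;
            }
        }
        return word;  // no suffix matched (or the stem would be too short)
    }

    public static void main(String[] args) {
        PersianLightStemmer s = new PersianLightStemmer();
        for (String w : new String[] {"کتاب\u200Cها", "کتابهای", "بزرگترین", "تنها"}) {
            System.out.println(w + " -> " + s.stem(w));
        }
    }
}
```

The minimum-length guard is what keeps words like تنها ("alone") and دختر ("girl") intact; a real Persian stemmer needs far more rules than this sketch.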