I'm trying to check whether a word is spelled correctly using WordNet. Here is my SpellChecker.java implementation so far:
package com.domain.wordnet;

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Collection;

import net.didion.jwnl.JWNL;
import net.didion.jwnl.JWNLException;
import net.didion.jwnl.data.IndexWord;
import net.didion.jwnl.data.IndexWordSet;
import net.didion.jwnl.data.Synset;
import net.didion.jwnl.dictionary.Dictionary;

public class SpellChecker {
    private static Dictionary dictionary = null;
    private static final String PROPS = "/opt/jwnl/jwnl14-rc2/config/file_properties.xml";

    static {
        try (InputStream is = new FileInputStream(PROPS)) {
            JWNL.initialize(is);
            dictionary = Dictionary.getInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        System.out.println(isCorrect("change")); // true
        System.out.println(isCorrect("changes")); // false
        System.out.println(isCorrect("changed")); // true
        System.out.println(isCorrect("changing")); // true
        System.out.println();
        System.out.println(isCorrect("analyze")); // true
        System.out.println(isCorrect("analyzed")); // true
        System.out.println(isCorrect("analyzing")); // false
    }

    public static boolean isCorrect(String token) {
        try {
            token = token.trim().toLowerCase();
            IndexWordSet set = dictionary.lookupAllIndexWords(token);
            if (set == null)
                return false;
            @SuppressWarnings("unchecked")
            Collection<IndexWord> collection = set.getIndexWordCollection();
            if (collection == null || collection.isEmpty())
                return false;
            for (IndexWord word : collection) {
                Synset[] senses = word.getSenses();
                if (senses != null && senses.length > 0
                        && senses[0].toString().toLowerCase().contains(token)) {
                    return true;
                }
            }
            return false;
        } catch (JWNLException e) {
            e.printStackTrace();
            return false;
        }
    }
}
It works in most cases, but as you can see it fails for plurals and some -ing forms. Is there a way to handle plurals and -ing forms without breaking the rules of English?
In the WordNet Browser, "changes" is a valid word, but through the Java API it is reported as invalid.
I don't know what I need to correct. Is there a better approach to this problem?
The mistake is in the for loop over the IndexWord collection.
The line
Synset[] senses = word.getSenses();
returns all senses of the word, but you are checking only the first one (index 0). The word will be available in one of the senses, so you need to check every element of the array.
Adding to this, the -ing forms of words may not be available as senses. I'm not sure why you want to search the senses to decide whether it's a valid word.
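To illustrate the point about checking every sense rather than only index 0, here is a minimal sketch. It does not call JWNL: the strings are invented stand-ins for what senses[i].toString() might contain (not real WordNet output), and the class and method names are hypothetical. Only the loop shape is meant to carry over to the Synset[] array.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class AllSensesCheck {

    // Original behaviour: only the first sense is inspected.
    static boolean firstSenseContains(List<String> senses, String token) {
        String needle = token.trim().toLowerCase(Locale.ROOT);
        return !senses.isEmpty()
                && senses.get(0).toLowerCase(Locale.ROOT).contains(needle);
    }

    // Suggested behaviour: succeed as soon as any sense mentions the token.
    static boolean anySenseContains(List<String> senses, String token) {
        String needle = token.trim().toLowerCase(Locale.ROOT);
        for (String sense : senses) {
            if (sense != null && sense.toLowerCase(Locale.ROOT).contains(needle)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Invented sense texts; the token only shows up in the second one.
        List<String> senses = Arrays.asList(
                "alteration, modification",
                "change, variety");
        System.out.println(firstSenseContains(senses, "change"));
        System.out.println(anySenseContains(senses, "change"));
    }
}
```

In the question's isCorrect method, the equivalent change would be an inner loop over senses in place of the senses[0] test.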
A check like
if(set.getLemma() != null) return true;
should be enough for the spell check, right?
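The underlying idea, that a successful dictionary lookup alone is enough to accept a word, can be sketched without the sense inspection at all. This is only an illustration under stated assumptions: the in-memory map is a hypothetical stand-in for the WordNet lookup, its entries are invented, and none of the names below are JWNL API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class LookupOnlyCheck {

    // Hypothetical lexicon: word -> parts of speech it was found under.
    private static final Map<String, List<String>> LEXICON = new HashMap<>();
    static {
        LEXICON.put("change", Arrays.asList("noun", "verb"));
        LEXICON.put("analyze", Arrays.asList("verb"));
    }

    // Stand-in for the dictionary lookup; returns an empty list when nothing is found.
    static List<String> lookup(String key) {
        List<String> entries = LEXICON.get(key);
        return entries == null ? Collections.<String>emptyList() : entries;
    }

    // A word counts as correct as soon as the lookup yields at least one entry.
    static boolean isCorrect(String token) {
        if (token == null) {
            return false;
        }
        String key = token.trim().toLowerCase(Locale.ROOT);
        return !lookup(key).isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isCorrect("Change"));
        System.out.println(isCorrect("chnage"));
    }
}
```

With the real library, the corresponding test would be made on the result of lookupAllIndexWords; whichever accessor is used should be confirmed to come back empty or null for an unknown word.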