Spell checking for base word

I'm trying to check whether a word is spelled correctly or misspelled using WordNet. Here is my implementation so far, SpellChecker.java:

package com.domain.wordnet;

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Collection;

import net.didion.jwnl.JWNL;
import net.didion.jwnl.JWNLException;
import net.didion.jwnl.data.IndexWord;
import net.didion.jwnl.data.IndexWordSet;
import net.didion.jwnl.data.Synset;
import net.didion.jwnl.dictionary.Dictionary;

public class SpellChecker {

    private static Dictionary dictionary = null;
    private static final String PROPS = "/opt/jwnl/jwnl14-rc2/config/file_properties.xml";

    static {
        try(InputStream is = new FileInputStream(PROPS)) {
            JWNL.initialize(is);
            dictionary = Dictionary.getInstance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        System.out.println(isCorrect("change"));    //  true
        System.out.println(isCorrect("changes"));   //  false
        System.out.println(isCorrect("changed"));   //  true
        System.out.println(isCorrect("changing"));  //  true
        System.out.println();
        System.out.println(isCorrect("analyze"));   //  true
        System.out.println(isCorrect("analyzed"));  //  true
        System.out.println(isCorrect("analyzing")); //  false
    }

    public static boolean isCorrect(String token) {
        try {
            token = token.trim().toLowerCase();
            IndexWordSet set = dictionary.lookupAllIndexWords(token);
            if(set == null)
                return false;

            @SuppressWarnings("unchecked")
            Collection<IndexWord> collection = set.getIndexWordCollection();
            if(collection == null || collection.isEmpty())
                return false;

            for(IndexWord word : collection) {
                Synset[] senses = word.getSenses();
                if(senses != null && senses.length > 0
                        && senses[0].toString().toLowerCase().contains(token)) {
                    return true;
                }
            }

            return false;
        } catch (JWNLException e) {
            e.printStackTrace();
            return false;
        }
    }
}

It works fine in most cases, but as you can see, it fails for plurals and some -ing forms. Can I handle plural and -ing forms somehow without breaking the rules of English?

In the WordNet Browser, changes is a valid word, but through the Java API it is reported as invalid.

I don't know what I need to correct. Is there a better approach to overcome this issue?

There is 1 answer

Kris (best answer)

The mistake is in this loop:

for(IndexWord word : collection) {
    Synset[] senses = word.getSenses();
    if(senses != null && senses.length > 0
            && senses[0].toString().toLowerCase().contains(token)) {
        return true;
    }
}

The line Synset[] senses = word.getSenses() returns all senses of the word, but you are checking only the first one (index 0). The word may appear in any of the senses, so check them all. Something like this:

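for (IndexWord word : collection) {
    Synset[] senses = word.getSenses();
    if (senses == null) {
        continue;   // nothing to inspect for this part of speech
    }
    // Check every sense, not just senses[0]
    for (Synset sense : senses) {
        if (sense.getGloss().toLowerCase().contains(token)) {
            return true;
        }
    }
}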

In addition, the -ing forms of words may not be available as senses. I'm not sure why you want to search the senses to decide whether it's a valid word.

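A check like if(set.getLemma() != null) return true;

should be enough to decide the spell check, right? One caveat: in JWNL 1.4, getLemma() appears to return the string that was looked up, so testing whether the set is non-empty (set.size() > 0) is probably the safer form of this check.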
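
To make the idea of a presence-based check concrete, here is a minimal, self-contained sketch. It does not use JWNL at all: the lemma list, the suffix rules, and the class name BaseFormSpellCheck are all assumptions made up for illustration. The rules are only loosely modeled on WordNet-style suffix detachment. It shows why an inflected form such as changes can be accepted as soon as one of its candidate base forms is found, without looking at senses or glosses.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative stand-in only: a tiny in-memory lemma list plus simplified
// suffix rules. This is NOT the JWNL API and not a full morphological analyzer.
class BaseFormSpellCheck {

    // Hypothetical mini-dictionary of base forms (an assumption for this sketch).
    private static final Set<String> LEMMAS =
            new HashSet<>(Arrays.asList("change", "analyze", "box", "study"));

    // Simplified {suffix, replacement} pairs; real rules depend on part of speech.
    private static final String[][] RULES = {
        {"ies", "y"}, {"es", "e"}, {"es", ""}, {"s", ""},
        {"ed", "e"}, {"ed", ""}, {"ing", "e"}, {"ing", ""}
    };

    // Returns the token itself plus every candidate base form the rules produce.
    static Set<String> candidates(String token) {
        Set<String> result = new LinkedHashSet<>();
        result.add(token);
        for (String[] rule : RULES) {
            String suffix = rule[0];
            if (token.length() > suffix.length() && token.endsWith(suffix)) {
                String stem = token.substring(0, token.length() - suffix.length());
                result.add(stem + rule[1]);
            }
        }
        return result;
    }

    // A token is accepted if it, or one of its candidate base forms, is a known lemma.
    static boolean isCorrect(String token) {
        if (token == null) {
            return false;
        }
        String normalized = token.trim().toLowerCase(Locale.ROOT);
        if (normalized.isEmpty()) {
            return false;
        }
        for (String candidate : candidates(normalized)) {
            if (LEMMAS.contains(candidate)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] words = {"change", "changes", "changing", "analyzing", "chnages"};
        for (String word : words) {
            System.out.println(word + " -> " + isCorrect(word));
        }
    }
}
```

With the assumed lemma list, changes, changing and analyzing are accepted because each reduces to a listed base form, while the misspelling chnages is rejected. A real dictionary lookup would replace both the lemma set and the hand-written rules.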