Why do WordNet and the JWI stemmer return "ord" and "orde" when stemming "order"?


I'm working on a project using WordNet and JWI 2.4.0. I have been running a lot of words through the bundled stemmer, and it seemed to work until I tried "order". The stemmer tells me that "order", "orde", and "ord" are the possible stems of "order". I'm not a native English speaker, but I have never seen the word "ord" in my life, and when I look it up in the WordNet dictionary there is, unsurprisingly, no entry. (According to BabelNet online, it is a town in Nebraska!)

So why does this strange stem appear? And how can I filter out the stems that are not present in the WordNet dictionary? (When I reuse the stemmed words, "orde" makes the program crash.)

Thank you!

ANSWER: I had misunderstood what a stem is, so this question does not really make sense.

Here is some code to reproduce the behaviour:

package JWIExplorer;

import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.util.Arrays;
import java.util.Date;
import java.util.Iterator;
import java.util.List;

import edu.mit.jwi.Dictionary;
import edu.mit.jwi.IDictionary;
import edu.mit.jwi.morph.WordnetStemmer;

public class TestJWI
{

    public static void main(String[] args) throws IOException
    {
        List<String> WordList_Research = Arrays.asList("dog", "cat", "mouse");
        List<String> WordList_Research2 = Arrays.asList("order");

        String path = "./" + File.separator + "dict";
        URL url;

        url = new URL("file", null, path);

        System.out.println("BEGIN : " + new Date());

        for (Iterator<String> iterstr = WordList_Research2.iterator(); iterstr.hasNext();)
        {
            String str = iterstr.next();

            TestStem(url, str);
        }

        System.out.println("END : " + new Date());
    }

    public static void TestStem(URL url, String ResearchedWord) throws IOException
    {
        // construct the dictionary object and open it
        IDictionary dict = new Dictionary(url);
        dict.open();

        try
        {
            // First, let's check for the stem word
            WordnetStemmer Stemmer = new WordnetStemmer(dict);

            // null means no part-of-speech restriction;
            // pass e.g. POS.NOUN (edu.mit.jwi.item.POS) for nouns only
            List<String> StemmedWords = Stemmer.findStems(ResearchedWord, null);

            for (String str : StemmedWords)
            {
                System.out.println("Local stemmed iteration on : " + str);
            }
        }
        finally
        {
            // release the dictionary files even if stemming throws
            dict.close();
        }
    }

}
There is 1 answer

5
MSalters On BEST ANSWER

A stem does not have to be a word in its own right. "Order" and "ordinal" share the stem "ord".
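To make this concrete, here is a minimal sketch of how purely pattern-based suffix stripping ends up with "ord" and "orde", and how such candidates could be filtered against a lexicon. It is not JWI's implementation: the class `StemFilterSketch`, its methods and the toy lexicon are hypothetical names made up for illustration, and the four rules are only the adjective suffix rules in the style of WordNet's Morphy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class StemFilterSketch {

    // Morphy-style adjective rules as {suffix, replacement} pairs (illustrative subset).
    private static final String[][] ADJECTIVE_RULES = {
        {"er", ""}, {"est", ""}, {"er", "e"}, {"est", "e"}
    };

    // Pattern-only stemming: swap suffixes without consulting any dictionary.
    public static List<String> candidateStems(String word) {
        Set<String> stems = new LinkedHashSet<>();
        stems.add(word); // the surface form itself is kept as a candidate
        for (String[] rule : ADJECTIVE_RULES) {
            if (word.endsWith(rule[0])) {
                String base = word.substring(0, word.length() - rule[0].length());
                stems.add(base + rule[1]);
            }
        }
        return new ArrayList<>(stems);
    }

    // Keep only the candidates that the caller-supplied lookup recognises.
    public static List<String> knownStems(String word, Predicate<String> isKnownLemma) {
        List<String> kept = new ArrayList<>();
        for (String stem : candidateStems(word)) {
            if (isKnownLemma.test(stem)) {
                kept.add(stem);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Toy lexicon standing in for WordNet (assumption, not real WordNet data).
        Set<String> toyLexicon = new HashSet<>(Arrays.asList("order", "large"));

        System.out.println(candidateStems("order"));                    // [order, ord, orde]
        System.out.println(knownStems("order", toyLexicon::contains));  // [order]
        System.out.println(knownStems("larger", toyLexicon::contains)); // [large]
    }
}
```

With JWI itself, the predicate could be something like `stem -> dict.getIndexWord(stem, POS.NOUN) != null`, since `IDictionary.getIndexWord` returns null for a lemma that has no entry for that part of speech. As far as I can tell, `WordnetStemmer.findStems` only performs this kind of dictionary check when it is given a concrete `POS` rather than null, but that is worth verifying against the JWI Javadoc for your version.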

The fundamental problem here is that stems are defined in terms of spelling, but spelling and the way the language evolved are only weakly related (especially in English). As programmers, we'd much rather describe a stem as a regex, e.g. ^ord[ie]. That pattern matches "order" and "ordinal", and it also captures that "ord" is not the stem of "ordained".
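A quick way to check the regex idea is to run it over the three words mentioned here. This is only a sketch: the class and method names are made up for the example, and `find()` is used so that the `^` anchor ties the pattern to the start of the word.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class StemRegexSketch {

    // The stem written as a pattern: "ord" followed by "i" or "e".
    private static final Pattern ORD_STEM = Pattern.compile("^ord[ie]");

    // True when the word starts with the "ord" stem in the sense of the regex above.
    public static boolean sharesOrdStem(String word) {
        return ORD_STEM.matcher(word.toLowerCase(Locale.ROOT)).find();
    }

    public static void main(String[] args) {
        for (String word : new String[] {"order", "ordinal", "ordained"}) {
            System.out.println(word + " -> " + sharesOrdStem(word));
        }
        // order -> true
        // ordinal -> true
        // ordained -> false
    }
}
```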