Why does the WordNet and JWI stemmer give "ord" and "orde" as a result of stemming "order"?

Question

73 views Asked by Metalman At 06 October 2017 at 11:33

I'm working on a project using WordNet and JWI 2.4.0. I have been running a lot of words through the included stemmer, and it seemed to work until I tried "order". The stemmer tells me that "order", "orde", and "ord" are the possible stems of "order". I'm not a native English speaker, but I have never seen the word "ord" in my life, and when I asked the WordNet dictionary for its definition, there was obviously nothing. (In BabelNet online, I found that it is a town in Nebraska!)

So why does this strange stem appear? How can I filter out the stems that are not present in the WordNet dictionary? (When I reuse the stemmed words, "orde" makes the program crash.)

Thank you!

ANSWER: I hadn't properly understood what a stem is, so this question doesn't really make sense.

Here is some code to test:

package JWIExplorer;

import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.util.Arrays;
import java.util.Date;
import java.util.Iterator;
import java.util.List;

import edu.mit.jwi.Dictionary;
import edu.mit.jwi.IDictionary;
import edu.mit.jwi.morph.WordnetStemmer;

public class TestJWI
{

    public static void main(String[] args) throws IOException
    {
        List<String> WordList_Research = Arrays.asList("dog", "cat", "mouse");
        List<String> WordList_Research2 = Arrays.asList("order");

        String path = "./" + File.separator + "dict";
        URL url;

        url = new URL("file", null, path);

        System.out.println("BEGIN : " + new Date());

        for (Iterator<String> iterstr = WordList_Research2.iterator(); iterstr.hasNext();)
        {
            String str = iterstr.next();

            TestStem(url, str);
        }

        System.out.println("END : " + new Date());
    }

    public static void TestStem(URL url, String ResearchedWord) throws IOException
    {
        // construct the dictionary object and open it
        IDictionary dict = new Dictionary(url);
        dict.open();

        // First, let's check for the stem word
        WordnetStemmer Stemmer = new WordnetStemmer(dict);
        List<String> StemmedWords;

        // Passing null as the POS asks for stems across all parts of speech;
        // pass POS.NOUN (edu.mit.jwi.item.POS) to restrict the search to nouns
        StemmedWords = Stemmer.findStems(ResearchedWord, null);

        for (String str : StemmedWords)
        {
            System.out.println("Local stemmed iteration on : " + str);
        }

        // release the dictionary resources once we are done
        dict.close();
    }

}

Original Q&A

There is 1 answer

**MSalters** · Accepted Answer · 2017-10-06T12:40:01+00:00

Stems do not necessarily need to be words by themselves. "Order" and "Ordinal" share the stem "Ord".
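Since a stem need not be a word, code that expects dictionary words has to filter the candidates itself. The following is only a minimal sketch of that idea, not part of JWI: `StemFilter`, `keepKnown`, and the small `lexicon` set are hypothetical names made up for illustration, and the set merely stands in for a real dictionary lookup. With JWI, the predicate could presumably check whether `dict.getIndexWord(stem, pos)` returns a non-null value for some part of speech, but that is an assumption worth verifying against the JWI documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

class StemFilter
{
    // Keeps only the candidate stems accepted by the lookup, preserving their order.
    static List<String> keepKnown(List<String> stems, Predicate<String> isKnown)
    {
        List<String> kept = new ArrayList<>();
        for (String stem : stems)
        {
            if (isKnown.test(stem))
                kept.add(stem);
        }
        return kept;
    }

    public static void main(String[] args)
    {
        // Hypothetical stand-in for the WordNet lexicon, for illustration only.
        Set<String> lexicon = new HashSet<>(Arrays.asList("order", "dog", "cat"));

        // The candidates reported in the question for "order".
        List<String> candidates = Arrays.asList("order", "orde", "ord");

        // prints [order]
        System.out.println(keepKnown(candidates, lexicon::contains));
    }
}
```

Passing the lookup as a `Predicate<String>` keeps the filter independent of where the word list comes from, so the stand-in set can be swapped for a real dictionary check later.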

The fundamental problem here is that stems are tied to spelling, but language evolution and spelling are only weakly related (especially in English). As programmers, we'd much rather describe a stem as a regex, e.g. ^ord[ie]. This captures that it is not the stem of "ordained".
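To make the regex idea concrete, here is a small sketch that applies the pattern ^ord[ie] to the three words mentioned above. The class and method names (`StemRegexDemo`, `sharesOrdStem`) are invented for this illustration; only `java.util.regex` from the standard library is used.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

class StemRegexDemo
{
    // The pattern from the answer: "ord" at the start of the word, followed by "i" or "e".
    static final Pattern ORD_STEM = Pattern.compile("^ord[ie]");

    static boolean sharesOrdStem(String word)
    {
        // find() rather than matches(): the pattern only constrains the start of the word
        return ORD_STEM.matcher(word.toLowerCase(Locale.ROOT)).find();
    }

    public static void main(String[] args)
    {
        List<String> words = Arrays.asList("order", "ordinal", "ordained");
        for (String word : words)
        {
            // prints: order -> true, ordinal -> true, ordained -> false
            System.out.println(word + " -> " + sharesOrdStem(word));
        }
    }
}
```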
