Getting the adjectives and plurals of lemmas in various languages


Could anyone point me to a solution or library that does the reverse of lemmatisation (inflection?), i.e. generates other word forms from a lemma? It needs to work for multiple languages: English, Dutch, German and French.

For example, given the lemma 'science', I need the words 'sciences', 'scientific', 'scientifically', ... to be returned: the plural plus derived forms such as adjectives (and adverbs).

I looked into NLTK (including WordNet) and spaCy, but did not find a solution.

Answer by Stef

You can invert a lemmatise function by applying it to every word in a word list (here, a Scrabble dictionary) and grouping the words that share the same lemma in a Python dict keyed by that lemma.
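A minimal, dependency-free sketch of that inversion is below. `invert_lemmatiser` and `toy_lemmatise` are hypothetical names; `toy_lemmatise` only strips a plural -s and stands in for a real lemmatiser such as WordNet's, and the short in-memory list stands in for the Scrabble dictionary file.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def invert_lemmatiser(words: Iterable[str], lemmatise: Callable[[str], str]) -> Dict[str, List[str]]:
    """Map each lemma to every word in `words` that lemmatises to it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        word = word.strip().lower()
        if word:  # skip blank entries
            groups[lemmatise(word)].append(word)
    return dict(groups)


def toy_lemmatise(word: str) -> str:
    """Hypothetical stand-in for a real lemmatiser: only strips a plural -s."""
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


word_list = ["science", "sciences", "scientific", "scientifically", "lemma", "lemmas"]
index = invert_lemmatiser(word_list, toy_lemmatise)
print(index["science"])     # ['science', 'sciences']
print(index["scientific"])  # ['scientific']
```

Any function from word to lemma can be passed as `lemmatise`, so swapping in a real lemmatiser changes the groups without changing the inversion itself.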

Of course, the groups depend strongly on the lemmatise function you use. Below, I use nltk.stem.WordNetLemmatizer.lemmatize, which correctly groups 'science' and 'sciences' under the same stem 'science' but doesn't group 'scientific' with them.

So you'll need a more "brutal" lemmatise function that brings more words to the same stem.
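As a rough illustration of a more "brutal" key, the sketch below does not lemmatise at all: it truncates every word to a fixed-length prefix. The prefix length of 5 is an arbitrary assumption that would need tuning per language, and `group_by_key` and `prefix_key` are hypothetical helpers. It pulls 'scientific' and 'scientifically' into the same group as 'science', but it also lets in the unrelated word 'scienter'.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_key(words: Iterable[str], key: Callable[[str], str]) -> Dict[str, List[str]]:
    """Group words by an arbitrary normalising key (a lemmatiser, stemmer, ...)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        groups[key(word)].append(word)
    return dict(groups)


def prefix_key(word: str, length: int = 5) -> str:
    """Deliberately crude 'lemmatiser' (an assumption): keep the first `length` letters."""
    return word[:length]


words = ["science", "sciences", "scientific", "scientifically", "scientist",
         "scienter", "lemma", "lemmas", "lemmata", "lemming"]
groups = group_by_key(words, prefix_key)
print(groups["scien"])  # ['science', 'sciences', 'scientific', 'scientifically', 'scientist', 'scienter']
print(groups["lemma"])  # ['lemma', 'lemmas', 'lemmata']
```

The trade-off shows in the output: a more aggressive key finds more related forms but also produces false positives ('scienter' is a legal term), so the groups would still need filtering.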

```python
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download('wordnet', quiet=True)  # fetch the WordNet data if it isn't installed yet

wnl = WordNetLemmatizer()
d = {}  # lemma -> list of words that lemmatise to it
with open('scrabble_dict.txt', 'r') as f:
    next(f); next(f)  # skip header (this word list starts with two header lines)
    for word in f:
        word = word.strip().lower()
        if not word:  # ignore blank lines
            continue
        # lemmatize() treats every word as a noun by default (pos='n')
        d.setdefault(wnl.lemmatize(word), []).append(word)

print(d['science'])
# ['science', 'sciences']

print(d['scientific'])
# ['scientific']

print([stem for stem in d if stem.startswith('scien')])
# ['science', 'scienced', 'scient', 'scienter', 'sciential', 'scientific', 'scientifical', 'scientifically', 'scientificities', 'scientificity', 'scientise', 'scientised', 'scientises', 'scientising', 'scientism', 'scientisms', 'scientist', 'scientistic', 'scientize', 'scientized', 'scientizes', 'scientizing']

print(d['lemma'])
# ['lemma', 'lemmas', 'lemmata']
```