NLTK lemmatizer changing "less" to "le". Text doesn't make sense anymore

Question

Asked by SCool on 02 June 2023 at 10:50 (135 views)

from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
lemmatizer.lemmatize('Less'.lower())

'le'

What's going on here, and how can I avoid this?

The word 'le' is now appearing all over my LDA topic model, and it doesn't make sense.

Who knows what other words it is affecting in the model. Should I avoid using the Lemmatizer or is there a way to fix this?

There is 1 answer

**Maciej Skorski** · Answer 1 · 2023-06-03T18:34:33+00:00

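I will add more context to the observation in the comments. The key is to understand the lemmatization rules, which depend on the part of speech. When no part of speech is given, your word is treated as a noun (the default). Since "less" is not listed as a noun, its supposed plural suffix "s" gets stripped twice ("less" → "les" → "le") until the result matches a form that WordNet does list as a noun. The same thing happens with "mes", a misspelling of the noun "mess":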

from nltk.stem import WordNetLemmatizer
word = 'mes'
wnl = WordNetLemmatizer()
wnl.lemmatize(word) # me

In your case, the right fix (as suggested in the comments) is to pass the part of speech explicitly. Here 'a' marks the word as an adjective:

word = 'less'
wnl = WordNetLemmatizer()
wnl.lemmatize(word, 'a') # less
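
If the part of speech is not known in advance, as with tokens fed into a topic model, one common approach is to tag each token first and translate the tag into the letter that the lemmatizer expects. The following is a minimal sketch of that idea, assuming Penn Treebank tags such as those produced by NLTK's default tagger. The names `penn_to_wordnet` and `lemmatize_tagged` are hypothetical helpers, not part of NLTK.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag (e.g. 'JJR') to a WordNet POS letter."""
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"      # anything else falls back to noun, the lemmatizer's default


def lemmatize_tagged(tagged_tokens, lemmatize):
    """Lemmatize (word, tag) pairs using a caller-supplied lemmatize(word, pos)."""
    return [
        lemmatize(word.lower(), penn_to_wordnet(tag))
        for word, tag in tagged_tokens
    ]


# Possible usage with NLTK (requires the WordNet and tagger data to be installed):
#   from nltk import pos_tag, word_tokenize
#   from nltk.stem import WordNetLemmatizer
#   tagged = pos_tag(word_tokenize("Less is more"))
#   lemmas = lemmatize_tagged(tagged, WordNetLemmatizer().lemmatize)
```

How well this works depends on the tagger's accuracy, so it is worth spot-checking the output on your own corpus before rebuilding the model.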

For more detail, these are the suffix substitution rules for each part of speech:

from nltk.corpus.reader import WordNetCorpusReader
WordNetCorpusReader.MORPHOLOGICAL_SUBSTITUTIONS

{'n': [('s', ''),
  ('ses', 's'),
  ('ves', 'f'),
  ('xes', 'x'),
  ('zes', 'z'),
  ('ches', 'ch'),
  ('shes', 'sh'),
  ('men', 'man'),
  ('ies', 'y')],
 'v': [('s', ''),
  ('ies', 'y'),
  ('es', 'e'),
  ('es', ''),
  ('ed', 'e'),
  ('ed', ''),
  ('ing', 'e'),
  ('ing', '')],
 'a': [('er', ''), ('est', ''), ('er', 'e'), ('est', 'e')],
 'r': [],
 's': [('er', ''), ('est', ''), ('er', 'e'), ('est', 'e')]}

For the whole algorithm, see the source code of WordNetLemmatizer.lemmatize.
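
To make the repeated stripping concrete, here is a minimal sketch of the idea. It is not NLTK's actual implementation. It reuses the noun rules listed above, while `TOY_NOUNS`, `apply_rules`, and `toy_lemmatize` are hypothetical names, and the small lexicon is only a stand-in for WordNet's noun index.

```python
# Noun suffix rules copied from the table above.
NOUN_RULES = [
    ("s", ""), ("ses", "s"), ("ves", "f"), ("xes", "x"), ("zes", "z"),
    ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y"),
]

# Hypothetical toy lexicon standing in for WordNet's noun index.
TOY_NOUNS = {"le", "me", "mess", "dog", "church"}


def apply_rules(forms, rules):
    """Apply every matching suffix rule once to every candidate form."""
    return [
        form[: -len(old)] + new
        for form in forms
        for old, new in rules
        if form.endswith(old)
    ]


def toy_lemmatize(word, lexicon=TOY_NOUNS, rules=NOUN_RULES):
    """Keep stripping suffixes until some candidate is found in the lexicon."""
    if word in lexicon:
        return word
    forms = apply_rules([word], rules)
    while forms:
        known = [form for form in forms if form in lexicon]
        if known:
            return min(known, key=len)
        forms = apply_rules(forms, rules)  # no match yet: strip again
    return word  # nothing matched: return the input unchanged


print(toy_lemmatize("less"))      # le   (less -> les -> le)
print(toy_lemmatize("mes"))       # me
print(toy_lemmatize("mess"))      # mess (already in the lexicon)
print(toy_lemmatize("churches"))  # church
```

In this sketch "less" only becomes "le" because "less" itself is missing from the noun lexicon while "le" is present, which mirrors why passing the correct part of speech avoids the problem.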
