>>> from nltk.stem import WordNetLemmatizer
>>> lemmatizer = WordNetLemmatizer()
>>> lemmatizer.lemmatize('Less'.lower())
'le'
What's going on here, and how can I avoid this?
The word 'le' is now appearing all over my LDA topic model, and it doesn't make sense.
Who knows what other words it is affecting in the model. Should I avoid using the Lemmatizer or is there a way to fix this?
I will give more context in addition to the observation in the comments. The key is to understand the lemmatization rules, which depend on the part of speech. Your word is treated as a noun (the default) and gets its supposed plural suffix stripped twice. Compare with the noun 'mess' or its misspelling 'mes'. In your case, the right option (as in the comments) is to pass the correct part of speech explicitly.
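For a single word this would be something like lemmatizer.lemmatize('less', pos='a'). For a whole corpus feeding an LDA model, one common approach is to tag each token first and hand the tag to the lemmatizer. Below is a minimal sketch of that idea. The helper names penn_to_wordnet and lemmatize_tagged are made up for illustration, and the mapping assumes Penn Treebank tags such as those produced by nltk.pos_tag.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet part-of-speech letter."""
    if tag.startswith("J"):
        return "a"  # adjective (JJ, JJR, JJS)
    if tag.startswith("V"):
        return "v"  # verb (VB, VBD, VBG, ...)
    if tag.startswith("R"):
        return "r"  # adverb (RB, RBR, RBS)
    return "n"      # everything else falls back to noun, the lemmatizer's default


def lemmatize_tagged(tagged_tokens, lemmatize):
    """Lemmatize (token, tag) pairs with any callable shaped like lemmatize(word, pos=...)."""
    return [
        lemmatize(token.lower(), pos=penn_to_wordnet(tag))
        for token, tag in tagged_tokens
    ]


# Intended use with NLTK (requires the tagger and WordNet data to be downloaded):
#   from nltk import pos_tag
#   from nltk.stem import WordNetLemmatizer
#   lemmas = lemmatize_tagged(pos_tag(tokens), WordNetLemmatizer().lemmatize)
```

The lemmatizer is passed in as a callable so the tag mapping can be checked without loading WordNet. Tagger mistakes will still propagate, so it is worth spot-checking frequent tokens in your topics.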
More: the rules are suffix substitutions that depend on the part of speech.
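To make the double stripping concrete, here is a simplified sketch of how repeated suffix substitution can turn 'less' into 'le'. It is not NLTK's actual code. The noun rule list is written from memory of NLTK's WordNet reader, so check it against your installed version, and TOY_NOUNS is a hypothetical stand-in for the WordNet noun index.

```python
# Noun suffix rules as (old_suffix, new_suffix) pairs (recalled, not authoritative).
NOUN_RULES = [
    ("s", ""), ("ses", "s"), ("ves", "f"), ("xes", "x"), ("zes", "z"),
    ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y"),
]

# Hypothetical miniature vocabulary standing in for WordNet's nouns.
TOY_NOUNS = {"le", "mess", "dog"}


def apply_rules(forms, rules):
    """Apply every matching rule once to every form and collect the results."""
    return [
        form[: -len(old)] + new
        for form in forms
        for old, new in rules
        if form.endswith(old)
    ]


def toy_lemmatize(word, rules=NOUN_RULES, vocab=TOY_NOUNS):
    """Return word if known; otherwise keep stripping suffixes until a known form appears."""
    if word in vocab:
        return word
    forms = [word]
    while forms:
        forms = apply_rules(forms, rules)
        hits = [form for form in forms if form in vocab]
        if hits:
            return min(hits, key=len)
    return word  # nothing matched, so leave the word unchanged


print(toy_lemmatize("less"))  # 'less' -> 'les' -> 'le'
print(toy_lemmatize("mess"))  # already a known noun, returned as is
```

Since 'less' is not in the toy noun list, the "s" rule fires twice before a known form is found, which is the effect described above.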
For the whole algorithm, see the source code of WordNetLemmatizer.lemmatize.