I'm coding in R and stemming some text about air conditioning. I use SnowballC::wordStem() from the Snowball stemming library, which implements the Porter stemming algorithm. I was surprised by this result:
library(SnowballC)
library(tidyverse)
c("conditioner", "conditioning", "condition") %>% SnowballC::wordStem()
[1] "condition" "condit" "condit"
Usually, the output of the Porter stemmer can't be stemmed any further. But here, "conditioner" becomes "condition", while "condition" itself becomes "condit". Is this a bug, or am I misunderstanding the Porter stemmer?
The pattern holds for other '-ioner' words such as conventioner, petitioner, and revisioner, but not for the 'evaluate' family, where every form reduces to the same stem:
c("evaluate", "evaluation", "evaluating", "evaluator") %>% SnowballC::wordStem()
[1] "evalu" "evalu" "evalu" "evalu"
The pattern isn't unique to the R SnowballC library; I tried online stemmers based on Python's NLTK and on a C implementation, and they show the same pattern.
So: is this a bug? And is there a straightforward workaround?
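One workaround I've considered is to re-apply the stemmer until the output stops changing. Below is a minimal sketch of that idea, not a vetted solution: `stem_to_fixpoint` and its `max_passes` argument are hypothetical names of my own and are not part of SnowballC, and I'm assuming that repeated stemming settles within a few passes for ordinary English words.

```r
library(SnowballC)

# Hypothetical helper (not part of SnowballC): re-apply wordStem() until
# the output no longer changes, or until max_passes is reached.
stem_to_fixpoint <- function(words, max_passes = 10L) {
  for (i in seq_len(max_passes)) {
    stemmed <- SnowballC::wordStem(words)
    # Stop as soon as a pass leaves every word unchanged
    if (identical(stemmed, words)) break
    words <- stemmed
  }
  words
}

stem_to_fixpoint(c("conditioner", "conditioning", "condition"))
```

I'm not sure this is safe in general, since extra passes could also merge words that ought to stay distinct.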