Porter stemming of '-ioner' words


I'm coding in R and stemming some text about air-conditioning. I use SnowballC::wordStem() from the Snowball stemming library, which implements the Porter stemming algorithm. I was surprised by this result:

library(SnowballC)
library(tidyverse)
c("conditioner", "conditioning", "condition") %>% SnowballC::wordStem()
[1] "condition" "condit"    "condit" 

Usually, the output of the Porter stemmer can't be stemmed any further. But here, "conditioner" becomes "condition", while "condition" itself becomes "condit". Is this a bug, or am I misunderstanding the Porter stemmer?

The pattern holds for other '-ioner' words such as conventioner, petitioner, and revisioner, but not for the 'evaluate' family, where every form reduces to the same stem:

c("evaluate", "evaluation", "evaluating", "evaluator") %>% SnowballC::wordStem()
[1] "evalu" "evalu" "evalu" "evalu"

The pattern isn't unique to the R SnowballC library; I found online stemmers for Python NLTK and one in C, and they show the same pattern.

So: is this a bug? And is there a straightforward workaround?
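One possible workaround, sketched here as an illustration rather than a recommended fix, is to re-apply the stemmer until the output stops changing. The helper name stem_to_fixed_point and its max_iter parameter are hypothetical and not part of SnowballC; only wordStem() and its language argument come from the library. Repeated stemming may also over-stem other words, so the results are worth checking on the real vocabulary.

```r
library(SnowballC)

# Hypothetical helper (not part of SnowballC): re-apply the stemmer
# until no word changes, or until max_iter passes have been made.
stem_to_fixed_point <- function(words, language = "porter", max_iter = 10L) {
  current <- words
  for (i in seq_len(max_iter)) {
    stemmed <- SnowballC::wordStem(current, language = language)
    # Stop as soon as a full pass leaves every word unchanged.
    if (identical(stemmed, current)) break
    current <- stemmed
  }
  current
}

result <- stem_to_fixed_point(c("conditioner", "conditioning", "condition"))
print(result)
```

Given the single-pass outputs shown above, "conditioner" should go through "condition" on the first pass and reach "condit" on the second, so all three words end up with the same stem.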
