porter-stemmer: Stemming in Python is not working

  train_data = ["Consultant, changing,  Waiting"]

I'm trying to apply the stemmer to the data with the following code, but it leaves the data unchanged:

from nltk import stem

stemmer = stem.porter.PorterStemmer()

train_stemmer = train_data

for i in range(len(train_stemmer)):
    train_stemmer[i] = stemmer.stem(train_stemmer[i])

The code runs fine but does not produce my expected result, which is:

["Consult, change, Wait"]

There is 1 answer

Alexander L. Hayes (best answer)

Two things jump out:

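  1. train_data in your question is a list containing one string, like ["Consult, change, Wait"], rather than a list of three strings, like ["Consult", "change", "Wait"]. The stemmer treats each list element as a single word, so it only looks at the suffix of that one long string.
  2. NLTK's PorterStemmer.stem() converts its input to lowercase by default.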

If you intended for the list to contain one string, this should work fine:

from nltk.stem import porter

stemmer = porter.PorterStemmer()

# List of one string
string_in_list = ["Consult, change, Wait"]
for word in string_in_list:
    print(stemmer.stem(word))
print("----")

If you wanted a list of three strings, quote each word separately:

# List of three strings
individual_words = ["Consult", "change", "Wait"]
for word in individual_words:
    print(stemmer.stem(word))
print("----")
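If the words arrive as one comma-separated string and you cannot simply retype the list, one option is to split on commas and strip the surrounding whitespace before stemming. This is a minimal sketch in plain Python; the helper name split_words is made up for illustration and is not part of NLTK.

```python
def split_words(items):
    """Split each comma-separated string into stripped, non-empty words."""
    words = []
    for item in items:
        for part in item.split(","):
            part = part.strip()
            if part:  # skip empty pieces, e.g. after a trailing comma
                words.append(part)
    return words


train_data = ["Consultant, changing,  Waiting"]
print(split_words(train_data))  # ['Consultant', 'changing', 'Waiting']
```

The resulting list can then be passed word by word to stemmer.stem() exactly as in the loop above.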

Preserving the original case requires passing to_lowercase=False, which can make sense if you're trying to handle proper nouns (e.g. to distinguish chang, the stem of change, from the name Chang).

# Stem without converting the input to lowercase
for word in individual_words:
    print(stemmer.stem(word, to_lowercase=False))

Expected output when all three run:

consult, change, wait
----
consult
chang
wait
----
Consult
chang
Wait
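
Applied to the words from the question itself, the same approach might look like the sketch below. It assumes NLTK is installed and that the data has already been split into one string per word; the name train_stemmed is illustrative. It builds a new list rather than overwriting the input in place. Note that the Porter algorithm reduces "changing" to "chang", not "change", because a stem does not have to be a dictionary word.

```python
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()

# The question's data, with one string per word
train_data = ["Consultant", "changing", "Waiting"]

# Build a new list so the original data stays available for comparison
train_stemmed = [stemmer.stem(word) for word in train_data]

print(train_stemmed)  # ['consult', 'chang', 'wait']
```

If you need dictionary forms such as "change", a lemmatizer is usually a better fit than a stemmer.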