python pandas get rid of plural "s" in words to prepare for word count

Question

1.6k views Asked by jeangelj At 19 December 2016 at 16:50

I have the following python pandas dataframe:

Question_ID | Customer_ID | Answer
    1           234         The team worked very hard ...
    2           234         All the teams have been working together ...

I am going to use my code to count words in the answer column. But beforehand, I want to take out the "s" from the word "teams", so that in the example above I count team: 2 instead of team:1 and teams:1.

How can I do this for all words?

There are 3 answers

piRSquared On 19 December 2016 at 17:00

Use str.replace to remove the trailing 's' from any word of three or more letters that ends in 's'.

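# regex=True is needed in pandas 2.0+, where str.replace treats the pattern as a literal string by default
df.Answer.str.replace(r'(\w{2,})s\b', r'\1', regex=True)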

0                  The team worked very hard ...
1    All the team have been working together ...
Name: Answer, dtype: object

'{2,}' requires two or more word characters before the final 's'. Because 'is' has only one character before its 's', it is left untouched, but 'its' still becomes 'it'. Set it to '{3,}' to skip 'its' as well.
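To make the effect of the quantifier concrete, here is a minimal sketch that applies the same pattern with the standard library's re module instead of pandas, then counts words. The names PLURAL_S and strip_plural_s are made up for this illustration, and lowercasing plus whitespace splitting is only an assumption about how the word count would be done.

```python
import re
from collections import Counter

# Same pattern as in the answer: 2+ word characters followed by a final "s".
PLURAL_S = re.compile(r'(\w{2,})s\b')


def strip_plural_s(text, pattern=PLURAL_S):
    """Drop the trailing 's' from every word the pattern matches (hypothetical helper)."""
    return pattern.sub(r'\1', text)


# The two example answers from the question, without the trailing "...".
answers = [
    "The team worked very hard",
    "All the teams have been working together",
]

# Assumed counting scheme: lowercase, strip the plural "s", split on whitespace.
counts = Counter(
    word
    for answer in answers
    for word in strip_plural_s(answer.lower()).split()
)

print(counts["team"])  # "team" and "teams" are now counted together

# The pattern is purely mechanical, so it also trims words that are not plurals.
print(strip_plural_s("is its this"))
print(strip_plural_s("is its this", re.compile(r'(\w{3,})s\b')))
```

With a DataFrame, the same helper could be applied through df['Answer'].map(strip_plural_s). Note that even the stricter '{3,}' variant still cuts 'this' down to 'thi', which is the limitation the other answers address with stemming or lemmatization.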

Little Bobby Tables On 19 December 2016 at 17:13

Try the NLTK toolkit, specifically its stemming and lemmatization tools. I have never personally used it, but you can try it out.

Here is an example of some tricky plurals,

its it's his quizzes fishes maths mathematics

becomes

it it ' s hi quizz fish math mathemat

You can see it deals with "his" (and "mathematics") poorly, but then again you could have lots of abbreviated "hellos". This is the nature of the beast.

DYZ On 19 December 2016 at 17:11 (accepted answer)

You need to use a tokenizer (for breaking a sentence into words) and a lemmatizer (for standardizing word forms), both provided by the natural language toolkit nltk:

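import nltk
# nltk.download('wordnet')  # one-time download of the WordNet data the lemmatizer needs
sentence = 'All the teams have been working together'
wnl = nltk.WordNetLemmatizer()
[wnl.lemmatize(word) for word in nltk.wordpunct_tokenize(sentence)]
# ['All', 'the', 'team', 'have', 'been', 'working', 'together']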
