I have the following pandas DataFrame in Python:
Question_ID | Customer_ID | Answer
1           | 234         | The team worked very hard ...
2           | 234         | All the teams have been working together ...
I already have code that counts the words in the Answer column. Before counting, though, I want to strip the "s" from the word "teams", so that in the example above I get team: 2 instead of team: 1 and teams: 1.
How can I do this for all words?
You need to use a tokenizer (for breaking a sentence into words) and a lemmatizer (for standardizing word forms), both provided by the Natural Language Toolkit (nltk).