Is there any way to prevent my WordNetLemmatizer from lemmatizing contracted words like "can't" or "didn't"?

Question

Asked by Koedam12 on 12 October 2020 at 17:18 · 368 views

The code below is what I currently have. It works fine, but it changes words like "didn't" into "didn" and "t". I would like it either to remove the apostrophe, so the word comes out as "didnt", or to leave it as "didn't", though I am not sure whether that would cause issues later with TfidfVectorizer.

Is there any way to implement this without too much of a hassle?

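from nltk import pos_tag, word_tokenize
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer

def get_wordnet_pos(word):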
    """Map POS tag to first character lemmatize() accepts"""
    tag = pos_tag([word])[0][1][0].upper()
    tag_dict = {"J": wordnet.ADJ,
                "N": wordnet.NOUN,
                "V": wordnet.VERB,
                "R": wordnet.ADV}
    return tag_dict.get(tag, wordnet.NOUN)

lemmatizer = WordNetLemmatizer()

def lemmatize_review(review):
    """Lemmatize single review string"""
    lemmatized_review = ' '.join([lemmatizer.lemmatize(word, get_wordnet_pos(word)) for word in word_tokenize(review)])
    return lemmatized_review

review_data['Lemmatized_Review'] = review_data['Review'].apply(lemmatize_review)

There are 2 answers

**qaiser** · Answer 1 · 2020-10-13T05:15:44+00:00

You can use TweetTokenizer instead of word_tokenize. It keeps contractions such as "didn't" as single tokens:

from nltk.tokenize import TweetTokenizer

# Avoid naming the variable `str`, which would shadow the built-in type
text = "didn't can't won't how are you"
tokenizer = TweetTokenizer()

tokenizer.tokenize(text)
# Output:
["didn't", "can't", "won't", 'how', 'are', 'you']
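On the TfidfVectorizer concern raised in the question: keeping the apostrophe through lemmatization may not be enough by itself, because scikit-learn's documented default `token_pattern` (`(?u)\b\w\w+\b`) matches only runs of two or more word characters, so "didn't" would be cut to "didn" and the lone "t" dropped. The sketch below reproduces that behaviour with the standard `re` module only. `APOSTROPHE_PATTERN` is a hypothetical alternative, not something from the answers above, and it assumes the vectorizer applies the pattern with `findall`, which is why the group is non-capturing.

```python
import re

# scikit-learn's documented default token_pattern for TfidfVectorizer
DEFAULT_PATTERN = r"(?u)\b\w\w+\b"

# Hypothetical alternative (an assumption, not part of the original answers):
# a word, optionally followed by an apostrophe and more word characters.
# The group is non-capturing so findall() returns whole tokens.
APOSTROPHE_PATTERN = r"(?u)\b\w+(?:'\w+)?\b"

text = "didn't can't won't how are you"

default_tokens = re.findall(DEFAULT_PATTERN, text)
kept_tokens = re.findall(APOSTROPHE_PATTERN, text)

print(default_tokens)  # contractions are cut at the apostrophe
print(kept_tokens)     # contractions survive as single tokens
```

If this matches what you see, passing the alternative as `TfidfVectorizer(token_pattern=APOSTROPHE_PATTERN)` should keep the contractions intact. Note that this pattern also admits one-character tokens, which the default excludes.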

**Kedaar Rao** · Answer 2 · 2020-10-12T17:42:18+00:00

You can simply replace the "'" character with an empty string "" before lemmatizing, as shown below:

>>> word = "didn't can't won't"
>>> word
"didn't can't won't"
>>> x = word.replace("'", "")
>>> x
'didnt cant wont'
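One caveat with a plain `replace` is that it removes every apostrophe, including single quotes around quoted text. A slightly narrower variant, sketched here as an assumption rather than as part of the answer above, removes an apostrophe only when it sits between two word characters. The helper name `strip_contraction_apostrophes` is hypothetical.

```python
import re

def strip_contraction_apostrophes(text):
    """Remove apostrophes that sit directly between two word characters."""
    # Lookbehind and lookahead check the neighbours without consuming them,
    # so only the apostrophe itself is replaced.
    return re.sub(r"(?<=\w)'(?=\w)", "", text)

sample = "didn't say 'hello' to the dogs' owner"
cleaned = strip_contraction_apostrophes(sample)
print(cleaned)
```

The helper can be applied to each review before tokenizing, for example `review_data['Review'].apply(strip_contraction_apostrophes)`. Possessives such as "dog's" are also collapsed (to "dogs"), which may or may not be what you want.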
