Finding exact word in description column of DataFrame in Python

My list contains words such as ['orange', 'cool', 'app', ...], and I want to extract every one of them that occurs as an exact whole word in the description column of a DataFrame.

I have also attached a sample picture with code. I used str.findall(). As the picture shows, it extracts add from additional and app from apple, which I do not want. A word should be output only when it matches a whole word.

There is 1 answer

Wiktor Stribiżew

You can fix the code by wrapping the alternation of your words in word boundaries (\b):

df['exactmatch'] = df['text'].str.findall(fr"\b({'|'.join(list1)})\b").str.join(", ")

Or, if the words in list1 can contain special characters, escape each word with re.escape (this needs import re) and use lookarounds instead of \b:

df['exactmatch'] = df['text'].str.findall(fr"(?<!\w)({'|'.join(map(re.escape, list1))})(?!\w)").str.join(", ")
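To illustrate why the second form helps, here is a minimal sketch. The words and the text are made up for the example (the original data is only shown in a picture); "c++" is chosen because it ends in non-word characters, where \b cannot match before a space.

```python
import re
import pandas as pd

# Hypothetical sample data, chosen to include regex metacharacters.
list1 = ["c++", "app"]
df = pd.DataFrame({"text": ["I use c++ and one app", "apps in c++11"]})

# re.escape turns "c++" into a literal; the lookarounds require that no
# word character sits directly before or after the match.
pattern = fr"(?<!\w)({'|'.join(map(re.escape, list1))})(?!\w)"
df["exactmatch"] = df["text"].str.findall(pattern).str.join(", ")
print(df["exactmatch"].tolist())  # ['c++, app', '']

# For comparison: \b fails here, because there is no word boundary
# between "+" and the following space.
with_b = re.findall(r"\b(c\+\+)\b", "I use c++ and one app")
print(with_b)  # []
```

The second row yields an empty string because "apps" and "c++11" both continue with a word character right after the candidate match.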

For the three sample words, the patterns created by fr"\b({'|'.join(list1)})\b" and fr"(?<!\w)({'|'.join(map(re.escape, list1))})(?!\w)" will look like this, respectively:

\b(orange|cool|app)\b
(?<!\w)(orange|cool|app)(?!\w)
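A small runnable sketch of the \b version follows. The DataFrame contents and the extra word "add" are assumptions made for the example, picked to mirror the add/additional and app/apple cases from the question.

```python
import pandas as pd

# Hypothetical sample data; the original DataFrame is only shown in a picture.
list1 = ["orange", "cool", "app", "add"]
df = pd.DataFrame({"text": [
    "An apple and an orange",
    "The app has additional cool features",
    "Please add the app",
]})

# Build one alternation and wrap it in word boundaries.
pattern = fr"\b({'|'.join(list1)})\b"
print(pattern)  # \b(orange|cool|app|add)\b

# findall returns a list of whole-word hits per row; str.join flattens it.
df["exactmatch"] = df["text"].str.findall(pattern).str.join(", ")
print(df["exactmatch"].tolist())  # ['orange', 'app, cool', 'add, app']
```

Neither "apple" nor "additional" contributes a match, since the closing \b requires the word to end right after "app" or "add".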

Note that .str.join(", ") is considered faster than .apply(", ".join).
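As a quick check that the two ways of joining are interchangeable in terms of output, here is a sketch on a made-up Series of match lists. It only compares results and does not measure speed.

```python
import pandas as pd

# Hypothetical findall-style output: one list of matches per row.
matches = pd.Series([["app", "cool"], ["orange"], []])

via_str = matches.str.join(", ")      # vectorized string accessor
via_apply = matches.apply(", ".join)  # Python-level call per row

same = via_str.tolist() == via_apply.tolist()
print(same)  # True
```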