Initializing Pandas DF Columns if any Substrings in Another Column

Question

Asked by zishaf on 17 August 2023 at 16:30

My dataframe has a summary column with plain text. I also have a dictionary matching new column names as keys to lists of keywords as values. I'd like to add all those columns to my dataframe with each row initialized as 1 if any of their associated keywords is contained in my summary or -99 if no keywords are present.

Here's my code trying to accomplish this:

# headers is a list of strings, keywords is a list of lists.  Each column has a list of keywords
KEYWORDS_DICT = dict(zip(headers, keywords))

for column in KEYWORDS_DICT:
    df[column] = np.where(any(df['summary'].str.contains(keyword) for keyword in KEYWORDS_DICT[column]), 1, -99)

It's currently giving me 'ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().' Is there a good way to resolve this or another way to accomplish my goal?

Thanks!

Original Q&A

There are 2 answers

Suraj Shourie On 17 August 2023 at 16:39

You have to add .any() after your str.contains call; see the code below:

import numpy as np
import pandas as pd

# temp data
df = pd.DataFrame({'summary': ["abc", "qwe", "xyz"]})
KEYWORDS_DICT = {'col1': ["abc", "xyz"], "col2": ["nm"]}

# note the added .any(), which reduces each keyword's matches to a single boolean
for column in KEYWORDS_DICT:
    df[column] = np.where(any(df['summary'].str.contains(keyword).any() for keyword in KEYWORDS_DICT[column]), 1, -99)

print(df.to_dict())

Output:

{'summary': {0: 'abc', 1: 'qwe', 2: 'xyz'},
 'col1': {0: 1, 1: 1, 2: 1},
 'col2': {0: -99, 1: -99, 2: -99}}
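
The output above gives every row in a column the same value, because the inner .any() reduces each keyword's match to a single boolean for the whole Series. If a per-row result is wanted, one possible sketch (an assumption about the intent, not part of the answer above) is to keep each str.contains result as a boolean Series and combine them element-wise with np.logical_or.reduce. Here regex=False and na=False are assumed, so keywords match as literal substrings and missing summaries count as no match.

```python
import numpy as np
import pandas as pd

# Same temp data as above
df = pd.DataFrame({"summary": ["abc", "qwe", "xyz"]})
KEYWORDS_DICT = {"col1": ["abc", "xyz"], "col2": ["nm"]}

for column, keywords in KEYWORDS_DICT.items():
    # One boolean Series per keyword, each with one entry per row
    masks = [
        df["summary"].str.contains(keyword, regex=False, na=False)
        for keyword in keywords
    ]
    # Element-wise OR across the keyword masks gives a per-row result
    row_matches = np.logical_or.reduce(masks)
    df[column] = np.where(row_matches, 1, -99)

print(df)
```

With this data, col1 should be 1 for the "abc" and "xyz" rows and -99 for "qwe", while col2 stays -99 throughout.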

zishaf On 18 August 2023 at 14:28 BEST ANSWER

The proposed answer gave me all 1s for all columns. I was able to get my desired result by calling '|'.join() on my keyword lists then searching my summary for that string.
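
The approach described above can be sketched roughly as follows. This is a minimal sketch rather than the exact code used: the sample data is borrowed from the other answer, re.escape is an added assumption so that keywords are matched as literal text, and na=False is assumed so that missing summaries count as no match.

```python
import re

import numpy as np
import pandas as pd

# Sample data borrowed from the other answer (illustrative only)
df = pd.DataFrame({"summary": ["abc", "qwe", "xyz"]})
KEYWORDS_DICT = {"col1": ["abc", "xyz"], "col2": ["nm"]}

for column, keywords in KEYWORDS_DICT.items():
    # Build one regex alternation such as "abc|xyz"; re.escape is an
    # assumption here so that keywords are treated as literal text
    pattern = "|".join(re.escape(keyword) for keyword in keywords)
    # str.contains returns one boolean per row, so np.where works row-wise
    matches = df["summary"].str.contains(pattern, regex=True, na=False)
    df[column] = np.where(matches, 1, -99)

print(df)
```

Because str.contains is called once per column with the combined pattern, the result stays a per-row boolean Series and the ambiguous truth value error does not arise. Note that an empty keyword list would produce an empty pattern, which matches every row.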
