Is there a better way to check for each element in a dataframe that it is contained in a given string?

Question


46 views Asked by ramech At 05 July 2023 at 08:09

Let's say we have a dataframe df representing the activities of some people as follows:

index	Mary	Tristan	Louise	Arnaud	Justin	Stacy
0	Engineer	Software Engineer	Rock Singer	Rap Singer	Lumberjack	Biomedical Engineer
1	Guitarist	Aerospace Engineer	Author		Firefighter
2		Business Man

And I would like to check if each activity is or might be software engineering. With s = 'Software Engineer', we would obtain:

index	Mary	Tristan	Louise	Arnaud	Justin	Stacy
0	True	True	False	False	False	False
1	False	False	False	False	False	False
2	False	False	False	False	False	False

Which means that I want to test, for every cell in df, whether it is a substring of s. What already works is the following, but it looks dirty:

s = 'Software Engineer'
df.apply(lambda col: col.apply(lambda x: str(x) in s))

What bothers me is the double apply. There might be a better solution, right?


There are 2 answers

Debi Prasad On 05 July 2023 at 08:38

One approach is to take the dataframe's underlying numpy array and use numpy's element-wise comparison on it:

# Let's assume df is your dataframe which contains all the information
# Replace the null values with the string 'None'
df = df.fillna('None')
values = df.values
# Element-wise comparison: True where a cell equals the string exactly
boolean_values = values == 'Software Engineer'

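Now boolean_values is True wherever a cell equals 'Software Engineer' exactly. Note that this is an equality test, not a substring test, so a cell such as 'Engineer' comes out False. You can then rebuild the dataframe from the array:

import pandas as pd

cols = df.columns
df = pd.DataFrame(boolean_values, index=df.index, columns=cols)

And there you go, you have the result as a dataframe.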
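For reference, here is a self-contained sketch of the comparison above on part of the example data. The way the table is built (three of the columns, empty cells stored as NaN) and the name equal_mask are assumptions made for illustration, not something stated in the question.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of part of the question's table; empty cells are NaN.
df = pd.DataFrame({
    "Mary": ["Engineer", "Guitarist", np.nan],
    "Tristan": ["Software Engineer", "Aerospace Engineer", "Business Man"],
    "Stacy": ["Biomedical Engineer", np.nan, np.nan],
})
s = "Software Engineer"

# Element-wise equality against s on the underlying object array.
values = df.fillna("None").to_numpy(dtype=object)
equal_mask = pd.DataFrame(values == s, index=df.index, columns=df.columns)

# Only the cell that equals s exactly is True; the 'Engineer' cell is False.
print(equal_mask)
```

Passing index=df.index as well as the columns keeps the row labels of the original dataframe, which matters if the index is not the default 0..n-1.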

**abdelgha4** · Accepted Answer · 2023-07-05T09:23:03+00:00

To check whether each cell in your dataframe is a substring of s, there is no need for numpy; you can use applymap:

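df.applymap(lambda cell: isinstance(cell, str) and bool(cell) and cell in s)

Note: bool(cell) excludes empty strings and isinstance(cell, str) excludes NaN cells, so both are marked as False. NaN is a truthy float, so bool(cell) alone would let it through and cell in s would then raise a TypeError. In pandas 2.1 and later, applymap is deprecated in favour of DataFrame.map, which takes the same function.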
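A runnable sketch of this element-wise check on the example data from the question follows. The dataframe construction (empty cells stored as NaN), the helper name is_substring_of_s and the hasattr fallback between DataFrame.map and applymap are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's table; empty cells are NaN.
df = pd.DataFrame({
    "Mary": ["Engineer", "Guitarist", np.nan],
    "Tristan": ["Software Engineer", "Aerospace Engineer", "Business Man"],
    "Louise": ["Rock Singer", "Author", np.nan],
    "Arnaud": ["Rap Singer", np.nan, np.nan],
    "Justin": ["Lumberjack", "Firefighter", np.nan],
    "Stacy": ["Biomedical Engineer", np.nan, np.nan],
})
s = "Software Engineer"


def is_substring_of_s(cell):
    # Hypothetical helper: True only for non-empty strings contained in s.
    # The isinstance check filters out NaN, which is a float.
    return isinstance(cell, str) and cell != "" and cell in s


# DataFrame.map is available from pandas 2.1; older versions only have applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap
result = elementwise(is_substring_of_s)

print(result)
```

On this data, only the 'Engineer' and 'Software Engineer' cells in row 0 come out True, which matches the expected table in the question.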

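Also, if you want the other way around, i.e. check whether s is a substring of each cell, you can use pandas' vectorized string methods:

df.apply(lambda column: column.str.contains(s, regex=False, na=False))

Here regex=False treats s as a literal string rather than a regular expression, and na=False marks missing cells as False instead of leaving them as NaN.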
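As a small sketch of this reverse direction, the example below looks for a shorter string inside each cell. The two-column dataframe and the search string needle = "Engineer" are assumptions chosen so that several cells match; they are not taken from the question.

```python
import numpy as np
import pandas as pd

# Assumed subset of the question's table; empty cells are NaN.
df = pd.DataFrame({
    "Mary": ["Engineer", "Guitarist", np.nan],
    "Tristan": ["Software Engineer", "Aerospace Engineer", "Business Man"],
})
needle = "Engineer"  # hypothetical search string for this illustration

# Column by column: does each cell contain needle as a literal substring?
# regex=False disables regular-expression matching, na=False maps NaN to False.
contains_mask = df.apply(
    lambda column: column.str.contains(needle, regex=False, na=False)
)

print(contains_mask)
```

The .str accessor only works on columns that hold strings, so a column that is entirely empty (and therefore stored as float NaN) would need to be handled separately.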
