Let's say we have a dataframe `df` representing the activities of some people, as follows:
| index | Mary | Tristan | Louise | Arnaud | Justin | Stacy |
|---|---|---|---|---|---|---|
| 0 | Engineer | Software Engineer | Rock Singer | Rap Singer | Lumberjack | Biomedical Engineer |
| 1 | Guitarist | Aerospace Engineer | Author | Firefighter | | |
| 2 | Business Man | | | | | |
I would like to check whether each activity is, or might be, software engineering. With `s = 'Software Engineer'`, we would obtain:
| index | Mary | Tristan | Louise | Arnaud | Justin | Stacy |
|---|---|---|---|---|---|---|
| 0 | True | True | False | False | False | False |
| 1 | False | False | False | False | False | False |
| 2 | False | False | False | False | False | False |
In other words, I want to test, for every cell in `df`, whether it is a substring of `s`. The following already works, but it looks dirty:
s = 'Software Engineer'
df.apply(lambda col: col.apply(lambda x: str(x) in s))
What bothers me is the double `apply`; is there a better solution?
To check whether every cell in your dataframe is a substring of `s`, there is no need for numpy; you can use `applymap`.

Note: `bool(cell)` is used to exclude empty and NaN cells and mark them as False.

Also, if you want it the other way around, i.e. check whether `s` is a substring of each cell, you can use vectorized string functions to further optimize your code.
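
A minimal sketch of both directions, assuming the table from the question with missing cells stored as `None`. The sample frame, the helper name `cell_in_s`, and the result variable names are illustrative assumptions, not the original answer's code. The `isinstance` check is an addition of this sketch: a float NaN is itself truthy, so `bool(cell)` alone only filters out empty strings and `None`. Since `applymap` was renamed to `DataFrame.map` in pandas 2.1, the sketch picks whichever method is available.

```python
import pandas as pd

s = "Software Engineer"

# Sample data mirroring the question's table; missing cells are None.
df = pd.DataFrame({
    "Mary": ["Engineer", "Guitarist", "Business Man"],
    "Tristan": ["Software Engineer", "Aerospace Engineer", None],
    "Louise": ["Rock Singer", "Author", None],
    "Arnaud": ["Rap Singer", "Firefighter", None],
    "Justin": ["Lumberjack", None, None],
    "Stacy": ["Biomedical Engineer", None, None],
})


def cell_in_s(cell):
    # Only non-empty strings can match; None, NaN and "" yield False.
    return isinstance(cell, str) and bool(cell) and cell in s


# DataFrame.applymap was renamed to DataFrame.map in pandas 2.1,
# so use whichever element-wise method this pandas version provides.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
is_substring_of_s = elementwise(cell_in_s)

# Other direction: is s a substring of each cell?
# str.contains is vectorized per column; regex=False matches s literally,
# and na=False marks missing cells as False.
contains_s = df.apply(lambda col: col.str.contains(s, regex=False, na=False))

print(is_substring_of_s)
print(contains_s)
```

Here `is_substring_of_s` matches the expected table from the question, while `contains_s` is True only where a cell contains the full string `s`.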