Is there any potential downside to using the following code to create a new data frame that keeps only very specific rows from the original data frame?

df_workloc = (df[df['WorkLoc'] == 'Home'][df['CareerSat'] == 'Very satisfied'][df['CurrencySymbol'] == 'USD'][df['CompTotal'] >= 50000])

I used the 2019 Stack Overflow survey data. As such:

WorkLoc specifies where a respondent works.

CareerSat specifies a respondent's career satisfaction.

CurrencySymbol specifies what currency a respondent gets paid in.

CompTotal specifies what a respondent's total compensation is.

If anyone has a cleaner, more efficient way of building a data frame filtered down to specific information, I'd love to see it. One thing I'd like to do is require CompTotal to be >= 50000 and <= 75000 in the same line. However, I got an error when I tried to include the second boolean condition.

Thanks in advance.

There is 1 answer

jezrael On BEST ANSWER

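I think you need to chain the conditions with & (bitwise AND) and filter by boolean indexing. Chaining several [...] selections, as in the question, applies masks built from the full df to an already filtered frame, so pandas has to reindex each mask and emits a "Boolean Series key will be reindexed" warning. For the last condition, use Series.between, which is inclusive on both ends by default: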

m1 = df['WorkLoc'] == 'Home'
m2 = df['CareerSat'] == 'Very satisfied'
m3 = df['CurrencySymbol'] == 'USD'
m4 = df['CompTotal'].between(50000, 75000)
df_workloc = df[m1 & m2 & m3 & m4]

Or as a one-line solution:

df_workloc = df[(df['WorkLoc'] == 'Home') &
                (df['CareerSat'] == 'Very satisfied') &
                (df['CurrencySymbol'] == 'USD') &
                 df['CompTotal'].between(50000, 75000)]
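
To check the filtering on something small, here is a minimal sketch on a toy frame. The rows are made up for illustration and are not taken from the survey; only the column names come from the question. It also shows the range written with explicit comparisons, which may be what raised the error in the question: each comparison needs its own parentheses because & binds more tightly than >= and <=, and the Python keyword and cannot be used between Series. DataFrame.query is included as one more possible spelling.

```python
import pandas as pd

# Hypothetical toy data; only the column names match the survey.
df = pd.DataFrame({
    'WorkLoc': ['Home', 'Home', 'Office', 'Home', 'Home', 'Home'],
    'CareerSat': ['Very satisfied', 'Very satisfied', 'Very satisfied',
                  'Slightly satisfied', 'Very satisfied', 'Very satisfied'],
    'CurrencySymbol': ['USD', 'USD', 'USD', 'USD', 'EUR', 'USD'],
    'CompTotal': [50000, 90000, 60000, 70000, 75000, 75000],
})

# Conditions shared by all three variants below.
base = ((df['WorkLoc'] == 'Home') &
        (df['CareerSat'] == 'Very satisfied') &
        (df['CurrencySymbol'] == 'USD'))

# Range check with between (both bounds included by default).
with_between = df[base & df['CompTotal'].between(50000, 75000)]

# Same range with explicit comparisons; the parentheses are required
# because & has higher precedence than >= and <=.
with_compare = df[base &
                  (df['CompTotal'] >= 50000) &
                  (df['CompTotal'] <= 75000)]

# Same filter as a query string, where `and` is allowed.
with_query = df.query(
    "WorkLoc == 'Home' and CareerSat == 'Very satisfied' "
    "and CurrencySymbol == 'USD' "
    "and CompTotal >= 50000 and CompTotal <= 75000"
)

print(with_between.index.tolist())  # [0, 5]
```

All three variants select the same rows here; which one reads best is mostly a matter of taste.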