Pandas: Correctly filtering DataFrame columns on multiple conditions


I have a DataFrame of customers' restaurant ratings. In this DataFrame, star_rating is the rating the customer gave.

  • I want to add a column nb_favorables_mention to the same DataFrame. For each restaurant, it should hold the total number of reviews that received at least one "useful", "funny" or "cool" rating AND have a star_rating >= 3.
import pandas as pd

data = {'rating_id': ['1', '2', '3', '4', '5', '6', '7', '8', '9'],
        'user_id': ['56', '13', '56', '99', '99', '13', '12', '88', '45'],
        'restaurant_id': ['xxx', 'xxx', 'yyy', 'yyy', 'xxx', 'zzz', 'zzz', 'eee', 'eee'],
        'star_rating': ['2.3', '3.7', '1.2', '5.0', '1.0', '3.2', '1.0', '2.2', '0.2'],
        'rating_year': ['2012', '2012', '2020', '2001', '2020', '2015', '2000', '2003', '2004'],
        'first_year': ['2012', '2012', '2001', '2001', '2012', '2000', '2000', '2001', '2001'],
        'last_year': ['2020', '2020', '2020', '2020', '2020', '2015', '2015', '2020', '2020'],
        'funny': ['1', '0', '0', '1', '1', '1', '0', '0', '0'],
        'useful': ['1', '0', '0', '0', '1', '0', '0', '0', '1'],
        'cool': ['1', '0', '0', '0', '1', '1', '1', '1', '1'],
        }

df = pd.DataFrame(data, columns=['rating_id', 'user_id', 'restaurant_id', 'star_rating', 'rating_year', 'first_year', 'last_year', 'funny', 'useful', 'cool'])
df['star_rating'] = df['star_rating'].astype(float)

filtered_data = df[(df['star_rating'] >= 3) & (df['funny']==1 | df['useful']==1 | df['cool']==1)]
d = filtered_data.groupby('restaurant_id')['star_rating'].count().to_dict()

df['nb_favorables_mention'] = df['restaurant_id'].map(d)
df.head(20)

I'm not sure what is wrong with my syntax, but whatever I try, I keep getting these error messages:

  • ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

  • TypeError: Cannot perform 'ror_' with a dtyped [object] array and scalar of type [bool]

What is the correct syntax for what I'm trying to achieve?

Accepted answer by Psidom

You have an operator-precedence issue. In Python, the | operator has higher precedence than ==, so wrapping each comparison in parentheses solves the problem. Also, since the funny, useful and cool columns are of str type, compare them against the string '1' instead of the number 1:

filtered_data = df[(df['star_rating'] >= 3) & ((df['funny']=='1') | (df['useful']=='1') | (df['cool']=='1'))]
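To see the precedence problem concretely, the standard-library ast module can parse the question's unparenthesized condition. This is a minimal sketch that uses bare names (funny, useful, cool) as stand-ins for the columns and assumes Python 3.9+ for ast.unparse:

```python
import ast

# The condition from the question, with plain names standing in for the columns.
expr = "funny == 1 | useful == 1 | cool == 1"
node = ast.parse(expr, mode="eval").body

# Python sees ONE chained comparison whose operands are built with |,
# not three comparisons joined by |.
print(type(node).__name__)                     # Compare
print([type(op).__name__ for op in node.ops])  # ['Eq', 'Eq', 'Eq']
operands = [ast.unparse(node.left)] + [ast.unparse(c) for c in node.comparators]
print(operands)                                # ['funny', '1 | useful', '1 | cool', '1']
```

Python expands a chained comparison a == b == c into (a == b) and (b == c); that implicit and calls bool() on a Series, which raises the "truth value of a Series is ambiguous" ValueError. With string (object-dtype) columns, evaluating an operand such as 1 | df['useful'] can fail even earlier with a TypeError similar to the second one in the question, so which error appears likely depends on the column dtypes.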


Besides chaining |, you can also compare several columns in one go and then combine the results row-wise with any:

filtered_data = df[(df['star_rating'] >= 3) & df[['funny', 'useful', 'cool']].eq('1').any(axis=1)]
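Putting it together, here is a minimal end-to-end sketch on the question's data (only the columns the filter needs). It checks that the | mask and the .eq('1').any(axis=1) mask agree, then counts qualifying reviews per restaurant with the question's groupby/map step. The .fillna(0) is an addition not in the answer: restaurants with no qualifying review are absent from the counts, so map would otherwise leave NaN for them.

```python
import pandas as pd

# Columns from the question's data that the filter needs (flags stay as strings).
df = pd.DataFrame({
    'restaurant_id': ['xxx', 'xxx', 'yyy', 'yyy', 'xxx', 'zzz', 'zzz', 'eee', 'eee'],
    'star_rating': ['2.3', '3.7', '1.2', '5.0', '1.0', '3.2', '1.0', '2.2', '0.2'],
    'funny': ['1', '0', '0', '1', '1', '1', '0', '0', '0'],
    'useful': ['1', '0', '0', '0', '1', '0', '0', '0', '1'],
    'cool': ['1', '0', '0', '0', '1', '1', '1', '1', '1'],
})
df['star_rating'] = df['star_rating'].astype(float)

# A str column never equals the int 1, so comparing with 1 silently matches nothing.
print((df['funny'] == 1).any())  # False

# Parenthesized comparisons joined with |, and the one-shot .eq().any() form.
mask_or = (df['star_rating'] >= 3) & (
    (df['funny'] == '1') | (df['useful'] == '1') | (df['cool'] == '1')
)
mask_any = (df['star_rating'] >= 3) & df[['funny', 'useful', 'cool']].eq('1').any(axis=1)
print(mask_or.equals(mask_any))  # True

# Count qualifying reviews per restaurant, then broadcast the count to every row.
d = df[mask_or].groupby('restaurant_id')['star_rating'].count().to_dict()
df['nb_favorables_mention'] = df['restaurant_id'].map(d).fillna(0).astype(int)
print(df[['restaurant_id', 'nb_favorables_mention']])
```

On this data only rating_id 4 (yyy) and rating_id 6 (zzz) qualify, so rows for yyy and zzz get 1, and rows for xxx and eee get 0.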