I have a data frame representing customers' ratings of restaurants. star_rating is the customer's rating in this data frame.
What I want to do is add a column nb_favorables_mention to the same data frame that represents the total number of reviews that received at least one "useful", "funny" or "cool" rating AND whose star rating is >= 3.
import pandas as pd

data = {'rating_id': ['1', '2','3','4','5','6','7','8','9'],
'user_id': ['56', '13','56','99','99','13','12','88','45'],
'restaurant_id': ['xxx', 'xxx','yyy','yyy','xxx','zzz','zzz','eee','eee'],
'star_rating': ['2.3', '3.7','1.2','5.0','1.0','3.2','1.0','2.2','0.2'],
'rating_year': ['2012','2012','2020','2001','2020','2015','2000','2003','2004'],
'first_year': ['2012', '2012','2001','2001','2012','2000','2000','2001','2001'],
'last_year': ['2020', '2020','2020','2020','2020','2015','2015','2020','2020'],
'funny': ['1', '0','0','1','1','1','0','0','0'],
'useful': ['1', '0','0','0','1','0','0','0','1'],
'cool': ['1', '0','0','0','1','1','1','1','1'],
}
df = pd.DataFrame(data, columns = ['rating_id','user_id','restaurant_id','star_rating','rating_year','first_year','last_year','funny','useful','cool'])
df['star_rating'] = df['star_rating'].astype(float)
filtered_data = df[(df['star_rating'] >= 3) & (df['funny']==1 | df['useful']==1 | df['cool']==1)]
d = filtered_data.groupby('restaurant_id')['star_rating'].count().to_dict()
df['nb_favorables_mention'] = df['restaurant_id'].map(d)
df.head(20)
I'm not sure what is wrong with my syntax, but with what I tried I keep getting these error messages:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
TypeError: Cannot perform 'ror_' with a dtyped [object] array and scalar of type [bool]
What is the correct syntax, considering what I'm trying to achieve?
You have an operator precedence issue. In Python, the | operator has higher precedence than ==, so wrapping each comparison expression in parentheses should solve your problem. Also, since the funny, useful and cool columns are of str type, compare against the string '1' instead of the number 1.
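A minimal sketch of the corrected filter, assuming a reduced copy of the question's data (only the columns the filter needs, with star_rating already stored as floats):

```python
import pandas as pd

# Reduced copy of the question's data; the flag columns are strings, as in the question.
df = pd.DataFrame({
    'restaurant_id': ['xxx', 'xxx', 'yyy', 'yyy', 'xxx', 'zzz', 'zzz', 'eee', 'eee'],
    'star_rating': [2.3, 3.7, 1.2, 5.0, 1.0, 3.2, 1.0, 2.2, 0.2],
    'funny': ['1', '0', '0', '1', '1', '1', '0', '0', '0'],
    'useful': ['1', '0', '0', '0', '1', '0', '0', '0', '1'],
    'cool': ['1', '0', '0', '0', '1', '1', '1', '1', '1'],
})

# Each comparison gets its own parentheses, and the flags are compared to the string '1'.
filtered_data = df[
    (df['star_rating'] >= 3)
    & ((df['funny'] == '1') | (df['useful'] == '1') | (df['cool'] == '1'))
]

# Count qualifying reviews per restaurant and map the counts back onto every row.
d = filtered_data.groupby('restaurant_id')['star_rating'].count().to_dict()
df['nb_favorables_mention'] = df['restaurant_id'].map(d)
print(df)
```

Without the inner parentheses, Python reads df['funny'] == '1' | df['useful'] == '1' as a chained comparison around '1' | df['useful'], which is where the errors come from. Restaurants with no qualifying review end up as NaN after the map; appending .fillna(0) is one option if a zero count is preferred.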
Besides using |, you can also compare multiple columns in one go and then check the condition with any.
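One way this could look, again as a sketch on a reduced copy of the question's data; the has_mention name is only illustrative:

```python
import pandas as pd

# Reduced copy of the question's data; the flag columns are strings, as in the question.
df = pd.DataFrame({
    'restaurant_id': ['xxx', 'xxx', 'yyy', 'yyy', 'xxx', 'zzz', 'zzz', 'eee', 'eee'],
    'star_rating': [2.3, 3.7, 1.2, 5.0, 1.0, 3.2, 1.0, 2.2, 0.2],
    'funny': ['1', '0', '0', '1', '1', '1', '0', '0', '0'],
    'useful': ['1', '0', '0', '0', '1', '0', '0', '0', '1'],
    'cool': ['1', '0', '0', '0', '1', '1', '1', '1', '1'],
})

# Compare all three flag columns to '1' at once, then keep rows where any of them matches.
has_mention = df[['funny', 'useful', 'cool']].eq('1').any(axis=1)
filtered_data = df[(df['star_rating'] >= 3) & has_mention]

# Same counting and mapping step as in the question.
d = filtered_data.groupby('restaurant_id')['star_rating'].count().to_dict()
df['nb_favorables_mention'] = df['restaurant_id'].map(d)
print(df)
```

This form avoids the precedence pitfall entirely, because the only boolean operator left is a single &, and it scales more comfortably if further flag columns are added to the list.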