I have a data frame representing customers' ratings of restaurants. star_rating is the customer's rating in this data frame.
What I want to do is add a column nb_favorables_mention to the same data frame that represents the total number of reviews that received at least one "useful", "funny", or "cool" rating AND whose star rating is >= 3.
data = {'rating_id': ['1', '2','3','4','5','6','7','8','9'],
'user_id': ['56', '13','56','99','99','13','12','88','45'],
'restaurant_id': ['xxx', 'xxx','yyy','yyy','xxx','zzz','zzz','eee','eee'],
'star_rating': ['2.3', '3.7','1.2','5.0','1.0','3.2','1.0','2.2','0.2'],
'rating_year': ['2012','2012','2020','2001','2020','2015','2000','2003','2004'],
'first_year': ['2012', '2012','2001','2001','2012','2000','2000','2001','2001'],
'last_year': ['2020', '2020','2020','2020','2020','2015','2015','2020','2020'],
'funny': ['1', '0','0','1','1','1','0','0','0'],
'useful': ['1', '0','0','0','1','0','0','0','1'],
'cool': ['1', '0','0','0','1','1','1','1','1'],
}
df = pd.DataFrame (data, columns = ['rating_id','user_id','restaurant_id','star_rating','rating_year','first_year','last_year','funny','useful','cool'])
df['star_rating'] = df['star_rating'].astype(float)
filtered_data = df[(df['star_rating'] >= 3) & (df['funny']==1 | df['useful']==1 | df['cool']==1)]
d = filtered_data.groupby('restaurant_id')['star_rating'].count().to_dict()
df['nb_favorables_mention'] = df['restaurant_id'].map(d)
df.head(20)
I'm not sure what is wrong with my syntax, but with what I have tried I keep getting these error messages:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
TypeError: Cannot perform 'ror_' with a dtyped [object] array and scalar of type [bool]
What is the correct syntax, considering what I'm trying to achieve?
You have an operator precedence issue. In Python, the | operator has higher precedence than ==, so wrapping each comparison expression in parentheses should solve your problem. Also, since the funny, useful and cool columns are str type, compare against the string '1' instead of the number 1.
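The fix described above might look like the following minimal sketch. It reuses the question's sample data, trimmed to the columns the filter needs; the variable name mask is introduced here only for illustration.

```python
import pandas as pd

# Sample data from the question, reduced to the relevant columns.
df = pd.DataFrame({
    'restaurant_id': ['xxx', 'xxx', 'yyy', 'yyy', 'xxx', 'zzz', 'zzz', 'eee', 'eee'],
    'star_rating': ['2.3', '3.7', '1.2', '5.0', '1.0', '3.2', '1.0', '2.2', '0.2'],
    'funny': ['1', '0', '0', '1', '1', '1', '0', '0', '0'],
    'useful': ['1', '0', '0', '0', '1', '0', '0', '0', '1'],
    'cool': ['1', '0', '0', '0', '1', '1', '1', '1', '1'],
})
df['star_rating'] = df['star_rating'].astype(float)

# Each comparison is wrapped in parentheses so that == is evaluated before |,
# and the flag columns are compared with the string '1' because they hold str values.
mask = (df['star_rating'] >= 3) & (
    (df['funny'] == '1') | (df['useful'] == '1') | (df['cool'] == '1')
)

# Count the qualifying reviews per restaurant and map the counts back onto every row.
d = df[mask].groupby('restaurant_id')['star_rating'].count().to_dict()
df['nb_favorables_mention'] = df['restaurant_id'].map(d)
```

Restaurants with no qualifying review are absent from d, so map leaves NaN in their rows; appending .fillna(0) is one way to turn those into zeros if that is what you want.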
Besides using |, you can also compare multiple columns in one go and then check the condition with any.
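A possible version of that alternative, again on the question's sample data trimmed to the relevant columns: the three flag columns are compared with '1' in a single eq call, and any(axis=1) checks each row for at least one match. The names flags and mask are illustrative, not from the original post.

```python
import pandas as pd

# Sample data from the question, reduced to the relevant columns.
df = pd.DataFrame({
    'restaurant_id': ['xxx', 'xxx', 'yyy', 'yyy', 'xxx', 'zzz', 'zzz', 'eee', 'eee'],
    'star_rating': ['2.3', '3.7', '1.2', '5.0', '1.0', '3.2', '1.0', '2.2', '0.2'],
    'funny': ['1', '0', '0', '1', '1', '1', '0', '0', '0'],
    'useful': ['1', '0', '0', '0', '1', '0', '0', '0', '1'],
    'cool': ['1', '0', '0', '0', '1', '1', '1', '1', '1'],
})
df['star_rating'] = df['star_rating'].astype(float)

# Compare all three flag columns with '1' at once, then reduce row-wise:
# a row is True if at least one of funny/useful/cool equals '1'.
flags = df[['funny', 'useful', 'cool']].eq('1').any(axis=1)
mask = (df['star_rating'] >= 3) & flags

# Count the qualifying reviews per restaurant and map the counts back onto every row.
d = df[mask].groupby('restaurant_id')['star_rating'].count().to_dict()
df['nb_favorables_mention'] = df['restaurant_id'].map(d)
```

This form avoids the precedence pitfall entirely, since there is no chain of | and == to parenthesize, and it scales more easily if further flag columns are added to the list.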