I have a data frame representing the customers ratings of restaurants. star_rating
is rating of the customer in this data frame.
- What i want to do is to add a column
nb_fave_rating
in the same data frame that represents the total number of favorable reviews for a restaurant. I consider a "favorable" opinion if its number of stars is> = 3
.
data = {'rating_id': ['1', '2','3','4','5','6','7','8','9'],
'user_id': ['56', '13','56','99','99','13','12','88','45'],
'restaurant_id': ['xxx', 'xxx','yyy','yyy','xxx','zzz','zzz','eee','eee'],
'star_rating': ['2.3', '3.7','1.2','5.0','1.0','3.2','1.0','2.2','0.2'],
'rating_year': ['2012','2012','2020','2001','2020','2015','2000','2003','2004'],
'first_year': ['2012', '2012','2001','2001','2012','2000','2000','2001','2001'],
'last_year': ['2020', '2020','2020','2020','2020','2015','2015','2020','2020'],
}
df = pd.DataFrame (data, columns = ['rating_id','user_id','restaurant_id','star_rating','rating_year','first_year','last_year'])
df['star_rating'] = df['star_rating'].astype(float)
positive_reviews = df[df.star_rating >= 3.0 ].groupby('restaurant_id')
positive_reviews.head()
From here, i dont know to to count the number of positive reviews for a restaurant and add it to the new column of my initial dataframe df
.
The output expected would be something like this.
data = {'rating_id': ['1', '2','3','4','5','6','7','8','9'],
'user_id': ['56', '13','56','99','99','13','12','88','45'],
'restaurant_id': ['xxx', 'xxx','yyy','yyy','xxx','zzz','zzz','eee','eee'],
'star_rating': ['2.3', '3.7','1.2','5.0','1.0','3.2','1.0','2.2','0.2'],
'rating_year': ['2012','2012','2020','2001','2020','2015','2000','2003','2004'],
'first_year': ['2012', '2012','2001','2001','2012','2000','2000','2001','2001'],
'last_year': ['2020', '2020','2020','2020','2020','2015','2015','2020','2020'],
'nb_fave_rating': ['1', '1','1','1','1','1','1','0','0'],
}
So i tried this and had a bunch of NaNs
df['nb_fave_rating']=df[df.star_rating >= 3.0 ].groupby('restaurant_id').agg({'star_rating': 'count'})
df.head()
Here is a potential solution with
groupby
andmap
:Output: