Pandas: How to create a new column in a DataFrame and fill it based on other existing columns


I have a data frame representing some restaurants and their names.

  • I want to add a column is_chain to my initial DataFrame df that indicates whether each restaurant is part of a food chain. The new column takes the value 0 or 1, where 1 means the restaurant is part of a chain (e.g. McDonald's). A restaurant is considered part of a chain if another restaurant in the database has the same name.
import pandas as pd

data = {
    'restaurant_id': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12'],
    'restaurant_name': ['Dennys', 'Dennys', 'Pho U', 'Pho U', 'Dennys', 'Japanese Cafe', 'Japanese Cafe', 'Midori', 'Midori', 'xxx', 'yyy', 'zzz'],
}

df = pd.DataFrame(data, columns=['restaurant_id', 'restaurant_name'])

df.head(15)

So for example here, xxx, yyy and zzz are not part of a chain.

I'm not sure of the correct pandas syntax to achieve this. If any clarification is needed, please ask.

Thank you.

There is 1 answer

Quang Hoang (BEST ANSWER)

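This sounds like a job for Series.duplicated. With keep=False, every occurrence of a repeated name is marked True (not just the later ones), and astype(int) turns the booleans into 1/0: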

 df['is_chain'] = df['restaurant_name'].duplicated(keep=False).astype(int)

Output:

   restaurant_id restaurant_name  is_chain
0              1          Dennys         1
1              2          Dennys         1
2              3           Pho U         1
3              4           Pho U         1
4              5          Dennys         1
5              6   Japanese Cafe         1
6              7   Japanese Cafe         1
7              8          Midori         1
8              9          Midori         1
9             10             xxx         0
10            11             yyy         0
11            12             zzz         0
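
For comparison, here is a self-contained sketch that reproduces the result above and checks it against a second way of expressing the same rule: counting how many rows share each name and flagging names that occur more than once. The column name is_chain_alt is only an illustrative choice, not something from the question, and the groupby variant is an alternative, not the method used in the answer.

```python
import pandas as pd

data = {
    "restaurant_id": ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"],
    "restaurant_name": ["Dennys", "Dennys", "Pho U", "Pho U", "Dennys",
                        "Japanese Cafe", "Japanese Cafe", "Midori", "Midori",
                        "xxx", "yyy", "zzz"],
}
df = pd.DataFrame(data)

# Approach from the answer: keep=False marks every row whose name
# appears more than once, including the first occurrence.
df["is_chain"] = df["restaurant_name"].duplicated(keep=False).astype(int)

# Alternative sketch (is_chain_alt is a hypothetical column name):
# broadcast the per-name row count back to each row, then test count > 1.
counts = df.groupby("restaurant_name")["restaurant_name"].transform("size")
df["is_chain_alt"] = (counts > 1).astype(int)

print(df)
```

Both columns should agree on this data. The duplicated version is shorter; the count-based version may be handier if the definition of a chain later changes to something like "at least three locations", since only the threshold needs to change.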