I want to replace duplicate values with NaN under the following condition:
a row counts as a duplicate only when both its ID and its Code match an earlier row. If the Code is different, keep it.
For example:

ID  Code
1   A10
1   A10
1   A10
1   E39
1   I24
2   O32
2   K94
3   E39

I tried this:

df.loc[df['ID'].duplicated(), 'Code'] = np.nan

But this keeps only the first Code for each ID and replaces all the others. I want something that replaces the Code only when both the ID and the Code match another row.

Desired output:

ID  Code
1   A10
1   NaN
1   NaN
1   E39
1   I24
2   O32
2   K94
3   E39

1 Answer

jezrael

Use DataFrame.duplicated and specify both columns. By default it flags every occurrence after the first, so the first row of each (ID, Code) pair is kept:

import numpy as np

# df is the DataFrame from the question
df.loc[df.duplicated(['ID', 'Code']), 'Code'] = np.nan
# Alternatives:
# df['Code'] = df['Code'].mask(df.duplicated(['ID', 'Code']))
# df['Code'] = np.where(df.duplicated(['ID', 'Code']), np.nan, df['Code'])
print(df)
   ID  Code
0   1   A10
1   1   NaN
2   1   NaN
3   1   E39
4   1   I24
5   2   O32
6   2   K94
7   3   E39
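
To make the difference between the two masks concrete, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand; the names by_id, by_pair and result are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 2, 2, 3],
    "Code": ["A10", "A10", "A10", "E39", "I24", "O32", "K94", "E39"],
})

# Mask from the original attempt: flags every row after the first per ID,
# regardless of what the Code is.
by_id = df["ID"].duplicated()

# Mask from the answer: flags a row only when the (ID, Code) pair
# has already appeared in an earlier row.
by_pair = df.duplicated(["ID", "Code"])

# Work on a copy so the original frame stays available for comparison.
result = df.copy()
result.loc[by_pair, "Code"] = np.nan

print(by_id.tolist())
print(by_pair.tolist())
print(result)
```

With this data, by_id is True for rows 1 to 4 and row 6, which is why the first attempt also wiped E39, I24 and K94. by_pair is True only for rows 1 and 2, the repeated (1, A10) pairs. Row 7 keeps E39 because its ID differs from the earlier E39 row.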