How to drop duplicated values in one column for each id in Data Frame in Python Pandas?

Question

Asked by dingaro on 25 October 2023 at 21:19 · 56 views

I have a pandas DataFrame like the one below:

import pandas as pd

data = {'id': [1, 1, 1, 1, 2, 2, 3, 3],
        'nps': [8, 8, 8, 8, 7, 7, 9, 9],
        'target': [True, True, True, True, False, False, True, True],
        'score': [0.56, 0.78, 0.56, 0.78,  0.6785, 0.42, 0.9, 0.63],
        'day': ['2023-02-15', '2023-02-15', '2023-02-22', '2023-02-22', '2023-06-10', '2023-06-10', '2023-07-01', '2023-07-01']}
df = pd.DataFrame(data)

As you can see, each id appears in several rows, so the score column holds more than one value per id. I need to keep only one score per id.

So the result should look something like this, for example:

id | nps | target  | score  | day
---|-----|---------|--------|-----------
1  | 8   | True    | 0.56   | 2023-02-15
1  | 8   | True    | 0.56   | 2023-02-22
2  | 7   | False   | 0.42   | 2023-06-10
3  | 9   | True    | 0.90   | 2023-07-01

How can I do that in pandas?

There is 1 answer

**CDubyuh** · Accepted Answer · 2023-10-25T21:43:58+00:00

Do you mean one score per id, per day? In your example output, id 1 appears twice, but on separate days.

If that's the case, you can do something like this:

df.drop_duplicates(subset=['id', 'day'], keep='first', inplace=True)
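
To make the effect of the call above concrete, here is a minimal sketch that rebuilds the DataFrame from the question and applies the same `subset`. It returns a new DataFrame instead of using `inplace=True`; the name `per_day` and the `reset_index` step are illustrative additions, not part of the original answer.

```python
import pandas as pd

# Data copied from the question.
data = {
    "id": [1, 1, 1, 1, 2, 2, 3, 3],
    "nps": [8, 8, 8, 8, 7, 7, 9, 9],
    "target": [True, True, True, True, False, False, True, True],
    "score": [0.56, 0.78, 0.56, 0.78, 0.6785, 0.42, 0.9, 0.63],
    "day": ["2023-02-15", "2023-02-15", "2023-02-22", "2023-02-22",
            "2023-06-10", "2023-06-10", "2023-07-01", "2023-07-01"],
}
df = pd.DataFrame(data)

# Keep the first row encountered for each (id, day) pair.
# `per_day` is a hypothetical name; df itself is left unchanged.
per_day = (
    df.drop_duplicates(subset=["id", "day"], keep="first")
      .reset_index(drop=True)
)
print(per_day)
```

With the question's data this leaves four rows, one per id and day. Note that id 2 keeps the score 0.6785 (its first row), whereas the example table in the question shows 0.42; the question presents that table only as an example.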

If you need to drop all duplicates regardless of their date, just remove 'day' from the subset:

df.drop_duplicates(subset=['id'], keep='first', inplace=True)

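These snippets keep the first occurrence of each id/day combination (or of each id, in the second snippet), in the DataFrame's current row order, and drop the rest. Because inplace=True is set, df is modified directly and the call returns None.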
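
Since `keep='first'` depends on row order, one way to control which score survives is to sort before dropping. The sketch below is an assumption about what the asker might want (the lowest or the highest score per id), not something stated in the question; it uses a trimmed copy of the question's data, and the names `lowest` and `highest` are hypothetical.

```python
import pandas as pd

# Trimmed version of the question's data: one pair of scores per id.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3, 3],
    "score": [0.56, 0.78, 0.6785, 0.42, 0.9, 0.63],
})

# Sort ascending by score, so the first row per id is its lowest score.
lowest = (
    df.sort_values("score")
      .drop_duplicates(subset=["id"], keep="first")
      .sort_values("id")
      .reset_index(drop=True)
)

# Same sort, but keep the last row per id, i.e. its highest score.
highest = (
    df.sort_values("score")
      .drop_duplicates(subset=["id"], keep="last")
      .sort_values("id")
      .reset_index(drop=True)
)

print(lowest)
print(highest)
```

Here `lowest` holds the scores 0.56, 0.42 and 0.63 for ids 1, 2 and 3, and `highest` holds 0.78, 0.6785 and 0.9. To apply the same idea per day, add 'day' to `subset`.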
