Check if pandas row is unique, when order is not considered

Question

Asked by msa on 28 September 2020 at 16:03

Is there a way to find, and then drop, rows that are not unique when the order of the values in ID1 and ID2 is ignored?

My data frame looks something like this:

    ID1 ID2 weight  
 0  2   4   0.5
 1  3   7   0.8 
 2  4   2   0.5 
 3  7   3   0.8
 4  8   2   0.5
 5  3   8   0.5

EDIT: I added a couple more rows to show that other unique rows that may have the same weight should be kept.

I think that pandas drop_duplicates(subset=['ID1', 'ID2', 'weight'], keep=False) compares each row column by column, so it does not recognise that rows 0 and 2, and rows 1 and 3, in fact hold the same values.
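The behaviour described above can be reproduced with a small sketch. The frame below is rebuilt from the table in the question; the names mask and remaining are only illustrative.

```python
import pandas as pd

# Data copied from the question's example table
df = pd.DataFrame({
    "ID1": [2, 3, 4, 7, 8, 3],
    "ID2": [4, 7, 2, 3, 2, 8],
    "weight": [0.5, 0.8, 0.5, 0.8, 0.5, 0.5],
})

cols = ["ID1", "ID2", "weight"]

# duplicated() compares values column by column, so (2, 4) and (4, 2)
# count as different rows and nothing is flagged
mask = df.duplicated(subset=cols, keep=False)
remaining = df.drop_duplicates(subset=cols, keep=False)

print(mask.any())      # False: no row is seen as a duplicate
print(len(remaining))  # 6: every row survives
```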

There are 2 answers

Michael Szczesny On 28 September 2020 at 16:14

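This works, but it's kind of hacky. Build an order-independent key from the columns that form a pair, stored as a sorted tuple so that it is hashable, then drop duplicates on that key together with weight. Note that keep=False removes every row that has a duplicate, including the first occurrence.

df['new'] = df[['ID1','ID2']].apply(lambda x: tuple(sorted(x)), axis=1)
df.drop_duplicates(subset=['new','weight'], keep=False)

Out:

   ID1  ID2  weight     new
4    8    2     0.5  (2, 8)
5    3    8     0.5  (3, 8)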
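The same pair-as-key idea can probably be written without apply or a helper column on df. This is a minimal sketch, assuming exactly two ID columns; the names key, lo, hi and result are illustrative, not part of the original answer.

```python
import pandas as pd

# Data copied from the question's example table
df = pd.DataFrame({
    "ID1": [2, 3, 4, 7, 8, 3],
    "ID2": [4, 7, 2, 3, 2, 8],
    "weight": [0.5, 0.8, 0.5, 0.8, 0.5, 0.5],
})

ids = df[["ID1", "ID2"]]

# Row-wise min and max give the same key for (2, 4) and (4, 2)
key = pd.DataFrame({
    "lo": ids.min(axis=1),
    "hi": ids.max(axis=1),
    "weight": df["weight"],
})

# keep=False flags every member of a duplicate group, so none of them is kept
result = df[~key.duplicated(keep=False)]
print(result)
```

With the example data, only rows 4 and 5 remain, and df itself is left without an extra column.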

**Shubham Sharma** · Accepted Answer · 2020-09-28T16:19:30+00:00

Sort the values within each row (axis=1) so that swapped pairs become identical, then use np.unique with axis=0 and the optional parameter return_index=True to get the index of the first occurrence of each unique row:

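import numpy as np
import pandas as pd

sub = ['ID1', 'ID2', 'weight']

# Sort within each row, then keep the position of the first occurrence of each unique row
idx = np.unique(np.sort(df[sub], axis=1), axis=0, return_index=True)[1]
df1 = df.iloc[sorted(idx)]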

Alternative approach suggested by @anky:

df1 = df[~pd.DataFrame(np.sort(df[sub], 1), index=df.index).duplicated()]

print(df1)

   ID1  ID2  weight
0    2    4     0.5
1    3    7     0.8
4    8    2     0.5
5    3    8     0.5
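One caveat with sorting all three columns together: a weight can trade places with an ID, so hypothetical rows such as (1, 2, 3.0) and (3, 2, 1.0) would be treated as the same. The sketch below is a variation, not the answerer's code. It sorts only the ID columns and compares weight as-is; id_cols, sorted_ids and key are illustrative names.

```python
import numpy as np
import pandas as pd

# Data copied from the question's example table
df = pd.DataFrame({
    "ID1": [2, 3, 4, 7, 8, 3],
    "ID2": [4, 7, 2, 3, 2, 8],
    "weight": [0.5, 0.8, 0.5, 0.8, 0.5, 0.5],
})

id_cols = ["ID1", "ID2"]

# Sort only the ID values within each row; weight is left out of the sort
sorted_ids = pd.DataFrame(
    np.sort(df[id_cols].to_numpy(), axis=1),
    index=df.index,
    columns=id_cols,
)
key = sorted_ids.assign(weight=df["weight"])

# duplicated() defaults to keep='first', so the first row of each group is kept
df1 = df[~key.duplicated()]
print(df1)
```

On the example data this keeps rows 0, 1, 4 and 5, the same rows as the output above.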
