Python: loop over two columns and drop the entire row once a value has already appeared in either column A or column B


I have a dataframe with 15 columns that is used to calculate a score. Two columns (A and B) are my independent variables, and both contain duplicate values. Column C holds the calculated score, and I have already sorted the dataframe by Column C in descending order. The goal is to keep the highest-scoring combination of the A and B values and drop any later rows that reuse one of those values.

Column A  Column B  Column C
       5        10       1.5
       5        12       1.4
      10        12       1.0
       7        14       0.9
       7         9       0.8
      12         6       0.7
      14         4       0.6

In the above example, I want the second, third, fifth, sixth, and seventh rows dropped. The sixth and seventh rows would be dropped because 12 and 14 already appeared in Column B in rows above.

There is 1 answer

Dani Mesejo On

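Use Series.duplicated on the two columns stacked into one Series, so that a value counts as already seen no matter which column it first appeared in. Checking each column separately would miss the 12 and 14 that first show up in Column B and later in Column A.

# stack() interleaves the values row by row (A, then B), keeping the sorted order
seen = df[["Column A", "Column B"]].stack().duplicated()
res = df[~seen.groupby(level=0).any()]
print(res)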

Output

   Column A  Column B  Column C
0         5        10       1.5
3         7        14       0.9
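
Since the question title asks about looping, here is a minimal sketch of the same filter written as an explicit loop over the two columns. It rebuilds the sample data from the question so it can be run on its own. The helper name `keep_first_seen` is made up for illustration, and the sketch assumes that a value counts as "seen" once it has appeared in either column of any earlier row, including rows that were themselves dropped (that is what the 12 in the sixth row of the example implies).

```python
import pandas as pd

# Sample data from the question, already sorted by Column C descending.
df = pd.DataFrame({
    "Column A": [5, 5, 10, 7, 7, 12, 14],
    "Column B": [10, 12, 12, 14, 9, 6, 4],
    "Column C": [1.5, 1.4, 1.0, 0.9, 0.8, 0.7, 0.6],
})


def keep_first_seen(frame, col_a="Column A", col_b="Column B"):
    """Hypothetical helper: keep a row only if neither of its two
    values appeared in either column of an earlier row."""
    seen = set()
    keep = []
    for a, b in zip(frame[col_a], frame[col_b]):
        keep.append(a not in seen and b not in seen)
        # Assumption: values from dropped rows still count as seen.
        seen.update((a, b))
    # A list of booleans works as a positional row mask.
    return frame[keep]


res = keep_first_seen(df)
print(res)
```

On the sample data this keeps the rows with index 0 and 3. The loop is easier to adapt than a vectorized mask if the rule changes, for example if only values from kept rows should count as seen (move the `seen.update` call under an `if keep[-1]:` check).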