I need to compare 2 DataFrames and drop rows in either that do not contain the corresponding IDs. As an example consider df1
and df2
.
df1 = pd.DataFrame({'ID':[1,2,3,4],
'Food':['Ham','Cheese','Egg','Bacon',],
'Amount':[5,2,10,4,],
})
df2 = pd.DataFrame({'ID':[1,2,4,5],
'Food':['Ham','Cheese','Bacon','Chocolate Salty Balls'],
'Amount':[6,7,15,5000],
})
Pseudocode:
if df1['ID'] notin df2['ID']:
df2['ID'].drop()
...and vice versa. The outcome in the example would be:
df1
:
ID Food Amount
0 1 Ham 5
1 2 Cheese 2
2 4 Bacon 4
df2
:
ID Food Amount
0 1 Ham 6
1 2 Cheese 7
2 4 Bacon 15
You could use a boolean mask:
or use
set.intersection
and boolean indexing:Output:
df1
:df2
: