I have two DataFrames and I want to compare the values of their columns and display the rows that are different. For example, compare this table 1:
A | B | C | D |
---|---|---|---|
O1 | 2 | E1 | 2 |
O1 | 3 | E1 | 1 |
O1 | 2 | E1 | 0 |
O1 | 5 | E2 | 2 |
O1 | 2 | E2 | 3 |
O1 | 2 | E2 | 2 |
O1 | 5 | E2 | 1 |
O2 | 8 | E1 | 2 |
O2 | 8 | E1 | 0 |
O2 | 0 | E1 | 1 |
O2 | 2 | E1 | 4 |
O2 | 9 | E1 | 2 |
O2 | 2 | E2 | 1 |
O2 | 9 | E2 | 4 |
O2 | 2 | E2 | 2 |
with this table 2:
A | B | C | D |
---|---|---|---|
O1 | 2 | E1 | 2 |
O1 | 2 | E2 | 3 |
O2 | 2 | E1 | 4 |
O2 | 9 | E2 | 4 |
I tried:

    cond = [table1.A == table2.A, table1.C == table2.C, table1.D == table2.D]
    join = table1.join(table2, cond, "leftsemi")

Since I have a lot of data, I don't know how to check whether the result I got is correct.
Since your DataFrames have the same schema, you can use `subtract`: `df1.subtract(df2)` returns the rows that exist in `df1` but do not exist in `df2`.
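Here is a minimal sketch of that idea, assuming PySpark. The helper names (`rows_only_in_left`, `rows_without_match`, `demo`) and the three-row sample are made up for illustration and are not from the question. Note that `subtract` compares all columns and drops duplicates. If you only want to compare on `A`, `C` and `D` as in your condition, a `left_anti` join keeps the rows of the left table that have no match, whereas `leftsemi` keeps the ones that do. One way to check the result on a large table is to verify that the matched and unmatched counts add up to the total.

```python
def rows_only_in_left(left, right):
    """Rows of `left` that do not appear in `right` (all columns compared, duplicates dropped)."""
    return left.subtract(right)


def rows_without_match(left, right, keys):
    """Rows of `left` with no row in `right` sharing the same values in `keys`."""
    return left.join(right, on=keys, how="left_anti")


def demo():
    # Needs a working PySpark installation; imported here so the helpers
    # above can be reused without starting Spark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("diff-demo").getOrCreate()
    cols = ["A", "B", "C", "D"]
    keys = ["A", "C", "D"]
    # Hypothetical sample: a few rows taken from the two tables in the question.
    table1 = spark.createDataFrame(
        [("O1", 2, "E1", 2), ("O1", 3, "E1", 1), ("O1", 2, "E2", 3)], cols)
    table2 = spark.createDataFrame(
        [("O1", 2, "E1", 2), ("O1", 2, "E2", 3)], cols)

    rows_only_in_left(table1, table2).show()
    rows_without_match(table1, table2, keys).show()

    # Sanity check: every row of table1 either has a match in table2 or not,
    # so the semi-join and anti-join counts must add up to the total.
    matched = table1.join(table2, on=keys, how="leftsemi").count()
    unmatched = rows_without_match(table1, table2, keys).count()
    assert matched + unmatched == table1.count()

    spark.stop()
```

Call `demo()` from a PySpark environment to try it. On your real data, the same count check can be run directly on `table1` and `table2`.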