What is the analogue of EXCEPT clause in SQL in Pandas?

Question

What is the analogue of EXCEPT clause in SQL in Pandas?

14.9k views Asked by im7 At 05 January 2017 at 11:24

I have a sample pandas dataframe df:

        col1    col2    col3    col4
      0   a      1.0    2.0      3
      1   b      NaN    NaN      6
      2   c      NaN    8.0      9
      3   d      NaN    11.0    12
      4   e     13.0    14.0    15
      5   f     17.0    18.0    19
      6   g     21.0    22.0    23

and a second one df1:

        col1    col2    col3    col4
      0  a      1.0     2.0      3
      4  e     13.0    14.0     15
      5  f     17.0    18.0     19
      6  g     21.0    22.0     23

I want to get the subset of df that does not overlaps with df1. In effect I am looking for the equivalent of the EXCEPT operand in SQL.

I used the subtract() function -- but this was clearly wrong, as the subtract performs elementwise numerical subtraction. So I got an error message:

       TypeError: unsupported operand type(s) for -: 'str' and 'str'

So the question is: What is the equivalent of EXCEPT in SQL for Pandas?

Original Q&A

There are 2 answers

MaxU - stand with Ukraine On 19 January 2017 at 20:11

I think a Pandas equivalent for SQL EXCEPT (MINUS) would be the following technique:

In [16]: df1
Out[16]:
   a  b  c
0  1  a  5   # duplicates row with index: 3 
1  0  x  4
2  9  Z  9   # exists in DF2, so it should NOT appear in the result set
3  1  a  5   # duplicates row with index: 3

In [17]: df2
Out[17]:
    a  b    c
0  66  a  5.0
1   9  Z  9.0
2   0  x  NaN

In [18]: (pd.merge(df1, df2, on=df1.columns.tolist(), how='outer', indicator=True)
    ...:    .query("_merge == 'left_only'")
    ...:    .drop('_merge', 1)
    ...: )
    ...:
Out[18]:
   a  b    c
0  1  a  5.0
1  1  a  5.0
2  0  x  4.0

NOTE: this solution does NOT pay attention at indices

**jezrael** · Accepted Answer · 2017-01-05T11:27:07+00:00

I think you need set_index of all string columns first:

df2 = df.set_index('col1').subtract(df1.set_index('col1'), axis='columns')
print (df2)
      col2  col3  col4
col1                  
a      0.0   0.0   0.0
b      NaN   NaN   NaN
c      NaN   NaN   NaN
d      NaN   NaN   NaN
e      0.0   0.0   0.0
f      0.0   0.0   0.0
g      0.0   0.0   0.0

Or:

df2 = df.set_index('col1').subtract(df1.set_index('col1'), axis='columns', fill_value=0)
print (df2)
      col2  col3  col4
col1                  
a      0.0   0.0   0.0
b      NaN   NaN   6.0
c      NaN   8.0   9.0
d      NaN  11.0  12.0
e      0.0   0.0   0.0
f      0.0   0.0   0.0
g      0.0   0.0   0.0

EDIT by edited question:

print (df.isin(df1))
    col1   col2   col3   col4
0   True   True   True   True
1  False  False  False  False
2  False  False  False  False
3  False  False  False  False
4   True   True   True   True
5   True   True   True   True
6   True   True   True   True

print (df.isin(df1).all(axis=1))
0     True
1    False
2    False
3    False
4     True
5     True
6     True
dtype: bool

print (~df.isin(df1).all(axis=1))
0    False
1     True
2     True
3     True
4    False
5    False
6    False
dtype: bool

print (df[~(df.isin(df1).all(axis=1))])
  col1  col2  col3  col4
1    b   NaN   NaN     6
2    c   NaN   8.0     9
3    d   NaN  11.0    12

TechQA.

What is the analogue of EXCEPT clause in SQL in Pandas?

There are 2 answers

Related Questions in PYTHON

Related Questions in SQL

Related Questions in PANDAS

Related Questions in EXCEPT

Popular Questions

Popular Tags

Trending Questions