Pandas: inverting query string produces invalid results


I am trying to wrap my mind around some unexpected behaviour of the pandas DataFrame.query method.

Assuming a test dataframe:

>>> import pandas as pd
>>> df = pd.DataFrame([[1,1,1,2,2,2],[1,2,3,4,5,6]], columns=['a', 'b', 'c', 'd', 'e', 'f'])
>>> df
   a  b  c  d  e  f
0  1  1  1  2  2  2
1  1  2  3  4  5  6

One can select the first row with the following query expression:

>>> df.query('a == b == c == 1 & d == e == f == 2')
   a  b  c  d  e  f
0  1  1  1  2  2  2

My aim, however, is to select all rows except those satisfying the above expression. Intuitively, that should work by simply wrapping the entire expression in parentheses and prepending a logical not, right?

>>> df.query('~(a == b == c == 1 & d == e == f == 2)')
   a  b  c  d  e  f
0  1  1  1  2  2  2
1  1  2  3  4  5  6

Clearly that is not the expected result: the first row should have been excluded, but both rows come back. If, however, one moves the negation inside the expression using De Morgan's law, the whole thing does work:

>>> df.query('~(a == b == c == 1) | ~(d == e == f == 2)')
   a  b  c  d  e  f
1  1  2  3  4  5  6
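
As a sanity check on the logic itself, here is a minimal plain-Python sketch that evaluates both negated forms row by row, without pandas. The helper names `negated_whole` and `negated_parts` are made up for illustration. It uses `and` rather than `&`, because in plain Python `&` binds more tightly than `==`; as far as I understand the pandas documentation, `query` gives `&` the precedence of `and`, so this should be the intended reading of the query strings.

```python
# Rows of the test dataframe as plain tuples (a, b, c, d, e, f).
rows = [(1, 1, 1, 2, 2, 2), (1, 2, 3, 4, 5, 6)]

def negated_whole(a, b, c, d, e, f):
    # not (P and Q): the negation wraps the entire expression.
    return not (a == b == c == 1 and d == e == f == 2)

def negated_parts(a, b, c, d, e, f):
    # (not P) or (not Q): the negation is moved inside via De Morgan's law.
    return not (a == b == c == 1) or not (d == e == f == 2)

whole = [negated_whole(*row) for row in rows]
parts = [negated_parts(*row) for row in rows]

print(whole)  # [False, True]
print(parts)  # [False, True]
```

Both forms keep only the second row here, which is the result I would expect from either query string.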

Can anybody explain to me what is going on here? The last two query strings are logically equivalent, yet they return different results.
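
For reference, the selection I am after can be sketched with ordinary boolean masks instead of a query string. This is only a workaround under the assumption that the chained comparisons mean "a, b and c all equal 1" and "d, e and f all equal 2"; it does not explain the behaviour of `query`. The variable names `abc_is_1`, `def_is_2` and `remaining` are my own.

```python
import pandas as pd

df = pd.DataFrame([[1, 1, 1, 2, 2, 2], [1, 2, 3, 4, 5, 6]],
                  columns=['a', 'b', 'c', 'd', 'e', 'f'])

# True for rows where a, b and c are all equal to 1.
abc_is_1 = (df[['a', 'b', 'c']] == 1).all(axis=1)
# True for rows where d, e and f are all equal to 2.
def_is_2 = (df[['d', 'e', 'f']] == 2).all(axis=1)

# Keep every row that does NOT satisfy both conditions.
remaining = df[~(abc_is_1 & def_is_2)]
print(remaining)
```

With the test dataframe above, this leaves only the row with index 1.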
