I am trying to wrap my mind around some unexpected behaviour of the pandas DataFrame.query method.
Assume a test DataFrame:
>>> import pandas as pd
>>> df = pd.DataFrame([[1, 1, 1, 2, 2, 2], [1, 2, 3, 4, 5, 6]], columns=['a', 'b', 'c', 'd', 'e', 'f'])
>>> df
   a  b  c  d  e  f
0  1  1  1  2  2  2
1  1  2  3  4  5  6
One can select the first row with the following query expression:
>>> df.query('a == b == c == 1 & d == e == f == 2')
   a  b  c  d  e  f
0  1  1  1  2  2  2
My aim, however, is to select all rows except those satisfying the above expression. Intuitively, that should work by simply wrapping the entire expression in parentheses and prepending a logical not (~). Right?
>>> df.query('~(a == b == c == 1 & d == e == f == 2)')
   a  b  c  d  e  f
0  1  1  1  2  2  2
1  1  2  3  4  5  6
Clearly that is not the expected result: the first row should have been excluded. If, however, one distributes the not into the expression with a little Boolean algebra (De Morgan's law), the whole thing does work:
>>> df.query('~(a == b == c == 1) | ~(d == e == f == 2)')
   a  b  c  d  e  f
1  1  2  3  4  5  6
Can anybody explain to me what is going on here? The last two query strings are logically identical, yet they return different results.
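For reference, the equivalence I am relying on is De Morgan's law. Below is a minimal sketch of the same filter written with explicit boolean masks instead of a query string. It assumes each chained comparison has the usual Python meaning (for example, a == b == c == 1 read as a == b and b == c and c == 1); the names left, right, negated_whole and distributed are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 2, 2, 2], [1, 2, 3, 4, 5, 6]],
    columns=["a", "b", "c", "d", "e", "f"],
)

# Assumption: a chained comparison is a pairwise conjunction, as in plain Python.
left = (df["a"] == df["b"]) & (df["b"] == df["c"]) & (df["c"] == 1)
right = (df["d"] == df["e"]) & (df["e"] == df["f"]) & (df["f"] == 2)

# Negating the whole conjunction ...
negated_whole = ~(left & right)
# ... versus distributing the negation (De Morgan's law).
distributed = ~left | ~right

# Both masks mark the same rows when built this way.
print(negated_whole.equals(distributed))

# Rows that do NOT satisfy the original condition.
result = df[negated_whole]
print(result)
```

With plain masks, both forms select only the row with index 1, which is the result I expected from query as well.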