pandas count true values in multi-index frame

3.5k views Asked by At

I have a multi-index data frame with levels 'id' and 'year':

id    year
1     2001    True
      2002    True
2     2001    False
      2002    True
3     2001    True
      2002    True

I want to check, for each id level, if the boolean is Truefor consecutive year and then count the number of times this happens. In the above case since for id 1 and id 3 this condition is satisfied then the count would be 2.

I tried to generate a dummy like this (in order to use np.count_nonzero(01_02) afterwards) but this is obviously wrong:

01_02 = (df[df.index.get_level_values(1) == 2001]) & (df[
    df.index.get_level_values(1) == 2002])
1

There are 1 answers

0
unutbu On BEST ANSWER

Notice that if you unstack the id index level of df then you get:

In [35]: df.unstack(['id'])
Out[35]: 
       val             
id       1      2     3
year                   
2001  True  False  True
2002  True   True  True

And we can think of the values above as a boolean array, arr:

arr = df.unstack(['id']).values
# array([[ True, False,  True],
#        [ True,  True,  True]], dtype=bool)

Imagine taking all the rows of the array except the last one:

In [44]: arr[:-1]
Out[44]: array([[ True, False,  True]], dtype=bool)

and comparing it to all the rows of the array except the first one:

In [45]: arr[1:]
Out[45]: array([[ True,  True,  True]], dtype=bool)

We want to count in how many locations they are equal and also equal to True:

In [41]: ((arr[:-1] == arr[1:]) & (arr[:-1] == True)).sum()
Out[41]: 2