I get some data samples, e.g,
df = pd.DataFrame({'A':[1,1,3,3,4],
'B':['Very Happy','Sad','Sad','Happy','Happy'],
'C': [True,False,False,True,False]})
>> df
A B C
0 1 Very Happy True
1 1 Sad False
2 3 Sad False
3 3 Happy True
4 4 Happy False
And want to compute counts for each combination, so crosstab
is the way to go
counts = pd.crosstab(index = df['A'], columns = [df['B'],df['C']])
>> counts
B Happy Sad Very Happy
C False True False True
A
1 0 0 1 1
3 0 1 1 0
4 1 0 0 0
However, the neither 'Very Sad' emotion nor id 2 happened to show up in this data sample, so it is not in the crosstab. I would like to have it as
Very Happy Happy Sad Very Sad
True False True False True False True False
1 1 0 0 0 0 1 0 0
2 0 0 0 0 0 0 0 0
3 0 0 1 0 0 1 0 0
4 0 0 0 1 0 0 0 0
My workaround is to set up a template
emotions = ['Very Happy', 'Happy', 'Sad', 'Very Sad']
ids = [1,2,3,4]
truths = [True,False]
template = pd.DataFrame(index = pd.Index(ids),
columns= pd.MultiIndex.from_product((emotions,truths)))
>> template
Very Happy Happy Sad Very Sad
True False True False True False True False
1 NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN NaN NaN NaN
And then fill it
template.unstack()[counts.unstack().index] = counts.unstack()
template = template.fillna(0)
>> template
Very Happy Happy Sad Very Sad
True False True False True False True False
1 1 0 0 0 0 1 0 0
2 0 0 0 0 0 0 0 0
3 0 0 1 0 0 1 0 0
4 0 0 0 1 0 0 0 0
The problem is that it feels like there must be a cleaner, more readable way to achieve the same result. Any ideas?
That is a
pivot_table
:The columns/rows which do not appear there are becuase their cross section does not exist in the frame. You may add them by doing a
.reindex
: