I have been facing the following problem. I have a DataFrame with a MultiIndex (three levels here):
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(2, 8, size=(8, 1)), columns=['Total'])
df.index = pd.MultiIndex.from_tuples([
    (1990, 'Women', 'type_A'), (1990, 'Women', 'type_B'), (1990, 'Men', 'type_A'), (1990, 'Men', 'type_B'),
    (1991, 'Women', 'type_A'), (1991, 'Women', 'type_B'), (1991, 'Men', 'type_A'), (1991, 'Men', 'type_B'),
], names=['Year', 'Gender', 'Type'])
which looks like:
                    Total
Year Gender Type
1990 Women  type_A      5
            type_B      7
     Men    type_A      6
            type_B      2
1991 Women  type_A      2
            type_B      6
     Men    type_A      3
            type_B      5
I have been trying to compute the share of each Type within each Year and Gender group, but I have not found a clear answer on Stack Overflow. In the end I need the following DataFrame:
                     Share
Year Gender Type
1990 Women  type_A  0.4166
            type_B  0.5833
     Men    type_A  0.7500
            type_B  0.2500
1991 Women  type_A  0.2500
            type_B  0.7500
     Men    type_A  0.3750
            type_B  0.6250
Normally I would do this with the div function, but it does not seem to work here with more than one index level. Has anyone faced a similar situation? Thanks in advance!
One option is to compute the sum grouped by Year and Gender and then divide the original data frame by that sum (the numbers differ slightly from yours because you didn't set a seed for the random generator):
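A minimal sketch of that approach, assuming a reasonably recent pandas. Because the original random draws cannot be reproduced, it hardcodes the Total values printed in the question; any other values work the same way. The variable names `group_sums` and `share` are illustrative, not from the question.

```python
import pandas as pd

# Rebuild the frame from the question, using the printed totals
# instead of random draws so the result can be checked by hand.
index = pd.MultiIndex.from_tuples(
    [(1990, 'Women', 'type_A'), (1990, 'Women', 'type_B'),
     (1990, 'Men', 'type_A'), (1990, 'Men', 'type_B'),
     (1991, 'Women', 'type_A'), (1991, 'Women', 'type_B'),
     (1991, 'Men', 'type_A'), (1991, 'Men', 'type_B')],
    names=['Year', 'Gender', 'Type'])
df = pd.DataFrame({'Total': [5, 7, 6, 2, 2, 6, 3, 5]}, index=index)

# Sum of Total within each (Year, Gender) group. transform('sum') returns
# a Series with the same index as df, so each row carries its group's sum.
group_sums = df.groupby(level=['Year', 'Gender'])['Total'].transform('sum')

# Row-wise division now lines up on the full three-level index.
share = (df['Total'] / group_sums).to_frame('Share')
print(share)
```

With these totals the first row is 5 / (5 + 7), roughly 0.4167, which agrees with the expected output in the question. Using `transform` rather than `sum` keeps the group totals on the original index, which sidesteps the multi-level alignment problem with `div`.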