What is a MultiIndex in Pandas and why is Pandas saying that I don't have a MultiIndex?

645 views Asked by At

I'm trying to group my multiIndex as

df.groupby(level=1, 'Amount($m)').sum() 

But I'm getting the error that level > 0 is only valid with a MultiIndex. When I call df.info() I get

class 'pandas.core.frame.DataFrame'

but when I print df.columns I get

MultiIndex(levels=[[u'MSS', u'MLLN', u'AMS'],
      [u'Trades', u'Resolved Trades'],
      [u'Amount($m)', u'Trades', u'Resolved Trades']],
    labels=[[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2], [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1], [1, 0, 2, 0, 1, 0, 2, 0, 1, 0, 2, 0]],
    names=[u'Entity', None, None])

Can someone explain what I'm missing?

1

There are 1 answers

1
Ted Petrou On

Pandas dataframes have two index objects:

  • a column index for the columns (above the data) and
  • a row index for the rows (to the left of the data).

The default for the groupby method is to group by the values in a column or columns (levels of the column index). You however attempted to group by the values in one of the levels of the row index. Your row index has only 1 level (level 0) and this is why you received an error.

It looks like you are wanting to group in the other direction by the column names themselves in the bottom level of the column MultiIndex object. Try changing the axis parameter to 1 and choose level 2. This might not be what you are looking for. Supplying more data would help.

df.groupby(level=2, axis=1).sum()

Here is a contrived example similar to what you see in the the documentation.

# create multiindex and random data
iterables = [[u'MSS', u'MLLN', u'AMS'], 
             [u'Trades', u'Resolved Trades'], 
             [u'Amount($m)', u'Trades', u'Resolved Trades']]


idx = pd.MultiIndex.from_product(iterables)

df = pd.DataFrame(np.random.randn(3, len(idx)), index=['A', 'B', 'C'], columns=idx)
print(df)

         MSS                                                      \
      Trades                           Resolved Trades             
  Amount($m)    Trades Resolved Trades      Amount($m)    Trades   
A   0.776702  1.731951        0.474472        0.371514  0.023242   
B  -1.907887  0.347118        0.911265       -0.044541 -0.245793   
C  -1.126279 -1.917745        0.000187        0.368609 -0.742214   

                        MLLN                                            \
                      Trades                           Resolved Trades   
  Resolved Trades Amount($m)    Trades Resolved Trades      Amount($m)   
A        0.287962  -0.812814  0.410907       -0.263261       -0.059534   
B        0.088943   0.260369 -0.065651        0.494519       -0.358993   
C       -0.765357  -0.645231  0.409186       -0.240529        1.440578   

                                   AMS                            \
                                Trades                             
     Trades Resolved Trades Amount($m)    Trades Resolved Trades   
A -0.719607        0.948184  -0.063318 -0.715459       -1.252230   
B -0.047890        0.573863  -1.961252  1.661206        1.385927   
C -1.340498       -0.394141  -0.815319 -0.150870       -0.482750   


  Resolved Trades                            
       Amount($m)    Trades Resolved Trades  
A       -0.310624 -1.201535        0.467134  
B       -0.151055  0.606990       -0.485391  
C        0.130125  1.262464       -2.287201 

Then run the first statement I wrote:

df.groupby(level=2, axis=1).sum()

Output

   Amount($m)  Resolved Trades    Trades
A   -0.098074         0.662263 -0.470501
B   -4.163359         2.969126  2.255981
C   -0.647517        -4.169791 -2.479675

Finally, you can select just Amount($m)