n highest values in dataframe

Asked by At

I have a pandas data frame like:

        column0     column1     column2     column3     column4
row0    179319.0    180895.0    94962.0     130734.0    0
row1    89659.5     90447.5     47481.0     65367.0     0
row2    59773.0     60298.33333 31654.0     43578.0     0
row3    44829.75    45223.75    23740.5     32683.5     0
row4    35863.8     36179.0     18992.4     26146.8     0
row5    29886.5     30149.16666 15827.0     21789.0     0
row6    25617.0     25842.14285 13566.0     18676.28571 0
row7    22414.875   22611.875   11870.25    16341.75    0
row8    19924.33333 20099.44444 10551.33333 14526.0     0

and I would like to get something like the index of the 9 (number of rows) highest values, or something like the count of the highest values for each column like:

column0  column1  column2  column3  column4
3        3        1        2        0

In my example the 9 highest values would be the ones from column0, column1, column2, and column3 from row0, the ones from column0, column1, and column3 from row1, and the ones from column0 and column1 from row2.

Any ideas? Thanks!

4 Answers

7
Wen-Ben On Best Solutions

IIUC nlargest after stack

df.stack().nlargest(9).groupby(level=1).count().reindex(df.columns,fill_value=0)
Out[48]: 
column0    3
column1    3
column2    1
column3    2
column4    0
dtype: int64
0
GiraffeMan91 On

pandas has a function called nlargest that will return the nlargest values of any column as a series. [docs]

If you want just the index of each, then you would use index.values [usage] [docs] or you could use .index.array as recommended in the docs.

If you wanted a count of all of the highest values, check out this answer here.

3
Vaishali On

IIUC, You can stack and sort,

df.stack().sort_values(ascending = False).iloc[:9]

row0  column1    180895.00000
      column0    179319.00000
      column3    130734.00000
      column2     94962.00000
row1  column1     90447.50000
      column0     89659.50000
      column3     65367.00000
row2  column1     60298.33333
      column0     59773.00000
0
ALollz On

Here's a solution using np + collections.Counter, which should be pretty fast to get the Counter object.

from collections import Counter
import numpy as np
import pandas as pd

c = Counter(np.tile(df.columns, df.shape[0])[np.argsort(df.to_numpy().ravel())[-9:]])
#Counter({'column0': 3, 'column1': 3, 'column2': 1, 'column3': 2})

s = pd.Series(c).reindex(df.columns).fillna(0).astype(int)
#column0    3
#column1    3
#column2    1
#column3    2
#column4    0
#dtype: int32