Lex-sorting a pandas series/dataframe

2.3k views Asked by At

Hi I have a data frame which has the following values as input

{1,20,21,10,100,1000,30}

I need the output to be sorted as {1,10,100,1000,20,21,30}

What is the best way to do this, I have more than 20000 values to be sorted in the above format.

I tried sort_values but its not helping .

Thanks

1

There are 1 answers

3
this be Shiva On BEST ANSWER

Given a series of the form:

s

0       1
1      20
2      21
3      10
4     100
5    1000
6      30
dtype: int64

To lex-sort this series, you'd need to first cast to string and then call sort_values.

s.astype(str).sort_values()

0       1
3      10
4     100
5    1000
1      20
2      21
6      30
dtype: object

In the case of a dataframe, you'd have to do something similar:

df['Col'] = df['Col'].astype(str)
df.sort_values(subset=['Col'])
df['Col'] = pd.to_numeric(df['Col'])

As an improvement to this, I'd recommend argsort for the task, followed by a little indexing magic with iloc.

idx = df['Col'].astype(str).argsort()
df = df.iloc[idx]

With this approach, you don't have to persist the cast before a sort_values call.