Pandas DataFrame constructor sorts rows, even with OrderedDict as input

Question

116 views Asked by Jim At 27 September 2020 at 19:06

I create an OrderedDict:

from collections import OrderedDict

od = OrderedDict([((2, 9), 0.5218),
  ((2, 0), 0.3647),
  ((3, 15), 0.3640),
  ((3, 8), 0.3323),
  ((2, 28), 0.3310),
  ((2, 15), 0.3281),
  ((2, 10), 0.2938),
  ((3, 9), 0.2719)])

Then I feed that into the pandas DataFrame constructor:

import pandas as pd

df = pd.DataFrame({'values': od})

The resulting DataFrame has its rows sorted by index.

Instead, I expected the rows to keep the order of the OrderedDict.

What is going on here that I don't understand?

P.S.: I am not looking for an alternative way of solving the problem (though you are welcome to post one if you think it would help the community). All I want is to understand why this doesn't work. Is it a bug, or is there some logic to it? This is also not a duplicate of this link, because I am specifically using an OrderedDict and not a normal dict.

**RichieV** · Answer 1 · 2020-09-27T19:33:23+00:00

If you want to get the DataFrame in the same order as your dictionary, you can pass the values and the keys separately:

df = pd.DataFrame(od.values(), index=od.keys(), columns=['values'])

Output

      values
2 9   0.5218
  0   0.3647
3 15  0.3640
  8   0.3323
2 28  0.3310
  15  0.3281
  10  0.2938
3 9   0.2719
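Another route, not part of the original answer, is to go through a Series first. As far as I know, a Series built from a dict keeps the dict's insertion order and turns tuple keys into a MultiIndex, so this is a minimal sketch under that assumption (the shortened `od` is only there to keep the example self-contained):

```python
from collections import OrderedDict

import pandas as pd

# Shortened copy of the question's data, for illustration only
od = OrderedDict([((2, 9), 0.5218),
                  ((2, 0), 0.3647),
                  ((3, 15), 0.3640),
                  ((3, 8), 0.3323)])

# Build a Series from the dict (keys become the index),
# then promote it to a one-column DataFrame named 'values'
df = pd.Series(od, name='values').to_frame()
print(df)

# Check that the row order matches the OrderedDict
print(list(df.index) == list(od.keys()))
```

The last line is there so you can verify the order on your own pandas version rather than take it on faith.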

The only mention of OrderedDict in the pandas frame source code is in an example for df.to_dict(), so it is not useful here.

It seems that even though you are passing an ordered structure, pandas parses it and re-orders it by default once you wrap it in a plain dictionary, {'values': od}. In that form the outer dictionary supplies the column label, and pandas derives the index from the keys of the OrderedDict.
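To see whether this re-ordering happens on your installation, you can compare the index with the dictionary keys. The sketch below is an illustration rather than anything from the pandas docs: whether `kept_order` comes out True or False may depend on the pandas version, and the `reindex` step is my own suggestion for restoring the original order afterwards.

```python
from collections import OrderedDict

import pandas as pd

# Shortened copy of the question's data, for illustration only
od = OrderedDict([((2, 9), 0.5218),
                  ((2, 0), 0.3647),
                  ((3, 15), 0.3640),
                  ((3, 8), 0.3323)])

df = pd.DataFrame({'values': od})

# Did the constructor keep the insertion order? (may vary by pandas version)
kept_order = list(df.index) == list(od.keys())
print(kept_order)

# Restore the insertion order explicitly, whatever the constructor did.
# An explicit MultiIndex is used so the tuples are treated as index levels.
target = pd.MultiIndex.from_tuples(list(od.keys()))
restored = df.reindex(target)
print(restored)
```

Reindexing only rearranges existing rows here, because every key in `target` is already present in `df.index`.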

This behavior seems to be overruled if you build your dictionary with the column labels as well (à la JSON), keeping the index keys at the outer level and transposing the result:

od = OrderedDict([
    ((2, 9), {'values':0.5218}),
    ((2, 0), {'values':0.3647}),
    ((3, 15), {'values':0.3640}),
    ((3, 8), {'values':0.3323}),
    ((2, 28), {'values':0.3310}),
    ((2, 15), {'values':0.3281}),
    ((2, 10), {'values':0.2938}),
    ((3, 9), {'values':0.2719})
])
df = pd.DataFrame(od).T
print(df)

Output

      values
2 9   0.5218
  0   0.3647
3 15  0.3640
  8   0.3323
2 28  0.3310
  15  0.3281
  10  0.2938
3 9   0.2719
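If you already have the flat OrderedDict from the question, you do not need to retype it in the nested form. A small sketch, assuming the same nested-then-transpose approach as above (the name `nested` is just for illustration):

```python
from collections import OrderedDict

import pandas as pd

# Shortened copy of the question's data, for illustration only
od = OrderedDict([((2, 9), 0.5218),
                  ((2, 0), 0.3647),
                  ((3, 15), 0.3640),
                  ((3, 8), 0.3323)])

# Wrap each value in a one-entry dict carrying the column label
nested = OrderedDict((key, {'values': value}) for key, value in od.items())

# Outer keys become columns, so transpose to get them as rows
df = pd.DataFrame(nested).T
print(df)
```

Here the tuple keys end up as column labels first, which is why the final `.T` is needed to turn them into the row index.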
