Convert dictionary of dictionaries to dataframe with data types

What is the preferred way to convert a dictionary of dictionaries into a DataFrame with properly inferred column data types?

I have a dictionary r of the following kind, which holds a set of facts behind each key:

import pandas as pd

r = { 1:{'a':1,'b':2,'c':'b'},
      2:{'d':1,'b':1,'c':'e'},
      3:{'e':0} }

Converting this dictionary of dictionaries into a DataFrame is straightforward:

x = pd.DataFrame(r)
x
x.dtypes

which yields the following view of the original dictionary of dictionaries

     1    2    3
a    1  NaN  NaN
b    2    1  NaN
c    b    e  NaN
d  NaN    1  NaN
e  NaN  NaN  0.0

and the following datatypes for columns

1     object
2     object
3    float64
dtype: object

However, I would like to have the transposed version of x. After transposing

y = x.transpose()
y
y.dtypes

the data is shown in the expected matrix form

     a    b    c    d    e
1    1    2    b  NaN  NaN
2  NaN    1    e    1  NaN
3  NaN  NaN  NaN  NaN    0

but the data types are all object

a    object
b    object
c    object
d    object
e    object
dtype: object

What is the preferred way to convert r to y so that y.dtypes directly yields the data types

a    float64
b    float64
c    object
d    float64
e    float64
dtype: object

similar to converting r to x?

There are 2 answers

rafaelc On BEST ANSWER

Just set the right orientation (default is columns, you want index).

df = pd.DataFrame.from_dict(r, orient='index')
df.dtypes

a    float64
b    float64
c     object
d    float64
e    float64
dtype: object
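
To see the difference side by side, here is a minimal sketch that builds the frame both ways from the question's dictionary. The names via_transpose and via_orient are only illustrative, and the value of 'c' under key 2 is assumed to be 'e', as in the printed tables.

```python
import pandas as pd

r = {1: {'a': 1, 'b': 2, 'c': 'b'},
     2: {'d': 1, 'b': 1, 'c': 'e'},
     3: {'e': 0}}

# Route from the question: build column-wise, then transpose.
# The transpose goes through a single mixed block, so every column ends up object.
via_transpose = pd.DataFrame(r).transpose()

# Route from this answer: outer keys become the index, inner keys the columns,
# so pandas infers a dtype for each column separately.
via_orient = pd.DataFrame.from_dict(r, orient='index')

print(via_transpose.dtypes)
print(via_orient.dtypes)
```

The numeric columns come out as float64 rather than int64 because the missing entries are stored as NaN.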
Tom On

In pandas >= 1.0.0 you can use .convert_dtypes():

>>> y.convert_dtypes().dtypes

a     Int64
b     Int64
c    string
d     Int64
e     Int64
dtype: object

Note that this uses the new pandas string type, and will also use pd.NA for missing values. There are parameters which affect some of the conversion:

>>> y.convert_dtypes(convert_string=False).dtypes

a     Int64
b     Int64
c    object
d     Int64
e     Int64
dtype: object
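
A small sketch of what the nullable dtypes mean in practice, assuming pandas >= 1.0 and the transposed frame y from the question (the variable name nullable is just for illustration):

```python
import pandas as pd

r = {1: {'a': 1, 'b': 2, 'c': 'b'},
     2: {'d': 1, 'b': 1, 'c': 'e'},
     3: {'e': 0}}

y = pd.DataFrame(r).transpose()   # all columns are object at this point

nullable = y.convert_dtypes()     # numeric columns become nullable Int64

# Missing entries are now pd.NA instead of NaN
print(nullable.loc[2, 'a'] is pd.NA)

# Integer columns stay integers even though they contain missing values;
# reductions skip pd.NA by default
print(nullable['b'].sum())
```

Unlike float64 with NaN, the Int64 columns keep whole numbers as integers, which may matter if the values are later used as keys or counts.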

If you have an older pandas version, you could use pd.to_numeric with some sort of loop or apply, as here:

>>> y = y.apply(pd.to_numeric, errors='ignore') # for columns that fail, do nothing
>>> y.dtypes

a    float64
b    float64
c     object
d    float64
e    float64
dtype: object

I don't see a way to enforce numeric types on the whole dataframe without a loop (.astype() doesn't seem to work, as errors either cause the whole conversion to fail or if ignored, return the original data types).
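
One caveat: to my knowledge, errors='ignore' in pd.to_numeric is deprecated in newer pandas releases (2.2 and later), with the suggestion to catch the exception explicitly instead. A minimal sketch of the same idea as an explicit loop follows; the helper name to_numeric_where_possible is made up for this example.

```python
import pandas as pd

r = {1: {'a': 1, 'b': 2, 'c': 'b'},
     2: {'d': 1, 'b': 1, 'c': 'e'},
     3: {'e': 0}}

y = pd.DataFrame(r).transpose()   # every column is object here


def to_numeric_where_possible(frame):
    """Return a copy in which each column that parses as numeric is converted."""
    out = frame.copy()
    for col in out.columns:
        try:
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            # Column holds non-numeric values (e.g. strings): leave it untouched
            pass
    return out


converted = to_numeric_where_possible(y)
print(converted.dtypes)
```

Columns that fail to parse keep their original object dtype, which mirrors what errors='ignore' did.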


I just saw that the documentation for .transpose() addresses this point:

When the DataFrame has mixed dtypes, we get a transposed DataFrame with the object dtype:

Transposing a mixed-type DataFrame returns an object-type DataFrame. Here's their example reproduced for completeness:

d2 = {'name': ['Alice', 'Bob'],
      'score': [9.5, 8],
      'employed': [False, True],
      'kids': [0, 0]}
df2 = pd.DataFrame(data=d2)
df2_transposed = df2.transpose()

print(df2, df2.dtypes, df2_transposed, df2_transposed.dtypes, sep='\n\n')

Output:

    name  score  employed  kids
0  Alice    9.5     False     0
1    Bob    8.0      True     0

#dtypes as expected
name         object
score       float64
employed       bool
kids          int64
dtype: object

              0     1
name      Alice   Bob
score       9.5     8
employed  False  True
kids          0     0

#dtypes are now object
0    object
1    object
dtype: object

So you have to include additional commands if you want the dtypes to be converted.
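
As one possible "additional command", here is a sketch using DataFrame.infer_objects() on the documentation example. It transposes twice to get back the original layout with object columns, then lets pandas re-infer the dtypes; round_trip and restored are illustrative names.

```python
import pandas as pd

d2 = {'name': ['Alice', 'Bob'],
      'score': [9.5, 8],
      'employed': [False, True],
      'kids': [0, 0]}
df2 = pd.DataFrame(data=d2)

# Transposing twice restores the layout, but every column is now object
round_trip = df2.transpose().transpose()

# infer_objects() attempts a soft conversion of object columns to better dtypes
restored = round_trip.infer_objects()

print(round_trip.dtypes)
print(restored.dtypes)
```

Note that infer_objects() only converts columns whose values are already of a consistent type; it does not parse strings into numbers the way pd.to_numeric does.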