How to create a new column in a dataframe using a multikey dictionary

69 views Asked by At

I have a multikey dictionary that I would like to use to create a new column in a dataframe. Doing so with a single key dataframe is quite easy but I am stumped as to what the correct syntax is to send two values to the dictionary.

I have been able to use a single key dictionary using map, get, and apply (map example)

    import pandas as pd      
    df = pd.DataFrame(data = {'Col1': [1, 2, 3, 4], 'Col2': ['A', 'B', 'C', 'D']})

    single_dict = {1: 'This', 2: 'is', 3: 'pretty', 4: 'easy'}

    df['newcol_a'] = df['Col1'].map(single_dict)

    print(df)```

which returns the expected"

    Col1 Col2 newcol_a
    0     1    A     This
    1     2    B       is
    2     3    C   pretty
    3     4    D     easy

But when I create a multikey dictionary such as

dbl_dict = {1: {'A': 'THIS', 'B': 'blah', 'C': 'blah', 'D': 'blah'},
            2: {'A': 'blah', 'B': 'HAS' , 'C': 'blah', 'D': 'blah'},
            3: {'A': 'blah', 'B': 'blah', 'C': 'ME'  , 'D': 'blah'},
            4: {'A': 'blah', 'B': 'blah', 'C': 'blah', 'D': 'STUMPED'},}

I am able to call it using 'get'

dbl_dict.get(1, {}).get('A', 'Other')
Out[5]: 'THIS'      

But I cannot figure out the syntax (tried about 40 different things, such as df['newcol_b'] = df[['Col1', 'Col2']].map(dbl_dict) ) to get the desired results:

    Col1 Col2 newcol_a
    0     1    A     THIS
    1     2    B      HAS
    2     3    C       ME
    3     4    D  STUMPED
2

There are 2 answers

1
DeepSpace On

map does not know how to handle a nested dict. If you insist on using this dict you can use apply on the entire dataframe but you'd have to create a custom mapping function:

import pandas as pd

df = pd.DataFrame(data={'Col1': [1, 2, 3, 4], 'Col2': ['A', 'B', 'C', 'D']})
dbl_dict = {1: {'A': 'THIS', 'B': 'blah', 'C': 'blah', 'D': 'blah'},
            2: {'A': 'blah', 'B': 'HAS', 'C': 'blah', 'D': 'blah'},
            3: {'A': 'blah', 'B': 'blah', 'C': 'ME', 'D': 'blah'},
            4: {'A': 'blah', 'B': 'blah', 'C': 'blah', 'D': 'STUMPED'}}

df['new_col'] = df.apply(lambda s: dbl_dict.get(s['Col1'], {}).get(s['Col2']), axis=1)

df is now

   Col1 Col2  new_col
0     1    A     THIS
1     2    B      HAS
2     3    C       ME
3     4    D  STUMPED

A solution with loc (or at) might be possible (and if so, will probably be faster). Need to look into that.

0
Robo Bayer On

The easiest option you have, in my opinion, is to create a new DataFrame using your nested dictionary and unstack this DataFrame, which you can then join with your original DataFrame, like so:

s = pd.DataFrame(dbl_dict).unstack().rename_axis(('Col1','Col2')).rename('new_column')
print (s)
df = df.join(s, on=['Col1','Col2'])
print (df)