KeyError when filtering a pandas DataFrame by a column with a particular key:value pair


My df looks like the following

col1 col_x
... {"key_x":"value1"}
... None
... {"key_x":"value2"}

How can I select all rows with {"key_x":"value1"} in col_x?

Values in col_x can be either a dict or None.

What I've tried:

df.loc[df['col_x']['key_x'] == 'value1']                           # KeyError: 'key_x'
df[~df['col_x'].isnull()].loc[df['col_x']['key_x'] == 'value1']    # KeyError: 'key_x'
df_ok = df[[isinstance(x, dict) and ('key_x' in x.keys()) for x in df.col_x]]
df_ok.loc[df_ok['col_x']['key_x'] == 'value1']                     # KeyError: 'key_x'

(The syntax of the last attempt follows this answer.)

There are 2 answers

Shubham Sharma

Let us use str.get to pull the value out of each dictionary. The str.get method handles null values as well as missing keys.

df[df['col_x'].str.get('key_x').eq('value1')]

                 col_x
0  {'key_x': 'value1'}
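
To check how str.get behaves on mixed data, here is a minimal sketch. The sample frame is an assumption modeled on the question, with one extra hypothetical row whose dict has no key_x.

```python
import pandas as pd

# Sample data modeled on the question; the last row (a dict without
# 'key_x') is a hypothetical addition to exercise the missing-key case.
df = pd.DataFrame({
    "col1": ["a", "b", "c", "d"],
    "col_x": [
        {"key_x": "value1"},
        None,
        {"key_x": "value2"},
        {"other": "value1"},
    ],
})

# For dict elements, .str.get looks the key up in each dict. None rows and
# dicts without the key yield a missing value instead of raising.
values = df["col_x"].str.get("key_x")

# Missing values compare unequal to 'value1', so the mask is False there.
mask = values.eq("value1")
result = df[mask]
print(result)
```

Only the first row should survive the filter, since the None row and the row without key_x both compare as not equal.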

G. Anderson

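One option is to apply dict.get to your dataframe column. This uses df_ok from your question, which keeps only the rows whose col_x is a dict containing 'key_x', so every x has a .get method:

df_ok.loc[df_ok['col_x'].apply(lambda x: x.get('key_x')) == 'value1']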

    col1    col_x
0   ...     {'key_x': 'value1'}
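
If you would rather skip the intermediate df_ok, a possible variant is to guard against non-dict values inside the applied function. This is a sketch under the assumption that the data look like the question's example; get_key_x is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Sample data modeled on the question (the col1 values are placeholders).
df = pd.DataFrame({
    "col1": ["a", "b", "c"],
    "col_x": [{"key_x": "value1"}, None, {"key_x": "value2"}],
})

def get_key_x(x):
    """Return x['key_x'] if x is a dict that has it, otherwise None."""
    return x.get("key_x") if isinstance(x, dict) else None

# apply calls the helper on every element, including the None row,
# so no pre-filtering step is needed before building the mask.
mask = df["col_x"].apply(get_key_x) == "value1"
result = df.loc[mask]
print(result)
```

The isinstance check does the same job as the list comprehension that built df_ok, just inline.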

As an aside, the syntax in your attempts raised a KeyError because df['col_x'] is a Series, and indexing it with ['key_x'] looks up 'key_x' among the row labels (the index), not among the keys of the dicts stored in the column. For example, if I edit your df to give it string indices instead of numeric:

df.index=['key_x','key_none','key_y']

Then you can use that construction to find the row or rows where that index label exists. In your original df that string doesn't appear in the index, which is why you got the KeyError.

df.loc[df['col_x']['key_x']]

        col1    col_x
key_x   ...     {'key_x': 'value1'}
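
The difference between a row label and a dict key can be seen on the column alone. The following is a small sketch with assumed sample data; the variable names raised and found are only for illustration.

```python
import pandas as pd

# The col_x column from the question, with the default 0..2 integer index.
s = pd.Series([{"key_x": "value1"}, None, {"key_x": "value2"}], name="col_x")

# s['key_x'] searches the index for the label 'key_x'. The default index
# holds only 0, 1, 2, so the lookup fails with KeyError.
try:
    s["key_x"]
    raised = False
except KeyError:
    raised = True

# After relabelling the rows, the same expression finds a row label and
# returns the element stored in that row (here, the first dict).
s.index = ["key_x", "key_none", "key_y"]
found = s["key_x"]

print(raised)
print(found)
```

In neither case does the expression look inside the dicts, which is why a per-element lookup such as str.get or apply is needed for the filter.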