I have a Pandas dataframe where one column contains a non-nested json object in each row.

                             js
0  {"k1":"1","k2":"A","k3":"X"}
1  {"k1":"2","k2":"B","k3":"X"}
2  {"k1":"3","k2":"A","k3":"Y"}
3  {"k1":"4","k2":"D","k4":"M"}

Created like this:

import pandas as pd
L0 = ['{"k1":"1","k2":"A","k3":"X"}',
      '{"k1":"2","k2":"B","k3":"X"}',
      '{"k1":"3","k2":"A","k3":"Y"}',
      '{"k1":"4","k2":"D","k4":"M"}']
df = pd.DataFrame({'js':L0})

I want to make the json-objects into their own dataframe:

  k1 k2   k3   k4
0  1  A    X  NaN
1  2  B    X  NaN
2  3  A    Y  NaN
3  4  D  NaN    M

Right now the only way I know is by using the json module and df.iterrows():

import json
all_json = []
for _,row in df.iterrows():
    all_json.append(json.loads(row["js"]))
df2 = pd.DataFrame.from_dict(all_json)

Is there a better way to do this, ideally without iterating?

EDIT 1:

Thanks for the answers.

I have timed the three suggested approaches using ast.literal_eval on my real world data where my own approach takes 158 ms ± 4.01 ms:

df = df.apply(lambda x: ast.literal_eval(x[0]), 1).apply(pd.Series) takes 640 ms ± 7.8 ms

df['js'].apply(ast.literal_eval).apply(pd.Series) takes 636 ms ± 19 ms

pd.DataFrame(df.js.apply(ast.literal_eval).tolist()) takes 180 ms ± 5.11

As suggested the third approach is the fastest, but sadly they are all slower than the iterrows-approach while my intention was to get rid of iterrows to make it faster.

EDIT 2: pd.DataFrame(df["js"].apply(json.loads).tolist()) takes 25.2 ms ± 512 µs so we have a winner I guess.

3 Answers

2
anky_91 On Best Solutions

I would call the dataframe constructor after converting the string to dict ( i think this would be faster):

import ast
pd.DataFrame(df.js.apply(ast.literal_eval).tolist())

Or:

import json
pd.DataFrame(df["js"].apply(json.loads).tolist())

  k1 k2   k3   k4
0  1  A    X  NaN
1  2  B    X  NaN
2  3  A    Y  NaN
3  4  D  NaN    M
3
Sandeep Kadapa On

Use ast.literal_eval and apply pd.Series as:

import ast
df = df.apply(lambda x: ast.literal_eval(x[0]), 1).apply(pd.Series)

print(df)
  k1 k2   k3   k4
0  1  A    X  NaN
1  2  B    X  NaN
2  3  A    Y  NaN
3  4  D  NaN    M

OR:

df = pd.DataFrame([ast.literal_eval(i) for i in df['js']])

OR:

import json
df = pd.DataFrame([json.loads(i) for i in df['js']])
1
U9-Forward On

You can use apply(pd.Series):

import ast
print(df['js'].apply(ast.literal_eval).apply(pd.Series))

Output:

  k1 k2   k3   k4
0  1  A    X  NaN
1  2  B    X  NaN
2  3  A    Y  NaN
3  4  D  NaN    M