I have a DataFrame column of JSON strings that I want to parse into Python objects. df.col.apply(json.loads) works fine with pandas, but fails with Modin DataFrames.

Example:

import pandas
import modin.pandas
import json

pandas.DataFrame.from_dict({'a': ['{}']}).a.apply(json.loads)

0    {}
Name: a, dtype: object


modin.pandas.DataFrame.from_dict({'a': ['{}']}).a.apply(json.loads)

TypeError: the JSON object must be str, bytes or bytearray, not float

1 Answer

Devin On Best Solutions

This issue was also raised on GitHub, and was answered here: https://github.com/modin-project/modin/issues/616

The error comes from the error-checking step of the run: we call apply (or agg) on an empty DataFrame to determine the return type and let pandas handle the error checking. During that check json.loads is handed a float rather than one of your strings, which is what raises the TypeError.
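The same TypeError can be reproduced without Modin, which suggests the message comes from json.loads receiving a non-string value rather than from the data itself. This is a minimal sketch: the NaN placeholder is an assumption for illustration, not necessarily the exact value the check passes in.

```python
import json

# Hypothetical stand-in for the value the type check passes to the function;
# the real placeholder Modin/pandas uses is an assumption here.
placeholder = float("nan")

try:
    json.loads(placeholder)
    message = None
except TypeError as exc:
    # json.loads rejects anything that is not str, bytes or bytearray
    message = str(exc)

print(message)
```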

Locally, I can reproduce this issue and have fixed it by changing that line to perform the operation on one row of the Series instead of an empty one. This may affect performance, so I need to do some more tuning to see whether it can be made faster while staying robust. After the fix, the overhead of the check is ~10ms for 256 columns, and I don't think we want error checking to take that long.

Until the fix is released, you can work around this issue by using a function that also works on empty data, for example:

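def safe_loads(x):
  # Return None instead of raising when x is not a valid JSON string
  try:
    return json.loads(x)
  except (TypeError, ValueError):
    # TypeError: x is not str/bytes; ValueError: x is malformed JSON
    return None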

modin.pandas.DataFrame.from_dict({'a': ['{}']}).a.apply(safe_loads)
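To see what the workaround does with the different kinds of input it might receive, here is a small standalone sketch that needs neither pandas nor Modin. The sample values are made up for illustration.

```python
import json


def safe_loads(x):
    """Parse x as JSON, returning None when it is not a valid JSON string."""
    try:
        return json.loads(x)
    except (TypeError, ValueError):
        # TypeError: x is not str/bytes/bytearray (e.g. a float or None)
        # ValueError: x is a string but not valid JSON (JSONDecodeError)
        return None


# Illustrative inputs: two valid JSON strings, a float, None, and bad JSON
samples = ['{}', '{"k": 1}', float("nan"), None, 'not json']
results = [safe_loads(s) for s in samples]
print(results)
```

One trade-off to keep in mind: malformed JSON strings are also mapped to None, so after the apply you cannot tell a bad string from a missing value.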