Using PyYAML to create pandas DataFrame objects - Recursion depth exceeded


I am trying to allow pandas DataFrame objects to be defined in a YAML file. I believe this should be possible because DataFrame objects are pickleable.

My stripped down YAML file is as follows, saved as 'config.yaml':

!!python/object/new:pandas.DataFrame [[{'dimension1_id':58,'metric1':10},{'dimension1_id':50,'metric':10}]]

I am using the following to load the data into my Python script:

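import yaml

# yaml.Loader is needed for the python/object tags; newer PyYAML requires it explicitly
with open('config.yaml') as f:
    y = yaml.load(f, Loader=yaml.Loader)
print(y)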

The output (reduced) is as follows:

File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2085, in __getattr__
if name in self.columns:
File "properties.pyx", line 55, in pandas.lib.AxisProperty.__get__ (pandas\lib.c:29240)
RuntimeError: maximum recursion depth exceeded while calling a Python object

I'm using the PyYAML documentation as my only source of information on this.

Can anyone guess why pandas is getting into an infinite loop?
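
A plausible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: PyYAML's `python/object/new` tag builds the instance with `cls.__new__(cls, *args)` and never calls `__init__`, so the DataFrame's internal state is never set up. Its `__getattr__` then looks up `self.columns`, which itself needs the missing state, which triggers `__getattr__` again. The toy class below is hypothetical (`ToyFrame` and `_columns` are illustration names, not pandas internals) and only mimics that pattern.

```python
class ToyFrame(object):
    """Hypothetical stand-in: __getattr__ depends on state set in __init__."""

    def __init__(self, columns):
        self._columns = list(columns)

    @property
    def columns(self):
        # Needs self._columns, which only exists after __init__ has run
        return self._columns

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails
        if name in self.columns:
            return "column %s" % name
        raise AttributeError(name)


# Normal construction: __init__ runs, attribute lookup works
normal = ToyFrame(['a', 'b'])
normal_result = normal.a

# Construction via __new__ only, as python/object/new does: no __init__
broken = ToyFrame.__new__(ToyFrame)
recursion_hit = False
try:
    broken.a
except RuntimeError:
    # RecursionError on Python 3 is a subclass of RuntimeError
    recursion_hit = True
```

If this is indeed the mechanism, the `python/object/apply` tag, which calls the named callable with the given arguments, may behave differently from `python/object/new`; that is an assumption worth testing, not something verified here.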

EDIT: It seems that DataFrame objects are not serializable by default, and the extra legwork looks like more trouble than it is worth. Here is the YAML file that yaml_serializer creates from a simple DataFrame object:

!!python/object/new:pandas.core.frame.DataFrame
state: !!python/object/new:pandas.core.internals.BlockManager
  state:
  - - !!python/object/apply:numpy.core.multiarray._reconstruct
      args:
      - &id001 !!python/name:pandas.core.index.Index ''
      - [0]
      - b
      state:
      - - 1
        - [!!python/long '2']
        - &id002 !dtype 'object'
        - false
        - [dfsd, id]
      - [null]
    - !!python/object/apply:numpy.core.multiarray._reconstruct
      args:
      - !!python/name:pandas.core.index.Int64Index ''
      - [0]
      - b
      state:
      - - 1
        - [!!python/long '2']
        - !dtype 'int64'
        - false
        - "\0\0\0\0\0\0\0\0\x01\0\0\0\0\0\0\0"
      - [null]
  - - - [!!python/long '23', !!python/long '123']
      - [!!python/long '7', !!python/long '123']
  - - !!python/object/apply:numpy.core.multiarray._reconstruct
      args:
      - *id001
      - [0]
      - b
      state:
      - - 1
        - [!!python/long '2']
        - *id002
        - false
        - [dfsd, id]
      - [null]

There is 1 answer

Dan Allan (best answer)

I don't think DataFrames are pickleable "out of the box". to_pickle does some pandas-specific wrangling that other modules would miss. Others around here know more about this.

But I have had some success saving Series to YAML with this little module. Doing the same with DataFrames should also be possible, since they can be treated as dicts of Series.
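
The "dicts of Series" idea can be sketched without any Python-specific YAML tags. This is a minimal example, not the module mentioned above: it assumes the data is small, that only column values matter (the index is not preserved), and it uses illustrative data modeled on the question.

```python
import pandas as pd
import yaml

# Illustrative data modeled on the question's config
df = pd.DataFrame([
    {'dimension1_id': 58, 'metric1': 10},
    {'dimension1_id': 50, 'metric1': 10},
])

# orient='list' yields {column_name: [values, ...]} with plain Python values
plain = df.to_dict(orient='list')

# safe_dump/safe_load handle only plain YAML types, so no !!python tags appear
text = yaml.safe_dump(plain)

# Rebuild the DataFrame from the plain mapping of column name to values
restored = pd.DataFrame(yaml.safe_load(text))
```

Keeping the YAML file to plain mappings and lists means it can be read with `yaml.safe_load`, and the DataFrame construction stays in Python where it is easy to debug.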