I am trying to allow pandas DataFrame objects to be defined in a YAML file. I believe this should be possible because DataFrame objects are pickleable.
My stripped-down YAML file, saved as 'config.yaml', is as follows:
!!python/object/new:pandas.DataFrame [[{'dimension1_id':58,'metric1':10},{'dimension1_id':50,'metric':10}]]
I am using the following to load the data into my Python script:
import yaml

f = open('config.yaml')
y = yaml.load(f)
print y
The output (reduced) is as follows:
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2085, in __getattr__
    if name in self.columns:
  File "properties.pyx", line 55, in pandas.lib.AxisProperty.__get__ (pandas\lib.c:29240)
RuntimeError: maximum recursion depth exceeded while calling a Python object
I'm using the PyYAML documentation as my only source of information on this.
Can anyone guess why pandas is getting into an infinite loop?
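One plausible explanation, sketched here with a toy class rather than pandas itself: the `!!python/object/new` tag makes PyYAML create the instance through `__new__` without running `__init__`, so the attributes that `__getattr__` relies on never get set. Any attribute access then falls into `__getattr__`, which looks up the missing attribute, which calls `__getattr__` again. `ToyFrame` and `probe` are hypothetical names used only for illustration, and the real pandas internals differ in detail.

```python
class ToyFrame(object):
    """Toy stand-in for a class whose __getattr__ depends on state set in __init__."""

    def __init__(self, columns):
        self._columns = list(columns)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        # If _columns was never set, self._columns re-enters __getattr__.
        if name in self._columns:
            return "column %s" % name
        raise AttributeError(name)


def probe(obj):
    """Report what happens when a column-style attribute is accessed."""
    try:
        obj.metric1
        return "ok"
    except AttributeError:
        return "missing attribute"
    except RuntimeError:
        # Python 3 raises RecursionError, a subclass of RuntimeError.
        return "recursion"


# Normal construction runs __init__, so _columns exists.
built_normally = ToyFrame(["dimension1_id", "metric1"])

# Roughly what happens for an object created without __init__.
built_without_init = ToyFrame.__new__(ToyFrame)

print(probe(built_normally))
print(probe(built_without_init))
```

If this is what is happening, the recursion is triggered by `print y` touching attributes of a half-built object, not by the data in the YAML file.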
EDIT: It seems that DataFrame objects are not serializable by default, and the extra legwork looks like more trouble than it is worth. Here is the YAML that yaml_serializer creates from just a simple DataFrame object:
!!python/object/new:pandas.core.frame.DataFrame
state: !!python/object/new:pandas.core.internals.BlockManager
state:
- - !!python/object/apply:numpy.core.multiarray._reconstruct
args:
- &id001 !!python/name:pandas.core.index.Index ''
- [0]
- b
state:
- - 1
- [!!python/long '2']
- &id002 !dtype 'object'
- false
- [dfsd, id]
- [null]
- !!python/object/apply:numpy.core.multiarray._reconstruct
args:
- !!python/name:pandas.core.index.Int64Index ''
- [0]
- b
state:
- - 1
- [!!python/long '2']
- !dtype 'int64'
- false
- "\0\0\0\0\0\0\0\0\x01\0\0\0\0\0\0\0"
- [null]
- - - [!!python/long '23', !!python/long '123']
- [!!python/long '7', !!python/long '123']
- - !!python/object/apply:numpy.core.multiarray._reconstruct
args:
- *id001
- [0]
- b
state:
- - 1
- [!!python/long '2']
- *id002
- false
- [dfsd, id]
- [null]
I don't think DataFrames are pickleable "out of the box"... to_pickle is doing some pandas-specific wrangling that other modules would miss. Others around here know more about this.

But I have had some success saving Series to YAML with this little module. Doing it with DataFrames should be possible also, since they can be treated as dicts of Series.
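As a rough sketch of the dict-of-columns idea (not the module mentioned above), a DataFrame can be reduced to plain lists before dumping, so the YAML contains no Python-specific tags. The helper names `frame_to_yaml` and `frame_from_yaml` are made up for this example. It assumes string column names, a default integer index, and values that plain YAML can represent; the index and the original column order are not preserved.

```python
import pandas as pd
import yaml


def frame_to_yaml(df):
    """Dump a DataFrame as a plain YAML mapping of column name -> list of values."""
    # Series.tolist() converts NumPy scalars to native Python types,
    # which yaml.safe_dump can represent without custom tags.
    data = {str(col): df[col].tolist() for col in df.columns}
    return yaml.safe_dump(data)


def frame_from_yaml(text):
    """Rebuild a DataFrame from the mapping written by frame_to_yaml."""
    # safe_load only builds basic Python types, so no arbitrary objects are constructed.
    return pd.DataFrame(yaml.safe_load(text))


df = pd.DataFrame({"dimension1_id": [58, 50], "metric1": [10, 10]})
text = frame_to_yaml(df)
restored = frame_from_yaml(text)
print(text)
print(restored)
```

Because the file holds only lists and scalars, it stays readable and editable by hand, which suits a config file better than the tagged dump shown in the question.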