Appending HDFStore fails, cannot match existing table structure


I'm running into a problem when writing a DataFrame to HDF5 in small chunks via pd.HDFStore('mystore.h5', mode='a').append(my_frame, chunk). The chunks should all have the same columns and types, since they come from the same DataFrame. The append works for many chunks, then fails partway through with:

ValueError: cannot match existing table structure for [Net_Bal_Amt,Loan_Current_Rate] on appending data

I printed the DataFrame chunks that caused the failure. The one thing they have in common is that a specific column contains only None values (the values are null in the source). I'm not sure how to correct this. The values should stay None, NaN, or null, as long as they remain empty. Thanks.

Traceback (most recent call last):
  File "[...]\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 3381, in create_axes
    b, b_items = by_items.pop(items)
KeyError: ('Net_Bal_Amt', 'Loan_Current_Rate')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "[...]\crd_test.py", line 8, in <module>
    credit.CRD.hdf_install(overwrite=True, tablenames=['loans_uscrd', 'loans_uscrd_a'])
  File "[...]\credit_base.py", line 62, in hdf_install
    cls._hdf_creation(map_)
  File "[...]\credit_base.py", line 80, in _hdf_creation
    cls._hdf_processing(v, chunk)
  File "[...]\credit_base.py", line 88, in _hdf_processing
    cls.crd.append(frame, chunk)   
  File "[...]\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 903, in append
    **kwargs)
  File "[...]\lib\site-packages\pandas\io\pytables.py", line 1259, in _write_to_group
    s.write(obj=value, append=append, complib=complib, **kwargs)
  File "[...]\lib\site-packages\pandas\io\pytables.py", line 3751, in write
    **kwargs)
  File "[...]\Anaconda3\lib\site-packages\pandas\io\pytables.py", line 3388, in create_axes
    item in items))
ValueError: cannot match existing table structure for [Net_Bal_Amt,Loan_Current_Rate] on appending data

dtypes:

pd.read_hdf(r'[...]\crd_test.h5','loans').dtypes
Out[4]: 
Customer_Id                  object
As_of_Date           datetime64[ns]
Net_Bal_Amt                 float64
Loan_Current_Rate           float64
dtype: object

Versions: PyTables 3.1.1, pandas 0.15.2, Python 3.4

dtypes of chunk being appended on crash:

Customer_Id                  object
As_of_Date           datetime64[ns]
Net_Bal_Amt                 float64
Loan_Current_Rate            object
dtype: object
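
The two dtype listings differ in one place: Loan_Current_Rate is float64 in the stored table but object in the failing chunk, which is what an all-None column looks like to pandas. The following is a minimal sketch of one possible workaround, not a confirmed fix: it casts the numeric columns of each chunk to float64 before appending. The helper name normalize_chunk, the FLOAT_COLUMNS list, and the sample data are hypothetical; only the column names and dtypes come from the listings above.

```python
import pandas as pd

# Assumption: these are the columns the existing table stores as float64,
# taken from the dtypes that pd.read_hdf reported for the store.
FLOAT_COLUMNS = ["Net_Bal_Amt", "Loan_Current_Rate"]


def normalize_chunk(chunk: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the chunk with the numeric columns cast to float64.

    None values become NaN, so empty cells stay empty. A value that cannot
    be parsed as a number raises an error instead of being silently dropped.
    """
    out = chunk.copy()
    for col in FLOAT_COLUMNS:
        out[col] = pd.to_numeric(out[col]).astype("float64")
    return out


# Made-up chunk that reproduces the reported dtypes: one column is all None.
chunk = pd.DataFrame({
    "Customer_Id": ["a", "b"],
    "As_of_Date": pd.to_datetime(["2015-01-31", "2015-01-31"]),
    "Net_Bal_Amt": [100.0, 250.5],
    "Loan_Current_Rate": [None, None],
})

fixed = normalize_chunk(chunk)
print(chunk.dtypes)   # dtypes before normalization
print(fixed.dtypes)   # dtypes after normalization
```

With this in place, the append call would receive normalize_chunk(chunk) instead of the raw chunk, so every chunk presents the same dtypes as the table already on disk.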
