Unable to convert dataframe to parquet, TypeError

I was trying to convert a DataFrame to a Parquet file, but I got the following error:

result = pa.array(col, type=type_, from_pandas=True, safe=safe)
  File "pyarrow\array.pxi", line 265, in pyarrow.lib.array
  File "pyarrow\array.pxi", line 80, in pyarrow.lib._ndarray_to_array
  File "pyarrow\error.pxi", line 107, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: ('Expected a string or bytes dtype, got float64', 'Conversion failed for column NOTES with type float64')

The column's type is varchar, so I expected it to be converted to str. However, a few records in that column contain numeric values, and I suspect the DataFrame parses them as floats. As a result, the column reaches the Parquet conversion as float64, which produces the error above.
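A minimal sketch of how a text column can end up as float64, assuming (hypothetically) that the data arrives as a CSV export rather than directly from the varchar source. When every non-empty value looks like a number, pandas infers a numeric dtype; passing `dtype` at read time keeps the column as text. The file contents and the `ID` column are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV export: NOTES is free text in the source database,
# but in this sample every non-empty value happens to look like a number.
csv_text = "ID,NOTES\n1,\n2,42\n3,\n"

# Without a hint, pandas infers NOTES from its values: blanks become NaN and
# "42" becomes 42.0, so the whole column ends up as float64.
inferred = pd.read_csv(io.StringIO(csv_text))
print(inferred["NOTES"].dtype)  # float64

# Pinning the dtype at read time keeps the column textual: "42" stays "42"
# and blanks become <NA> instead of NaN.
pinned = pd.read_csv(io.StringIO(csv_text), dtype={"NOTES": "string"})
print(pinned["NOTES"].dtype)  # string
```

Setting the type when reading preserves the original text exactly, which a later cast from float64 cannot do (the value would come back as "42.0").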

Is there a way to convert the values in these records to str?

I tried using astype(str), but it didn't work.

Answer by boirslav popov:

Yes. Parquet expects a single type per column. To fix a case like the one above (i.e., mixed value types), convert the column to pandas' nullable 'string' dtype like this:

df['NOTES'] = df['NOTES'].astype('string')  # one string type for the whole column
# then write the file as before: df.to_parquet(...)
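A small self-contained sketch of the cast above, using a hypothetical frame that mirrors the question (NOTES already inferred as float64). It shows that the result must be assigned back, since `astype` returns a new Series, and that missing values stay missing (`<NA>`). By contrast, in pandas 2.x and earlier `astype(str)` turns missing values into the literal text 'nan'. Note also that numbers already parsed as floats come back as text like '42.0'.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the question: NOTES was inferred as float64
# because its non-missing values look numeric.
df = pd.DataFrame({"NOTES": [np.nan, 42.0, np.nan]})

# astype returns a new Series, so the result must be assigned back.
df["NOTES"] = df["NOTES"].astype("string")

print(df["NOTES"].dtype)     # string
print(df["NOTES"].tolist())  # [<NA>, '42.0', <NA>]

# With a single string type (missing values kept as <NA>), the column can be
# written as a Parquet string column, e.g. df.to_parquet("notes.parquet"),
# which needs a Parquet engine such as pyarrow installed.
```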