Pyarrow keeps converting string to binary using Pandas

I am trying to convert a CSV file to Parquet using pandas and pyarrow in Python 2.7.

I am having trouble keeping string columns as strings when converting with pa.Table.from_pandas(df). It keeps converting them to the 'binary' data type, which makes AWS Glue very unhappy.

I have tried a custom schema, but it does not work.

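import pandas as pd
import pyarrow as pa

# dtypes, file, and StringIO are defined elsewhere (not shown)
fields = []
for name, dtype in dtypes.items():
    fields.append(pa.field(name, dtype))
my_schema = pa.schema(fields)
df = pd.read_csv(StringIO(file), delimiter="\t")
table = pa.Table.from_pandas(df)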

Previously I specified the data types when reading in the CSV, but that did not work either. I also tried replace_schema_metadata(), but that doesn't do much, since the metadata isn't the actual schema.
