Pandas: convert different datatypes to pyarrow datatypes using astype()

I have a DataFrame whose columns have different dtypes, such as bool, int, float, datetime, and category. Currently I convert them as follows:

# Before pandas 2.0

1. object -> string
2. object -> datetime64[ns] # if the column holds dates

With pandas 2.0 or later, I am trying to use pyarrow dtypes for all columns and save the result in Parquet format.

The following conversions work:

int8 -> int8[pyarrow], and likewise for the other int types
float16 -> float16[pyarrow], and likewise for the other float types
string or object -> string[pyarrow]

For example:

df['col_int'] = df['col_int'].astype('int8[pyarrow]')

I could not find much on how to convert datetime and category columns with astype() for the cases below:

1. datetime -> timestamp # if date
2. category -> dictionary

For example:

df['col_date'] = df['col_date'].astype(???)
df['col_dictionary'] = df['col_dictionary'].astype(???)

Please help.

There is 1 answer

Ivan Kocherov (best answer)

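Try wrapping the pyarrow type in pd.ArrowDtype and passing it to astype():

import pandas as pd
import pyarrow as pa

# datetime64[ns] -> timestamp[ns][pyarrow]
df['col_date'] = df['col_date'].astype(pd.ArrowDtype(pa.timestamp('ns')))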

See the Apache Arrow documentation for more data types.