I have a DataFrame with several dtypes: bool, int, float, datetime, and category. Currently I convert columns as follows.
# Before pandas 2.0
1. object -> string
2. object -> datetime64[ns] # if the column holds dates
With pandas 2.0 or later, I am trying to use pyarrow dtypes for all columns and save the result in Parquet format.
So far I have these mappings:
int8 -> int8[pyarrow], and likewise for the other int types
float16 -> float16[pyarrow], and likewise for the other float types
string or object -> string[pyarrow]
For example:
df['col_int'] = df['col_int'].astype('int8[pyarrow]')
I could not find much on how to convert datetime and category columns using astype(),
that is, for the following:
1. datetime -> timestamp # if the column holds dates
2. category -> dictionary
For example:
df['col_date'] = df['col_date'].astype(???)
df['col_dictionary'] = df['col_dictionary'].astype(???)
What should I pass to astype() in these two cases? Please help.