Preserving datetime type when converting from CSV to HDF5 with vaex

I have a CSV file with a time column that stores timestamps. After converting this file to HDF5 format with the vaex.from_csv() method, the values in the time column are strings. For example:

import vaex

df = vaex.open("data.csv.hdf5")
time = df["time"].values[0]
print(time)
print(type(time))

The output is:

2020-09-30 01:02:03
<class 'str'>

I've tried formatting the timestamps as ISO 8601 and storing them with and without quotes, but the result is the same. Is there a way to make vaex recognize the timestamps as datetime (or np.datetime64) when converting from CSV to HDF5?

Joco (best answer)

I think the problem is that the column already had a string dtype when you converted the data from CSV to HDF5. In my tests, saving and reopening an HDF5 file with datetime and timedelta columns works without problems.

Looking at the filename, you probably used something like

df = vaex.read_csv(path_to_csv, convert=True)

In this case, vaex (or rather pandas, since read_csv is just a wrapper around pd.read_csv with some extra options) does not know whether a column should be a string or a datetime. By default it chooses string, and that dtype is then carried over into the HDF5 file.

Using something like

df = vaex.read_csv(path_to_csv, parse_dates=['my_date_column'], convert=True)

should do the trick.
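To see what parse_dates changes, here is a minimal sketch that calls pandas directly, on the assumption stated above that vaex passes this option through to pd.read_csv. The column name "time" and the first sample row come from the question; the second row and the "value" column are made up for illustration.

```python
import io

import pandas as pd

# Small in-memory CSV standing in for data.csv (illustrative sample data).
csv_text = (
    "time,value\n"
    "2020-09-30 01:02:03,1\n"
    "2020-09-30 04:05:06,2\n"
)

# Without parse_dates, pandas keeps the timestamps as text.
plain = pd.read_csv(io.StringIO(csv_text))

# With parse_dates, the named column is parsed into a datetime64 dtype.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])

print("without parse_dates:", plain["time"].dtype)
print("with parse_dates:   ", parsed["time"].dtype)
```

Only the second call yields a datetime column, which is the dtype you want to have in place before the conversion to HDF5 happens.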

If my assumption is wrong, just make sure every column has the dtype you want before you export to HDF5.
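One way to do that check is sketched below, assuming you load the CSV with pandas first and only then hand the frame to vaex. The helper names ensure_datetime and export_with_vaex are hypothetical, and the sample data is illustrative. The vaex import is kept inside the export helper so the dtype fix can be tried without vaex installed.

```python
import io

import pandas as pd


def ensure_datetime(pdf, columns):
    """Return a copy of pdf with the named columns converted to datetime64."""
    out = pdf.copy()
    for name in columns:
        out[name] = pd.to_datetime(out[name])
    return out


def export_with_vaex(pdf, path):
    """Hypothetical helper: write a pandas frame to HDF5 through vaex."""
    import vaex  # imported lazily so the rest of the sketch runs without vaex

    vaex.from_pandas(pdf, copy_index=False).export_hdf5(path)


# Illustrative sample data; the "time" column name comes from the question.
csv_text = (
    "time,value\n"
    "2020-09-30 01:02:03,1\n"
    "2020-09-30 04:05:06,2\n"
)

raw = pd.read_csv(io.StringIO(csv_text))  # "time" is still text here
fixed = ensure_datetime(raw, ["time"])    # "time" is now datetime64

print(fixed.dtypes)  # inspect every dtype before exporting

# export_with_vaex(fixed, "data.hdf5")  # uncomment where vaex is installed
```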