I have a CSV file with a time column storing timestamps. After converting this file to HDF5 format using the `vaex.from_csv()` method, the values in the time column are strings. For example:
```python
import vaex

df = vaex.open("data.csv.hdf5")
time = df["time"].values[0]
print(time)
print(type(time))
```
The output is:

```text
2020-09-30 01:02:03
<class 'str'>
```
I've tried formatting the timestamps as ISO 8601 and storing them with and without quotes; the result is the same. Is there a way to force vaex to recognize the column as datetime (or `np.datetime64`) when converting from CSV to HDF5?
I think the problem is that the column's data type was already string when you converted the data from CSV to HDF5. In my tests, saving and opening an HDF5 file with datetime and timedelta columns works without problems.
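Judging by the filename, you probably used `vaex.from_csv` with `convert=True`, which writes the converted data to `data.csv.hdf5` next to the CSV.

In this case, vaex (or rather pandas, since vaex's `read_csv`/`from_csv` is just a wrapper around `pd.read_csv` with some extra options) does not know whether a column should be a string or a datetime, so by default it chooses string, and that dtype is then propagated into the HDF5 file. Passing `parse_dates=["time"]`, which is forwarded to `pd.read_csv`, should do the trick.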
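A minimal pandas-only sketch of the mechanism: by default `pd.read_csv` leaves a timestamp column as strings, while its `parse_dates` argument parses it into datetime64 values. The CSV contents below are a hypothetical stand-in for `data.csv`, and vaex itself is not used here.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv.
CSV_TEXT = """time,value
2020-09-30 01:02:03,1.5
2020-09-30 01:02:04,2.5
"""

# Default: pandas does not infer timestamps, so "time" holds Python strings.
default_df = pd.read_csv(io.StringIO(CSV_TEXT))
print(default_df["time"].dtype, type(default_df["time"].iloc[0]))

# With parse_dates, the "time" column is parsed into a datetime64 column.
parsed_df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["time"])
print(parsed_df["time"].dtype)
```

Assuming your vaex version forwards extra keyword arguments to `pd.read_csv`, the equivalent call would presumably be `vaex.from_csv("data.csv", convert=True, parse_dates=["time"])`; delete the stale `data.csv.hdf5` first so the conversion is actually redone.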
If my assumption is wrong, just make sure all the dtypes are what you want before you export to HDF5.
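One way to check and fix dtypes before exporting, sketched with pandas under the assumption that the data passes through a pandas DataFrame first: inspect `dtypes`, and convert a string column after the fact with `pd.to_datetime`. The CSV contents and the explicit `format` string are assumptions matching the example timestamp in the question.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv.
CSV_TEXT = """time,value
2020-09-30 01:02:03,1.5
2020-09-30 01:02:04,2.5
"""

pdf = pd.read_csv(io.StringIO(CSV_TEXT))

# Convert the string column to datetime64 after reading.
pdf["time"] = pd.to_datetime(pdf["time"], format="%Y-%m-%d %H:%M:%S")

# Inspect every column's dtype before handing the data off for export.
print(pdf.dtypes)
```

From there, `vaex.from_pandas(pdf)` followed by `export_hdf5("data.hdf5")` should write the datetime column as datetime rather than string; the output path here is only illustrative.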