I want to convert a numeric column that represents a timedelta in seconds to a ps.TimedeltaIndex
(for the purpose of later resampling the dataset):
import pyspark.pandas as ps
df = ps.DataFrame({"time": [2.0, 3.0, 4.0], "x": [4.5, 4.0, 3.5]})
df.set_index(ps.to_timedelta(df.time, "s").to_numpy())
KeyError: '2000000000 nanoseconds'
I don't understand why this doesn't work.
The answer from @koedlt put me on the right track, but it is still missing the conversion to a TimedeltaIndex.
However, I also realised that the resample I mentioned actually requires a DatetimeIndex, so I should have asked for that. In this case we'd need to use ps.to_datetime(df.time, unit="s") instead of ps.to_timedelta.