To compare different data sets, I need to put them on a common time base. What is the most efficient way to achieve this?
I've tried a few approaches, and to my understanding the easiest should be pandas DataFrame.reindex.
I have an unevenly spaced time array with associated status values (on/off). Each status persists from its entry until the next one, so I want to carry the previous status forward until a new value is set at a later time.
A typical array looks like this, where df is a one-column DataFrame with time as the index and status as the column:
In [58]: df
Out[58]:
status
time
1632160022 0
1632986376 <NA>
1632986496 0
1633448715 1
1633452437 0
1633454358 1
1633461201 0
1633534763 1
1633551686 0
...
From the pandas DataFrame.reindex docs, I understand that re-indexing with the fill method pad / ffill should yield the previous value:
# creating evenly-spaced time base for observation duration
tmin = min(df.index)
tmax = max(df.index)
tspacing = 120
tbase = [t for t in range(tmin,tmax,tspacing)]
# create the temporally evenly-spaced DataFrame
ndf = df.reindex(index=tbase, method='pad', tolerance=120)
However, the result differs from what I expect: after the first two rows, all subsequent status entries are assigned NaN instead of the forward-filled value:
In[62]: ndf
Out[62]:
status
time
1632160022 0
1632160142 0
1632160262 NaN
1632160382 NaN
1632160502 NaN
...
Any idea what I'm missing or doing wrong? If this method cannot do the trick, is there another ready-made method available?
IIUC:
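One possible reading, sketched under the assumption that the NaN values come from tolerance=120: with method='pad', tolerance caps the distance between a new index label and the original label it is filled from, so any grid point more than 120 s after the last original entry stays empty. Leaving tolerance out lets the last status persist until the next entry. The sample frame below is rebuilt from the first rows shown in the question; the dropna() call is an added assumption so that the <NA> row is not carried forward.

```python
import pandas as pd

# Sample data rebuilt from the first rows shown in the question.
df = pd.DataFrame(
    {"status": pd.array([0, pd.NA, 0, 1, 0], dtype="Int64")},
    index=pd.Index(
        [1632160022, 1632986376, 1632986496, 1633448715, 1633452437],
        name="time",
    ),
)

tspacing = 120
# Evenly spaced time base; the +1 keeps tmax itself eligible as a grid point.
tbase = pd.Index(
    range(int(df.index.min()), int(df.index.max()) + 1, tspacing), name="time"
)

# With tolerance, a grid point is filled only if the previous original
# entry lies at most 120 s before it, so long gaps become missing values.
limited = df.reindex(tbase, method="pad", tolerance=tspacing)

# Without tolerance, every grid point takes the most recent original entry.
# Assumption: rows with a missing status are dropped first, so that the
# last known status is what persists.
ndf = df.dropna().reindex(tbase, method="pad")

print(limited.head())
print(ndf.head())
```

If the reindexed frame should keep gaps where the status is genuinely unknown, the dropna() step can be left out; the rest of the sketch stays the same.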