Efficient way to get evenly spaced data / pandas DataFrame.reindex


To compare different data sets, I need to put them on a common time basis. What is the most efficient way to achieve this?

I've tried a few approaches, and as far as I understand, the easiest should be pandas DataFrame.reindex:

I have an unevenly spaced time array with associated values for the new status (on/off) which persists after the entry. As such I want to use the previous value of the status column until a new value at a new time for the status is set.

A typical array looks like this (df is a one-column DataFrame with time as the index and status as the column):

In [58]: df
Out[58]: 
           status
time             
1632160022      0
1632986376   <NA>
1632986496      0
1633448715      1
1633452437      0
1633454358      1
1633461201      0
1633534763      1
1633551686      0 
...

From the pandas DataFrame.reindex docs, I understood that re-indexing with the fill method pad / ffill should yield the previous value:

# create an evenly spaced time base covering the observation duration
tmin = df.index.min()
tmax = df.index.max()
tspacing = 120
tbase = list(range(tmin, tmax, tspacing))

# create the evenly spaced DataFrame
ndf = df.reindex(index=tbase, method='pad', tolerance=120)
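A small toy example helps show what the tolerance argument does in this call. The series s and the target labels below are hypothetical values chosen only for illustration. According to the reindex docs, method="pad" takes the last original label at or before each new label, and tolerance additionally requires abs(original label - new label) <= tolerance, otherwise the result is NaN:

```python
import pandas as pd

# Hypothetical toy data: observations at t=0 and t=100 only.
s = pd.Series([0, 1], index=[0, 100])
target = [0, 60, 120, 240, 360]

# Pad only: each target label takes the value of the last original label <= it.
padded = s.reindex(target, method="pad")

# Pad with tolerance: the matched original label must also lie within 120 of
# the target label. 240 and 360 are more than 120 past the last observation
# (100), so they become NaN even though an earlier value exists.
bounded = s.reindex(target, method="pad", tolerance=120)

print(padded.tolist())
print(bounded.tolist())
```

With a tolerance equal to the grid spacing, a status is therefore only carried forward for new labels close to its original timestamp.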

However, the result differs from what I expect: all subsequent status entries are assigned NaN instead of the forward-filled value:

In[62]: ndf
Out[62]: 
           status
time             
1632160022      0
1632160142      0
1632160262    NaN
1632160382    NaN
1632160502    NaN
          ...

Any idea what I'm missing or doing wrong? And if this method does not do the trick, is there another ready-made method available?

Answer by Corralien

Quoting the question: "As such I want to use the previous value of the status column until a new value at a new time for the status is set."

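If I understand correctly (IIUC), just drop the tolerance argument. With tolerance=120, a value is only used for new labels at most 120 away from its original timestamp, so every later label becomes NaN. Without it, each new label takes the last row at or before it: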

ndf = df.reindex(tbase, method='ffill')
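A self-contained sketch of this answer applied to the sample rows from the question. It assumes only the rows shown and a nullable "Int64" dtype for status (inferred from the <NA> entry). It also shows a caveat mentioned in the reindex docs: filling during reindexing compares index labels only, not values, so a missing value already stored in the data (the <NA> at 1632986376) is carried forward as-is until the next original label; chaining .ffill() replaces it with the last valid status. pd.merge_asof is included as an alternative ready-made method that does the same backward "as-of" lookup as a join.

```python
import pandas as pd

# Sample rows from the question; the "Int64" dtype is an assumption.
df = pd.DataFrame(
    {"status": pd.array([0, pd.NA, 0, 1, 0, 1, 0, 1, 0], dtype="Int64")},
    index=pd.Index(
        [1632160022, 1632986376, 1632986496, 1633448715, 1633452437,
         1633454358, 1633461201, 1633534763, 1633551686],
        name="time",
    ),
)

tspacing = 120
tbase = list(range(df.index.min(), df.index.max(), tspacing))

# No tolerance: every new label takes the row with the last original label
# at or before it, however far back that row is.
ndf = df.reindex(tbase, method="ffill")

# reindex() matches labels only, so the stored <NA> is propagated as well;
# .ffill() afterwards replaces it with the last valid status.
ndf_filled = ndf.ffill()

# Alternative: merge_asof picks, for each "time" on the left, the last right
# row whose "time" is <= it (direction="backward"). Both sides must be sorted.
asof = pd.merge_asof(
    pd.DataFrame({"time": tbase}),
    df.reset_index(),
    on="time",
    direction="backward",
).set_index("time")

print(ndf_filled.head())
```

Whether the <NA> should be kept or overwritten depends on what a missing status means in the data; the sketch keeps both variants.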