Fill out missing time series intervals


I have several time intervals recorded as follows:

  In         Out          tag      
  2008-12-18 2008-12-19   1
  2008-12-22 2008-12-23   1
  2008-12-29 2009-01-02   1
  2009-01-05 2009-01-05   1
  2009-01-13 2009-01-13   1
  2009-01-14 2009-01-14   1
  2009-01-19 2009-01-19   1

I would like to fill in the missing intervals (with tag 0) so it looks like this:

  In         Out          tag      
  2008-12-18 2008-12-19   1
  2008-12-20 2008-12-21   0
  2008-12-22 2008-12-23   1
  2008-12-24 2008-12-28   0
  2008-12-29 2009-01-02   1
  2009-01-03 2009-01-04   0
  2009-01-05 2009-01-05   1
  ...

I know that I can use zoo to fill in missing dates of a time series, and that I can create intervals with interval(start, end) from the lubridate package. My initial thought was to combine these somehow to fill in the missing intervals.
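The "fill in missing dates, then rebuild intervals" idea can be sketched without zoo. Below is a minimal base R version that assumes seq() and rle() are acceptable stand-ins for zoo. It expands each interval into its days, marks which days in the full range are covered, and turns every run of uncovered days into a tag = 0 row. The names iv, all_days and gaps are illustrative only.

```r
# Question's tagged intervals, typed in directly
iv <- data.frame(
  In  = as.Date(c("2008-12-18", "2008-12-22", "2008-12-29", "2009-01-05",
                  "2009-01-13", "2009-01-14", "2009-01-19")),
  Out = as.Date(c("2008-12-19", "2008-12-23", "2009-01-02", "2009-01-05",
                  "2009-01-13", "2009-01-14", "2009-01-19")),
  tag = 1
)

# Every day from the first In to the last Out (the "filled" daily series)
all_days <- seq(min(iv$In), max(iv$Out), by = "day")

# Days covered by at least one interval
covered <- do.call(c, Map(seq, iv$In, iv$Out, by = "day"))

# 1 = covered day, 0 = gap day
day_tag <- as.integer(all_days %in% covered)

# Collapse consecutive days with the same value into runs
r <- rle(day_tag)
ends <- cumsum(r$lengths)
starts <- ends - r$lengths + 1

# Only runs of uncovered days become new rows; the original tagged rows
# are kept as-is, so back-to-back intervals (2009-01-13, 2009-01-14)
# are not merged into one
is_gap <- r$values == 0
gaps <- data.frame(In  = all_days[starts[is_gap]],
                   Out = all_days[ends[is_gap]],
                   tag = rep(0, sum(is_gap)))

filled <- rbind(iv, gaps)
filled <- filled[order(filled$In), ]
rownames(filled) <- NULL
print(filled)
```

Because it works day by day, this version does not need the input rows to be sorted. The trade-off is that it materializes every day in the range.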

I've also considered "brute-force" methods, e.g., a function that takes Out from the previous row and In from the next row to build each missing interval, but I have not managed to find a solution.
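That brute-force idea works directly on the rows. Each gap runs from one row's Out + 1 to the next row's In - 1. Below is a minimal base R sketch of it. fill_gaps is a hypothetical helper name, not a package function, and the sketch assumes the rows are sorted by In and do not overlap.

```r
# Question's tagged intervals, sorted by In
iv <- data.frame(
  In  = as.Date(c("2008-12-18", "2008-12-22", "2008-12-29", "2009-01-05",
                  "2009-01-13", "2009-01-14", "2009-01-19")),
  Out = as.Date(c("2008-12-19", "2008-12-23", "2009-01-02", "2009-01-05",
                  "2009-01-13", "2009-01-14", "2009-01-19")),
  tag = 1
)

# Hypothetical helper: adds tag = 0 rows for the gaps between consecutive
# intervals. Assumes iv is sorted by In and intervals do not overlap.
fill_gaps <- function(iv) {
  n <- nrow(iv)
  gap_in  <- iv$Out[-n] + 1   # day after each interval ends
  gap_out <- iv$In[-1] - 1    # day before the next interval starts
  keep <- gap_in <= gap_out   # FALSE for back-to-back intervals (no gap)
  gaps <- data.frame(In = gap_in[keep], Out = gap_out[keep],
                     tag = rep(0, sum(keep)))
  out <- rbind(iv, gaps)
  out <- out[order(out$In), ]
  rownames(out) <- NULL
  out
}

filled <- fill_gaps(iv)
print(filled)
```

This version does one step per row rather than per day, so long date ranges cost nothing extra.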

Ideally, I would like to know whether there are any clever ways to do this using zoo, lubridate, xts, or other tools in R.

Accepted answer by sirallen:

Try this:

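library(data.table)

# Recreate the question's intervals (all tagged 1); rows must be sorted by In
df = data.table(
  In=as.Date('2008-12-18') + cumsum(c(0,4,7,7,8,1,5)),
  Out=as.Date('2008-12-19') + cumsum(c(0,4,10,3,8,1,5)),
  tag=1)

# Each gap starts the day after an interval ends (Out + 1) and ends the day
# before the next interval starts (next row's In - 1, via shift(type='lead')).
# [In <= Out] drops the empty "gaps" between back-to-back intervals and the
# last row, whose lead is NA (data.table treats NA in i as FALSE).
toMerge = df[, .(In=Out+1, Out=shift(In-1, type='lead'), tag=0)][In <= Out]

# Full outer join on the shared columns; the result comes back sorted
merge(df, toMerge, all=TRUE)
#            In        Out tag
# 1: 2008-12-18 2008-12-19   1
# 2: 2008-12-20 2008-12-21   0
# 3: 2008-12-22 2008-12-23   1
# 4: 2008-12-24 2008-12-28   0
# 5: 2008-12-29 2009-01-02   1
# 6: 2009-01-03 2009-01-04   0
# ...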