Plot overlaps of time intervals


I have the following df

Id   a_min_date   a_max_date   b_min_date   b_max_date   c_min_date   c_max_date   d_min_date   d_max_date
1    2014-01-01   2014-01-10   2014-01-05   2014-01-15   NA           NA           2014-02-20   2014-05-01
2    2014-02-01   2014-02-10   NA           NA           2015-02-20   2015-03-01   NA           NA

I have added the intervals of each group (a, b, c, d) by Id. First, I converted the start and end dates to lubridate intervals. I want to plot the intervals and, where two consecutive groups do not overlap, calculate the time difference in days between the end of one group and the start of the next. I tried the IRanges package and converted the dates into integers (as used here (link)), but it does not work for me.

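library(IRanges)
library(ggplot2)

# Convert the dates to integer day counts and build the ranges for group a
ir <- IRanges(start = as.integer(as.Date(df$a_min_date)),
              end   = as.integer(as.Date(df$a_max_date)))
# Put overlapping ranges into different bins so they are drawn on separate rows
bins <- disjointBins(IRanges(start(ir), end(ir) + 1))
dat <- cbind(as.data.frame(ir), bin = bins)

ggplot(dat) +
  geom_rect(aes(xmin = start, xmax = end,
                ymin = bin, ymax = bin + 0.9)) +
  theme_bw()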

I got this error for my original df:

Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges") : 
  solving row 1: range cannot be determined from the supplied arguments (too many NAs)

Does someone have another solution using other packages?


1 answer

Answered by scs

To my knowledge, IRanges is the best package out there to solve this problem. IRanges needs defined range values (in this case dates) to compare and cannot handle undefined values (NAs), which is what the error message reports.

To solve this problem, I would remove all rows with NAs in the relevant date columns of df before doing the analysis.

df <- df[complete.cases(df[, c("a_min_date", "a_max_date")]), ]

For an explanation and other ways to remove NAs, see "Remove rows with all or some NAs (missing values) in data.frame".
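Since each group has its own pair of date columns, dropping whole rows of df can also discard intervals that are perfectly valid. A minimal sketch of an alternative, in base R only, is to stack the groups into one row per interval, drop just the missing intervals, and then compute the gap in days to the next interval of the same Id. The toy data, the column name `d_max_date`, and the names `long` and `gap_days` are assumptions for illustration. "Next" is taken here to mean next by start date, and only directly consecutive intervals are compared.

```r
# Toy data mirroring the question's layout (d_max_date is an assumed name)
df <- data.frame(
  Id = c(1, 2),
  a_min_date = c("2014-01-01", "2014-02-01"),
  a_max_date = c("2014-01-10", "2014-02-10"),
  b_min_date = c("2014-01-05", NA),
  b_max_date = c("2014-01-15", NA),
  c_min_date = c(NA, "2015-02-20"),
  c_max_date = c(NA, "2015-03-01"),
  d_min_date = c("2014-02-20", NA),
  d_max_date = c("2014-05-01", NA),
  stringsAsFactors = FALSE
)

groups <- c("a", "b", "c", "d")

# Stack the groups into one row per (Id, group) interval
long <- do.call(rbind, lapply(groups, function(g) {
  data.frame(
    Id    = df$Id,
    group = g,
    start = as.Date(df[[paste0(g, "_min_date")]]),
    end   = as.Date(df[[paste0(g, "_max_date")]]),
    stringsAsFactors = FALSE
  )
}))

# Drop only the intervals that are missing, not whole rows of df
long <- long[complete.cases(long[, c("start", "end")]), ]
long <- long[order(long$Id, long$start), ]

# Gap in days to the next interval of the same Id;
# NA when the two overlap or when there is no next interval
long$gap_days <- NA_real_
for (id in unique(long$Id)) {
  idx <- which(long$Id == id)
  if (length(idx) > 1) {
    cur <- idx[-length(idx)]
    nxt <- idx[-1]
    gap <- as.numeric(long$start[nxt] - long$end[cur])
    long$gap_days[cur] <- ifelse(gap > 0, gap, NA_real_)
  }
}

long
```

With this toy data, Id 1 gets a gap of 36 days between b and d (a and b overlap, so no gap is reported), and Id 2 gets 375 days between a and c. The `start` and `end` columns of `long` could then be converted with `as.integer()` for IRanges, or plotted directly, for example with ggplot2's `geom_segment()`.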

If this does not fix the problem, you could convert the dates into integers. For this to work, the dates must be in year-month-day format so that the resulting integers sort in chronological order.

Example:

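str <- "2006-06-26"

# Split the date into year, month, and day
splitted <- unlist(strsplit(str, "-"))
splitted
# [1] "2006" "06"   "26"

# Paste the parts back together without separators
result <- paste(splitted, collapse = "")
result
# [1] "20060626"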
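One caveat, sketched below under the assumption that differences in days are needed later: pasted year-month-day numbers keep the dates in the right order, but the distance between two such numbers is not a count of days. Day counts obtained with `as.integer(as.Date(...))`, as in the question, keep both the order and the spacing. The variable names are illustrative only.

```r
# Two dates one day apart, straddling a month boundary
d1 <- "2014-01-31"
d2 <- "2014-02-01"

# Pasted year-month-day integers: ordering is kept, spacing is not
p1 <- as.integer(gsub("-", "", d1))
p2 <- as.integer(gsub("-", "", d2))
p2 - p1
# [1] 70

# Day counts since 1970-01-01: ordering and spacing are both kept
n1 <- as.integer(as.Date(d1))
n2 <- as.integer(as.Date(d2))
n2 - n1
# [1] 1
```

For detecting overlaps either encoding gives the same ordering, but for gaps in days the `as.Date` day counts are probably the safer choice.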