I have the following data frame (df):
Id a_min_date a_max_date b_min_date b_max_date c_min_date c_max_date d_min_date d_max_date
1 2014-01-01 2014-01-10 2014-01-05 2014-01-15 NA NA 2014-02-20 2014-05-01
2 2014-02-01 2014-02-10 NA NA 2015-02-20 2015-03-01 NA NA
I have added the intervals of each group (a, b, c, d) by Id. First, I converted the start and end dates to lubridate intervals. I want to plot the intervals and calculate the time difference in days between the end of each group and the start of the next group when they do not overlap. I tried the IRanges package and converted the dates into integers (as used here (link)), but it does not work for me.
library(IRanges)
library(ggplot2)

# Dates are converted to integers (days since 1970-01-01) to build the ranges
ir <- IRanges(start = as.integer(as.Date(df$a_min_date)), end = as.integer(as.Date(df$a_max_date)))
bins <- disjointBins(IRanges(start(ir), end(ir) + 1))
dat <- cbind(as.data.frame(ir), bin = bins)
ggplot(dat) +
  geom_rect(aes(xmin = start, xmax = end,
                ymin = bin, ymax = bin + 0.9)) +
  theme_bw()
I got this error for my original df:
Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges") :
solving row 1: range cannot be determined from the supplied arguments (too many NAs)
Does someone have another solution using other packages?
To my knowledge, IRanges is the best package for this problem. IRanges needs defined start and end values (here, dates converted to integers) to compare ranges, and it does not handle missing values (NAs), which is what the error message reports.
To solve this problem, I would remove the rows with NAs in the date columns you pass to IRanges before doing the analysis.
For an explanation and other ways to remove NAs, see Remove rows with all or some NAs (missing values) in data.frame.
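As a minimal sketch of the NA removal, assuming the column names from the question: the data frame below is a hand-typed reconstruction of part of the example data, and df_b and b_cols are hypothetical names used only for illustration. It keeps the rows where both ends of the "b" interval are present, using base R's complete.cases.

```r
# Hypothetical reconstruction of part of the example data from the question
df <- data.frame(
  Id = c(1, 2),
  a_min_date = c("2014-01-01", "2014-02-01"),
  a_max_date = c("2014-01-10", "2014-02-10"),
  b_min_date = c("2014-01-05", NA),
  b_max_date = c("2014-01-15", NA),
  stringsAsFactors = FALSE
)

# Keep only rows where both the start and the end of the "b" interval exist
b_cols <- c("b_min_date", "b_max_date")
df_b <- df[complete.cases(df[, b_cols]), ]

# Only Id 1 has a complete "b" interval
df_b$Id
```

Filtering per pair of columns, rather than dropping every row that has an NA anywhere, matters here because each Id in the example is missing at least one group.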
If this does not fix the problem, you could convert the dates into integers. It is important that the dates are in year-month-day format so that the resulting intervals are correct.
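A small sketch of that conversion in base R, using two dates taken from row 1 of the question (the variable names are hypothetical). It also shows one possible way to get the gap in days between the end of one group and the start of the next, returning NA when the two intervals overlap.

```r
# Parse the dates explicitly as year-month-day
b_max <- as.Date("2014-01-15", format = "%Y-%m-%d")
d_min <- as.Date("2014-02-20", format = "%Y-%m-%d")

# Coercing a Date to integer gives the number of days since 1970-01-01
b_end   <- as.integer(b_max)
d_start <- as.integer(d_min)

# Gap in days if the next interval starts after this one ends, NA on overlap
gap_days <- if (d_start > b_end) d_start - b_end else NA_integer_
gap_days  # 36
```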
Example: