How to show missing dates in case of application of rolling function

Question

Asked by AnilGoyal on 27 September 2020 at 14:14

Suppose I have a data frame df of some insurance policies.

library(tidyverse)
library(lubridate)

#Example data
d <- as.Date("2020-01-01", format = "%Y-%m-%d")
set.seed(50)
df <- data.frame(id = 1:10, 
                 activation_dt = round(runif(10)*100,0) +d, 
                 expiry_dt = d+round(runif(10)*100,0)+c(rep(180,5), rep(240,5)))

> df
   id activation_dt  expiry_dt
1   1    2020-03-12 2020-08-07
2   2    2020-02-14 2020-07-26
3   3    2020-01-21 2020-09-01
4   4    2020-03-18 2020-07-07
5   5    2020-02-21 2020-07-27
6   6    2020-01-05 2020-11-04
7   7    2020-03-11 2020-11-20
8   8    2020-03-06 2020-10-03
9   9    2020-01-05 2020-09-04
10 10    2020-01-12 2020-09-14

I want to see how many policies were active during each month. I have done that with the following method.

# Getting required result

df %>% arrange(activation_dt) %>% 
  pivot_longer(cols = c(activation_dt, expiry_dt), 
               names_to = "event",
               values_to = "event_date") %>%
  mutate(dummy = ifelse(event == "activation_dt", 1, -1)) %>%
  mutate(dummy2 = floor_date(event_date, "month")) %>%
  arrange(dummy2) %>% group_by(dummy2) %>%
  summarise(dummy=sum(dummy)) %>%
  mutate(dummy = cumsum(dummy)) %>%
  select(dummy2, dummy)

# A tibble: 8 x 2
  dummy2     dummy
  <date>     <dbl>
1 2020-01-01     4
2 2020-02-01     6
3 2020-03-01    10
4 2020-07-01     7
5 2020-08-01     6
6 2020-09-01     3
7 2020-10-01     2
8 2020-11-01     0

Now I have a problem with the missing months, e.g. April 2020 to June 2020. They do not appear in the result because no policy was activated or expired in them. How can I show them as well?

There are 2 answers

Ben On 27 September 2020 at 15:46

Here is an alternative tidyverse/lubridate solution in case you are interested. The data.table version will be faster, but this should give you the correct results, including the months that were missing.

First use map2 to create a sequence of months between activation and expiration for each row of data. This lets you group by month/year and count the number of active policies in each month.

library(tidyverse)
library(lubridate)

df %>%
  mutate(month = map2(floor_date(activation_dt, "month"),
                      floor_date(expiry_dt, "month"), 
                      seq.Date, 
                      by = "month")) %>%
  unnest(month) %>%
  transmute(month_year = substr(month, 1, 7)) %>%
  group_by(month_year) %>%
  summarise(count = n())

Output

   month_year count
   <chr>      <int>
 1 2020-01        4
 2 2020-02        6
 3 2020-03       10
 4 2020-04       10
 5 2020-05       10
 6 2020-06       10
 7 2020-07       10
 8 2020-08        7
 9 2020-09        6
10 2020-10        3
11 2020-11        2
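To make the expansion step easier to check by hand, here is a minimal sketch on a two-policy toy data frame. The `toy` data and the name `active_by_month` are made up for illustration and are not part of the original answer. It also keeps `month` as a Date instead of converting it to a string, which is a variation on the code above, not a correction of it.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(lubridate)

# Hypothetical toy data: two policies with overlapping active periods
toy <- tibble(
  id            = 1:2,
  activation_dt = as.Date(c("2020-01-15", "2020-03-10")),
  expiry_dt     = as.Date(c("2020-04-20", "2020-05-05"))
)

active_by_month <- toy %>%
  # one list element per policy: every month start from activation to expiry
  mutate(month = map2(floor_date(activation_dt, "month"),
                      floor_date(expiry_dt, "month"),
                      ~ seq(.x, .y, by = "month"))) %>%
  # one row per (policy, month) pair
  unnest(month) %>%
  # number of policies active in each month
  count(month, name = "count")

active_by_month
```

Policy 1 covers January to April and policy 2 covers March to May, so the counts should be 1, 1, 2, 2, 1 for January through May. Keeping the column as a Date means it still sorts and plots as a date.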

**Waldi** · Accepted Answer · 2020-09-27T15:17:33+00:00

A data.table solution:

Generate the sequence of months, then use a non-equi join to find the policies active in each month and count them.

library(lubridate)
library(data.table)

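setDT(df)

# Sequence of month starts from the earliest activation to the latest expiry
months <- seq(lubridate::floor_date(min(df$activation_dt), 'month'),
              lubridate::floor_date(max(df$expiry_dt), 'month'),
              by = 'month')
months <- data.table(months)

# Truncate both dates to the first day of their month
df[, c("activation_dt_month", "expiry_dt_month") := .(lubridate::floor_date(activation_dt, 'month'),
                                                       lubridate::floor_date(expiry_dt, 'month'))]

# Non-equi join: keep each (policy, month) pair where the month falls in the active range, then count per month
df[months, .(months), on = .(activation_dt_month <= months, expiry_dt_month >= months)][, .(nb = .N), by = months]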

       months nb
 1: 2020-01-01  4
 2: 2020-02-01  6
 3: 2020-03-01 10
 4: 2020-04-01 10
 5: 2020-05-01 10
 6: 2020-06-01 10
 7: 2020-07-01 10
 8: 2020-08-01  7
 9: 2020-09-01  6
10: 2020-10-01  3
11: 2020-11-01  2
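Both answers rebuild the counts from the activation and expiry ranges. If the cumulative-sum pipeline from the question is to be kept instead, one possible way to show the missing months is tidyr's `complete()` followed by `fill()`. This is a sketch of a different technique, not something either answer proposes. The `monthly` table below is typed in from the output printed in the question, and the names `monthly` and `filled` are assumptions made for the example.

```r
library(dplyr)
library(tidyr)

# Monthly running totals as printed in the question (April to June are absent)
monthly <- tibble(
  dummy2 = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01", "2020-07-01",
                     "2020-08-01", "2020-09-01", "2020-10-01", "2020-11-01")),
  dummy  = c(4, 6, 10, 7, 6, 3, 2, 0)
)

filled <- monthly %>%
  # add a row for every calendar month between the first and last month
  complete(dummy2 = seq(min(dummy2), max(dummy2), by = "month")) %>%
  # no events happened in the added months, so carry the running total forward
  fill(dummy, .direction = "down")

filled
```

The added rows for April, May and June carry the March total of 10 forward. Note that the question's pipeline subtracts a policy in the month it expires, so from July onward its numbers differ from the two answers, which still count a policy as active during its expiry month (7 versus 10 for July).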
