Calculate duration after excluding non-working hours/weekends in R

130 views Asked by At

To assess how much time is needed to answer a customer's query, I have the following data frame (df1) with columns ID, Task, Date_time, where ID = customer number, Task = Task done by the employee related to the query, and Date_time = POSIXct column to specify when the task was conducted.

For each customer, I want to find the duration in minutes between Task == "New" and Task == "Closed".

To calculate duration in minutes, I have to consider that:

  1. There could be multiple times that Task == "Closed" appears, therefore only the last Task == "Closed" is to be considered in the calculation.

  2. The working hours is from 8 am to 5 pm CET (8:00 to 17:00) Monday to Friday. The duration has to exclude non-working hours (5 pm to 8 am) and weekends (Saturday and Sunday).

Can someone please suggest how to calculate the duration, considering the above points? Thanks!

The data frame looks like this:

           ID             Task           Date_time
1   customer1              New 2022-11-09 15:33:32
2   customer1             Edit 2022-11-09 15:38:40
4   customer1         Answered 2022-11-09 15:44:44
5   customer1 FeedbackRequired 2022-11-11 08:02:51
6   customer1           Closed 2022-11-17 15:04:23
8   customer2              New 2022-04-11 13:55:22
9   customer2             Edit 2022-04-11 13:59:53
11  customer2         Answered 2022-05-11 11:17:15
12  customer2 FeedbackRequired 2022-05-11 11:17:41
13  customer2           Closed 2022-08-17 13:23:29
15  customer2           Closed 2022-08-17 13:24:24
17  customer2           Closed 2022-08-17 13:32:41

Here is an example data frame:

df1 <- structure(list(ID = c("customer1", "customer1", "customer1", 
"customer1", "customer1", "customer2", "customer2", "customer2", 
"customer2", "customer2", "customer2", "customer2", "customer5", 
"customer5", "customer5", "customer5", "customer5", "customer3", 
"customer3", "customer3", "customer3", "customer3", "customer3", 
"customer3", "customer3", "customer3", "customer3", "customer4", 
"customer4", "customer4", "customer4", "customer4"), Task = c("New", 
"Edit", "Answered", "FeedbackRequired", "Closed", "New", "Edit", 
"Answered", "FeedbackRequired", "Closed", "Closed", "Closed", 
"New", "Edit", "Answered", "FeedbackRequired", "Closed", "New", 
"Edit", "HubAdded", "Answered", "FeedbackRequired", "Closed", 
"Closed", "Closed", "Closed", "Closed", "New", "Edit", "Answered", 
"FeedbackRequired", "Closed"), Date_time = structure(c(1668008012.93733, 
1668008320.29733, 1668008684.57472, 1668153771.45687, 1668697463.01071, 
1649685322.67473, 1649685593.46752, 1652267835.13924, 1652267861.07935, 
1660742609.41271, 1660742664.11297, 1660743161.80927, 1678295469.58648, 
1678295749.33997, 1678359922.0184, 1678787443.43049, 1680703787.10976, 
1661514257.02831, 1661514383.23061, 1661526698.41032, 1661527095.83771, 
1661527117.512, 1662457363.51916, 1662457378.0676, 1662457519.11092, 
1663232439.58358, 1663246649.3237, 1680252406.63738, 1680253548.17636, 
1680254179.34628, 1680254196.74463, 1680257109.1508), class = c("POSIXct", 
"POSIXt"), tzone = "UTC")), row.names = c(1L, 2L, 4L, 5L, 6L, 
8L, 9L, 11L, 12L, 13L, 15L, 17L, 18L, 19L, 21L, 22L, 23L, 65L, 
66L, 68L, 69L, 70L, 71L, 73L, 75L, 77L, 79L, 994L, 995L, 997L, 
998L, 999L), class = "data.frame")
2

There are 2 answers

1
Friede On BEST ANSWER

If Task == "New" appears for each ID only once, e.g.

> length(unique(df1[df1$Task == "New", "ID"])) == length(unique(df1$ID))
[1] TRUE

the following approach might work. businessDuration() from {BusinessDuration} comes in handy to calculate the difftime w.r.t. business hours. It seems like businessDuration() is not vectorized, we therefore use Vectorize():

xyzzy = 
  merge(x = df1[df1$Task == "New", c("ID", "Date_time")], 
        y = with(df1[df1$Task == "Closed", ], 
                 aggregate(list(Date_time = Date_time), list(ID = ID), 
                           max, drop = FALSE)), 
        by = "ID") |>
  `colnames<-`(c("ID", "New", "Closed")) # cosmetics 

library(BusinessDuration)
f = Vectorize(\(x, y) 
              businessDuration(startdate = x, enddate = y, 
                               starttime = "08:00:00", endtime = "17:00:00", 
                               unit = "min"))
within(xyzzy, {wmin = f(x = New, y = Closed)})

gives

         ID                 New              Closed        wmin
1 customer1 2022-11-09 15:33:32 2022-11-17 15:04:23  3210.85000
2 customer2 2022-04-11 13:55:22 2022-08-17 13:32:41 49657.31667
3 customer3 2022-08-26 11:44:17 2022-09-15 12:57:29  7633.20000
4 customer4 2023-03-31 08:46:46 2023-03-31 10:05:09    78.38333
5 customer5 2023-03-08 17:11:09 2023-04-05 14:09:47 10749.78333

Have a look at the help file (?businessDuration) for further details, e.g.

weekendlist
Custom weekend list. Default is "Saturday" & "Sunday"

holidaylist
Custom holiday list. Default is NULL

2
xx0020 On

Here is one approach. We could make it more compact, but at this moment I thought it might be useful to make it more explicit in order to show individual steps in a way that is easier to understand. Note that I became aware that you want to allow only hours between 8 am and 5 pm, but in the data you provided "New" can start outside this range (customer 5).

library(tidyverse)

## data below

dat %>% 
  group_by(customer) %>% 
  filter(task == "New" | task == "Closed" & dt == max(dt)) %>% 
  ungroup() %>% 
  pivot_wider(names_from = task, values_from = dt) %>% 
  mutate(
    n_nonweekend_days = map2_int(
      .x = New, 
      .y = Closed, 
      .f = \(x, y) sum(!wday(
        seq(floor_date(x, unit = "day"), floor_date(y, unit = "day"), "days"),
        week_start = 1) %in% c(6,7))
      ),
    stop_daynew = ymd_hms(paste(floor_date(New, unit = "day"), "17:00:00")),
    start_dayclosed = ymd_hms(paste(floor_date(Closed, unit = "day"), "08:00:00")),
  ) %>% 
  mutate(
    hours_required = case_when(
      floor_date(New, unit = "day") != floor_date(Closed, unit = "day") ~ 
        difftime(stop_daynew, New, units = "hour") + 
        ## subtract 2 to not include start and end day, multiply by the 9
        ## permitted hours per day
        (n_nonweekend_days - 2) * 9 + 
        difftime(Closed, start_dayclosed, units = "hour"),
      .default = Closed - New
    )
  )

data (same values as you provided)

dat <- tibble(
  customer = c(rep("customer1", 5), rep("customer2", 7), rep("customer5", 5),
               rep("customer3", 10), rep("customer4", 5)),
  task = c("New", "Edit", "Answered", "FeedbackRequired", "Closed", "New", "Edit", 
           "Answered", "FeedbackRequired", "Closed", "Closed", "Closed", "New",
           "Edit", "Answered", "FeedbackRequired", "Closed", "New", "Edit",
           "HubAdded", "Answered", "FeedbackRequired", "Closed",  "Closed",
           "Closed", "Closed", "Closed", "New", "Edit", "Answered", 
           "FeedbackRequired", "Closed"),
  dt = ymd_hms(
    c("2022-11-09 15:33:32", "2022-11-09 15:38:40", "2022-11-09 15:44:44",
      "2022-11-11 08:02:51", "2022-11-17 15:04:23", "2022-04-11 13:55:22",
      "2022-04-11 13:59:53", "2022-05-11 11:17:15", "2022-05-11 11:17:41",
      "2022-08-17 13:23:29", "2022-08-17 13:24:24", "2022-08-17 13:32:41",
      "2023-03-08 17:11:09", "2023-03-08 17:15:49", "2023-03-09 11:05:22",
      "2023-03-14 09:50:43", "2023-04-05 14:09:47", "2022-08-26 11:44:17",
      "2022-08-26 11:46:23", "2022-08-26 15:11:38 ", "2022-08-26 15:18:15",
      "2022-08-26 15:18:37", "2022-09-06 09:42:43", "2022-09-06 09:42:58",
      "2022-09-06 09:45:19", "2022-09-15 09:00:39", "2022-09-15 12:57:29",
      "2023-03-31 08:46:46", "2023-03-31 09:05:48", "2023-03-31 09:16:19",
      "2023-03-31 09:16:36", "2023-03-31 10:05:09")
    )
)