Subset data in R on the condition of decreasing values


I couldn't find a similar question, so here goes: I have a large dataset in R that I want to prepare for hazard analysis, so I need to create a dichotomous survival variable. However, my hazard event is defined relatively; it is not simply a certain value being 0. The dataset has the form:

ID    y
1     0
1     15
1     30
1     29
1     10
2     11
2     64
2     86
2     79
2     75

plus a number of independent and control variables. IDs enter the subset fit for survival analysis when y > 0. Back to the hazard variable: I want it to take the value "1" when decreasing values of y fall below a threshold, which is 75% of the highest value y reaches within each ID group. Thus two conditions have to be fulfilled for the hazard:

  1. the values fall below the threshold (within their ID group)
  2. the values are decreasing. For example, in the y column above, the first two values of ID 2 should not be coded "1": they fall below the threshold, but they are rising rather than decreasing.

Does anyone have a solution for this? Thanks in advance!


There is 1 answer

Allan Cameron On

If I understand you correctly, only the last value in group 1 should meet the conditions, since it is decreasing and less than 75% of the group's maximum. In group 2, there are no values that meet these criteria.
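To see why, it may help to compute the two conditions separately. The following is a minimal base R sketch using the example data from the question; the column names `threshold`, `below`, and `decreasing` are hypothetical and introduced only for illustration.

```r
# Example data from the question
df <- data.frame(
  ID = rep(1:2, each = 5),
  y  = c(0, 15, 30, 29, 10, 11, 64, 86, 79, 75)
)

# 75% of each ID's maximum y (illustrative column name)
df$threshold <- 0.75 * ave(df$y, df$ID, FUN = max)

# Condition 1: y lies below its group's threshold
df$below <- df$y < df$threshold

# Condition 2: y is lower than the previous value within the same ID;
# the first row of each ID has no predecessor, so it counts as not decreasing
df$decreasing <- ave(df$y, df$ID, FUN = function(v) c(0, diff(v))) < 0

# Rows where both conditions hold
df[df$below & df$decreasing, c("ID", "y", "threshold")]
```

With this data the thresholds are 22.5 for ID 1 and 64.5 for ID 2, and only the row with y = 10 passes both checks.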

The tidyverse solution would look like this:

library(dplyr)

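# Within each ID, flag rows where y is below 75% of that ID's maximum
# and lower than the previous value; the unary + turns TRUE/FALSE into 1/0
df %>% 
  group_by(ID) %>%
  mutate(hazard = +(y < 0.75 * max(y) & c(0, diff(y)) < 0))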
#> # A tibble: 10 x 3
#> # Groups:   ID [2]
#>       ID     y hazard
#>    <int> <int>  <int>
#>  1     1     0      0
#>  2     1    15      0
#>  3     1    30      0
#>  4     1    29      0
#>  5     1    10      1
#>  6     2    11      0
#>  7     2    64      0
#>  8     2    86      0
#>  9     2    79      0
#> 10     2    75      0
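If you would rather not load dplyr, the same logic can probably be expressed in base R with `ave()`, which applies a function within each ID. This is only a minimal sketch, and it assumes the rows are already in time order within each ID.

```r
# Same example data as in the Data section
df <- data.frame(
  ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
  y  = c(0L, 15L, 30L, 29L, 10L, 11L, 64L, 86L, 79L, 75L)
)

# ave() splits y by ID, applies the function to each piece,
# and returns the results in the original row order
df$hazard <- ave(df$y, df$ID, FUN = function(v) {
  +(v < 0.75 * max(v) & c(0, diff(v)) < 0)
})

df
```

As in the dplyr version, `c(0, diff(v))` pads the first row of each ID with 0, so the first observation of an ID is never flagged.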

Data

df <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), 
                     y  = c(0L, 15L, 30L, 29L, 10L, 11L, 64L, 86L, 79L, 75L)), 
                class = "data.frame", row.names = c(NA,  -10L))

Created on 2020-07-25 by the reprex package (v0.3.0)