Count then calculate differences from first and previous rows after grouping, but ignoring text/character values in r

Question

Count then calculate differences from first and previous rows after grouping, but ignoring text/character values in r

74 views Asked by denis At 13 January 2024 at 14:51

From the following data grouped by id and visit, which include a mix of numeric and text/character values, how to create these 3 new columns:

count_wotxt: count by id and visit but without considering text/character values
diff_value_first: calculate the difference between each numeric value versus the first visit for each id, ignoring text/character values
diff_value_previous: calculate the difference between each numeric value versus the previous visit for each id, ignoring text/character values

Data:

dat <-  
structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor"), 
    visit = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 
    1L, 1L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor"), 
    value = c("5", "7", "10", "20", "15", "text0", "25", "text1", 
    "100", "text2", "text3", "120", "text4", "50", "45"), count = c(1L, 
    2L, 3L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 1L, 2L, 3L)), class = "data.frame", row.names = c(NA, 
-15L))

Desired output:

> dat2
   id visit value count count_wotxt diff_value_first diff_value_previous
1   1     1     5     1           1                0                   0
2   1     1     7     2           2                2                   2
3   1     1    10     3           3                5                   3
4   1     2    20     1           1                0                   0
5   1     2    15     2           2               -5                  -5
6   1     2 text0     3          NA               NA                  NA
7   1     2    25     4           3                5                  10
8   1     2 text1     5          NA               NA                  NA
9   2     1   100     1           1                0                   0
10  2     1 text2     2          NA               NA                  NA
11  2     1 text3     3          NA               NA                  NA
12  2     1   120     4           2               20                  20
13  2     2 text4     1          NA               NA                  NA
14  2     2    50     2           1               NA                   0
15  2     2    45     3           2               NA                  -5

Thanks for help

Original Q&A

There are 2 answers

denis On 13 January 2024 at 17:07

Below, the solution, mainly based on TarJae's proposal (thanks!), slightly modified at the end,

dat %>%
  group_by(id, visit) %>%
  mutate(
    count_wotxt = ifelse(grepl("^[0-9]+$", value), cumsum(grepl("^[0-9]+$", value)), NA),
         value_numeric = as.numeric(as.character(value)),
         diff_value_first = value_numeric - value_numeric[1]) %>%
  fill(value_numeric, .direction = "down") %>%
  mutate(diff_value_previous =
           case_when(is.na(count_wotxt) ~ NA,
                     count_wotxt == 1 ~ 0,
                     TRUE ~ (value_numeric - lag(value_numeric, default = first(value_numeric))))) %>%
  ungroup() %>%
  select(-value_numeric)

which gives

> dat
# A tibble: 15 x 7
   id    visit value count count_wotxt diff_value_first diff_value_previous
   <fct> <fct> <chr> <int>       <int>            <dbl>               <dbl>
 1 1     1     5         1           1                0                   0
 2 1     1     7         2           2                2                   2
 3 1     1     10        3           3                5                   3
 4 1     2     20        1           1                0                   0
 5 1     2     15        2           2               -5                  -5
 6 1     2     text0     3          NA               NA                  NA
 7 1     2     25        4           3                5                  10
 8 1     2     text1     5          NA               NA                  NA
 9 2     1     100       1           1                0                   0
10 2     1     text2     2          NA               NA                  NA
11 2     1     text3     3          NA               NA                  NA
12 2     1     120       4           2               20                  20
13 2     2     text4     1          NA               NA                  NA
14 2     2     50        2           1               NA                   0
15 2     2     45        3           2               NA                  -5

**TarJae** · Accepted Answer · 2024-01-13T15:52:00+00:00

Here is one logic we could apply to get the desired output: I think the most challenging part is the difference of the previous row in the presence of NAs. I have solved this by using fill:

library(dplyr)
library(tidyr)

dat %>% 
  group_by(id, visit) %>% 
  mutate(count_wotxt = ifelse(grepl("^[0-9]+$", value), cumsum(grepl("^[0-9]+$", value)), NA),
         value_numeric = as.numeric(as.character(value)),
         diff_value_first = value_numeric - value_numeric[1]) %>% 
  fill(value_numeric, .direction = "down") %>% 
  mutate(diff_value_previous = value_numeric - lag(value_numeric, default = first(value_numeric)), .keep="unused") %>% 
  ungroup()

# A tibble: 15 × 7
   id    visit value count count_wotxt diff_value_first diff_value_previous
   <fct> <fct> <chr> <int>       <int>            <dbl>               <dbl>
 1 1     1     5         1           1                0                   0
 2 1     1     7         2           2                2                   2
 3 1     1     10        3           3                5                   3
 4 1     2     20        1           1                0                   0
 5 1     2     15        2           2               -5                  -5
 6 1     2     text0     3          NA               NA                   0
 7 1     2     25        4           3                5                  10
 8 1     2     text1     5          NA               NA                   0
 9 2     1     100       1           1                0                   0
10 2     1     text2     2          NA               NA                   0
11 2     1     text3     3          NA               NA                   0
12 2     1     120       4           2               20                  20
13 2     2     text4     1          NA               NA                  NA
14 2     2     50        2           1               NA                  NA
15 2     2     45        3           2               NA                  -5
Warning message:
There were 3 warnings in `mutate()`.
The first warning was:
ℹ In argument: `value_numeric = as.numeric(as.character(value))`.
ℹ In group 2: `id = 1`, `visit = 2`.
Caused by warning:
! NAs introduced by coercion
ℹ Run dplyr::last_dplyr_warnings() to see the 2 remaining warnings.

TechQA.

Count then calculate differences from first and previous rows after grouping, but ignoring text/character values in r

There are 2 answers

Related Questions in R

Related Questions in TEXT

Related Questions in GROUPING

Related Questions in DIFFERENCE

Related Questions in COUNTING

Popular Questions

Trending Questions