Subsetting rows with sequence of values and identifying columns where sequence begins

Question

Subsetting rows with sequence of values and identifying columns where sequence begins

63 views Asked by Lina At 31 March 2024 at 06:25

I have a large dataset of this format. I would like to a) identify those IDs/rows with the following sequence of values 1 - 1 - >1 - >1 anywhere between x1 and x10; and b) generate a new variable ("event") that identifies the beginning of the sequence, taking the value X1,...,X10.

my_df <- data.frame(ID = c("a","b","c","d","e","f","g","h"),
                    replicate(8,sample(1:4,8,rep=TRUE)))

For a), I have replaced values >1 with 2, then pasted the values from X1 to X10, and then filtered for the sequence 1 - 1 - 2 - 2. For b), I created the variable "event" using nested ifelse() to identify where the sequence begins. This works ok with only 8 columns. Is there a way to increase efficiency for datasets with more columns?

I highly appreciate any pointers!

df_seq <- my_df%>%
  mutate_at(vars(starts_with('X')), funs(ifelse(. > 1, 2, .)))%>%
  mutate(seq = paste(X1,"-",X2,"-",X3,"-",X4,"-",X5,"-",X6,"-",X7,"-",X8))%>%
  filter(grepl("1 - 1 - 2 - 2", seq))%>%
  mutate(event = ifelse(X1 == 1 & X2 == 1 & X3 == 2 & X4 == 2,"X1",
                 ifelse(X2 == 1 & X3 == 1 & X4 == 2 & X5 == 2,"X2",
                 ifelse(X3 == 1 & X4 == 1 & X5 == 2 & X6 == 2,"X3",
                 ifelse(X4 == 1 & X5 == 1 & X6 == 2 & X7 == 2,"X4","X5")))))

Original Q&A

There are 1 answers

**ThomasIsCoding** · Accepted Answer · 2024-03-31T07:57:08+00:00

You can use regexpr to capture the first occurrence of consecutive 1s followed by values greater than 1, for example

set.seed(0)
my_df %>%
    mutate(event = regexpr("1{2}[^1]{2}", do.call(paste0, select(., -ID)))) %>%
    mutate(event = ifelse(event > 0, paste0("X", event), NA))

gives

  ID X1 X2 X3 X4 X5 X6 X7 X8 event
1  a  2  3  2  1  2  2  4  3  <NA>
2  b  1  2  2  1  1  2  1  4  <NA>
3  c  4  2  2  1  4  2  3  2  <NA>
4  d  3  3  2  2  3  3  2  2  <NA>
5  e  1  3  3  1  1  4  1  3  <NA>
6  f  2  1  1  1  4  4  4  3    X3
7  g  1  1  3  2  3  4  4  2    X1
8  h  3  1  1  2  2  2  1  2    X2

TechQA.

Subsetting rows with sequence of values and identifying columns where sequence begins

There are 1 answers

Related Questions in R

Related Questions in DATAFRAME

Related Questions in DPLYR

Popular Questions

Trending Questions