R filter rows based on multiple partial strings applied to multiple columns

Sample of dataset:

diag01 <- as.factor(c("S7211","J47","J47","K729","M2445","Z509","Z488","R13","L893","N318","L0311","S510","A047","D649"))
diag02 <- as.factor(c("K590","D761","J961","T501","M8580","R268","T831","G8240","B9688","G550","E162","T8902","E86","I849"))
diag03 <- as.factor(c("F058","M0820","E877","E86","G712","R32","A408","E888","G8220","C794","T68","L0310","M1094","D469"))
diag04 <- as.factor(c("E86","C845","R790","I420","G4732","R600","L893","R509","T913","C795","M8412","G8212","L891","L0311"))
diag05 <- as.factor(c("R001","N289","E876","E871","H659","R4589","N508","B99","I209","C773","T921","Q070","H919","L033"))
diag06 <- as.factor(c("I951","E877","S7240","I500","H901","E119","Z223","K590","I959","C509","G819","F719","Z290","R13"))

df <- data.frame(diag01, diag02, diag03, diag04, diag05, diag06)

I want to keep every row that has a partial string match in any of a given list of columns (e.g. diag01, diag02, ...). I can do this for a single column, e.g.

junk <- filter(df, grepl(pattern="^E11|^E16|^E86|^E87|^E88", diag02))

but I need to apply this to multiple columns (the original dataset has 216 columns and >1,000,000 rows). Among other options, I have tried

junk <- filter(df, grepl(pattern="^E11|^E16|^E86|^E87|^E88", df[,c(1:6)]))
junk <- apply(df, 1, function(r) any(r %in% grepl(pattern="^E11|^E16|^E86|^E87|^E88")))

I need the entire row. Ideally the filter would be restricted to a given list of columns, since values in other columns may well begin with the same partial strings.

I have made a genuine effort to search for a solution, but my knowledge of R is evidently lacking.

There is 1 answer

akrun (best answer)

Perhaps we need

library(dplyr)
# filter_all() tests every column; any_vars() keeps a row if any column matches
df %>%
   filter_all(any_vars(grepl(pattern="^(E11|E16|E86|E87|E88)", .)))
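
The `filter_all` call above tests every column, whereas the question asks for the match to be limited to a chosen set of columns. A minimal sketch of one way to do that, assuming dplyr 1.0.4 or later (for `if_any`): the toy data frame, the `other` column, and the names `pat` and `target_cols` are made up for illustration and are not part of the original data.

```r
library(dplyr)

# Toy data for illustration only (cut down from the question's sample);
# `other` is a hypothetical column that should NOT be searched.
df <- data.frame(
  diag01 = factor(c("S7211", "J47", "J47", "K729")),
  diag02 = factor(c("K590", "D761", "J961", "E162")),
  diag03 = factor(c("F058", "M0820", "E877", "E86")),
  other  = factor(c("E86", "C845", "R790", "I420"))
)

pat <- "^(E11|E16|E86|E87|E88)"                  # codes starting with any of these
target_cols <- c("diag01", "diag02", "diag03")   # columns to search

# Keep a row if at least one of the target columns matches the pattern.
res <- df %>%
  filter(if_any(all_of(target_cols), ~ grepl(pat, .x)))

res
```

Row 1 is dropped even though `other` holds "E86", because that column is not in `target_cols`. With an older dplyr, `filter_at(vars(diag01:diag03), any_vars(grepl(pat, .)))` should follow the same idea.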

Or with purrr and dplyr

library(dplyr)
library(purrr)
df %>%
   map(~grepl(pattern="^E11|^E16|^E86|^E87|^E88", .)) %>% 
   reduce(`|`) %>%
   df[.,]
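
The same map-then-reduce idea can probably be written in base R and limited to selected columns, which may be worth trying on a table with 216 columns and over a million rows since it works column by column rather than row by row. This is only a sketch under assumptions: the toy data frame, the `other` column, and the names `pat`, `target_cols`, `hits` and `keep` are illustrative, not from the original post.

```r
# Toy data for illustration only (cut down from the question's sample);
# `other` is a hypothetical column that should NOT be searched.
df <- data.frame(
  diag01 = factor(c("S7211", "J47", "J47", "K729")),
  diag02 = factor(c("K590", "D761", "J961", "E162")),
  diag03 = factor(c("F058", "M0820", "E877", "E86")),
  other  = factor(c("E86", "C845", "R790", "I420"))
)

pat <- "^(E11|E16|E86|E87|E88)"
target_cols <- c("diag01", "diag02", "diag03")

# One logical vector per target column: TRUE where the value matches.
hits <- lapply(df[target_cols], function(col) grepl(pat, col))

# Combine the per-column vectors with element-wise OR.
keep <- Reduce(`|`, hits)

# Subset the full data frame, so every column of a matching row is kept.
res <- df[keep, , drop = FALSE]
res
```

Here `keep` plays the role of the logical index produced by `reduce(`|`)` in the purrr version, and no extra packages are needed.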