Finding rows that contain specific ids, based on string matching

I have a relatively large data frame (approx. 2000 x 50) and a list of IDs (approx. 250 entries). I want to pick out the rows that contain any one of the IDs in my list. This has to happen by partial string matching, since the data frame column contains IDs that are concatenated into larger strings (and possibly even annotated).

So the associated column in the data frame looks like this:

[1] A0AV96-2;A0AV96               A0AVT1;A0AVT1-2               A1A5D9-2;A1A5D9               A1L0T0                       
[5] Q8IVF6;A2A2Z9                 A8K2U0                        B0FP48                        B5MCY1                       
[9] Q99613-2;Q99613;B5ME19        B9A064;P0CG04                 CON__A2AB72                   CON__O76015;O76015           
[13] CON__O95678;O95678            CON__P00761                   CON__P01966                   CON__P02533;P02533           
[17] CON__P02538;P02538            CON__P02768-1;P02768;P02768-2 CON__P02769                   CON__P04258;P02461;P02461-2

... and the list of IDs I want to match just contains the character values of the IDs I want to isolate.

So my question is: how can I select the rows in my data frame that contain an ID that is in my target list? Based on some answers here on SO I have come up with the following solution:

raw.ind <- Reduce(union, lapply(target.list, function(a) which(grepl(a, df$IDs))))

but I am not sure if it is the "right" way to do it or if there are other/better solutions to the problem.
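
One possible alternative, sketched here on toy data, is to split each entry on ";" and compare whole IDs instead of substrings. The names df, IDs and target.list follow the question; the sample values, the helper normalise() and the assumption that annotations are limited to a leading "CON__" and a trailing "-<digits>" isoform suffix are illustrative guesses, not something the real data is known to satisfy.

```r
# Toy stand-ins for the real objects; values are illustrative only.
df <- data.frame(
  IDs = c("A0AV96-2;A0AV96", "A1L0T0", "Q8IVF6;A2A2Z9",
          "CON__O76015;O76015", "CON__P02768-1;P02768;P02768-2"),
  stringsAsFactors = FALSE
)
# "O7601" is deliberately a truncated ID to show the difference below.
target.list <- c("A2A2Z9", "P02768", "O7601")

# Split each entry into its individual IDs.
# as.character() guards against the column being a factor.
tokens <- strsplit(as.character(df$IDs), ";", fixed = TRUE)

# Hypothetical normalisation: strip a leading "CON__" and a trailing
# "-<digits>" suffix. Adjust to whatever annotations the data really has.
normalise <- function(x) sub("-[0-9]+$", "", sub("^CON__", "", x))

# Keep a row when at least one normalised ID is exactly in the target list.
keep <- vapply(tokens,
               function(ids) any(normalise(ids) %in% target.list),
               logical(1))
exact.ind <- which(keep)

# The substring approach from the question, for comparison.
raw.ind <- Reduce(union, lapply(target.list, function(a) which(grepl(a, df$IDs))))
```

On this toy input the substring version also selects row 4, because "O7601" is a substring of "O76015", while the token version does not. Which behaviour is wanted depends on whether the target IDs are meant to match whole IDs or fragments.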