I have a relatively large data frame (approx. 2000 x 50) and a list of approx. 250 IDs. I want to find and pick out the rows that contain any one of the IDs in my list. However, this has to be done by partial string matching, because the data frame stores the IDs concatenated into longer strings (and possibly even annotated).
The relevant column in the data frame looks like this:
[1] A0AV96-2;A0AV96 A0AVT1;A0AVT1-2 A1A5D9-2;A1A5D9 A1L0T0
[5] Q8IVF6;A2A2Z9 A8K2U0 B0FP48 B5MCY1
[9] Q99613-2;Q99613;B5ME19 B9A064;P0CG04 CON__A2AB72 CON__O76015;O76015
[13] CON__O95678;O95678 CON__P00761 CON__P01966 CON__P02533;P02533
[17] CON__P02538;P02538 CON__P02768-1;P02768;P02768-2 CON__P02769 CON__P04258;P02461;P02461-2
... and the list of IDs I want to match simply contains the IDs I want to isolate, as character values.
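Given the format above, one possible alternative to substring matching is to split each entry into its individual IDs and match them exactly. The sketch below assumes that IDs within an entry are separated by ";" and that the "CON__" prefix is an annotation to be ignored; the small `df` and the `target.list` values are hypothetical stand-ins built from the IDs shown above.

```r
# Sketch: exact matching on individual IDs instead of substring matching.
# Assumptions: IDs within an entry are separated by ";" and the "CON__"
# prefix is an annotation that should be ignored when matching.
df <- data.frame(
  IDs = c("A0AV96-2;A0AV96", "Q8IVF6;A2A2Z9", "CON__P02538;P02538",
          "CON__P02769", "B9A064;P0CG04"),
  stringsAsFactors = FALSE
)
target.list <- list("A0AV96", "P02769", "Q99613")  # hypothetical targets

# Split each entry into its individual IDs and drop the "CON__" prefix
tokens <- lapply(strsplit(df$IDs, ";", fixed = TRUE),
                 function(x) sub("^CON__", "", x))

# A row matches if any of its IDs is exactly one of the targets
targets <- unlist(target.list)
keep <- vapply(tokens, function(x) any(x %in% targets), logical(1))

print(which(keep))
matched <- df[keep, , drop = FALSE]
```

Unlike substring matching, this will not match a target that merely occurs inside a longer ID. It also means an isoform such as "A0AV96-2" no longer matches "A0AV96" on its own; if that is wanted, the suffix could additionally be stripped, e.g. with `sub("-[0-9]+$", "", x)`.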
So my question is: how can I select the rows of my data frame that contain an ID from my target list? Based on some answers here on SO, I have come up with the following solution:
raw.ind <- Reduce(union, lapply(target.list, function(a) which(grepl(a, df$IDs))))
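For comparison, the same rows can presumably be found without looping over the targets by collapsing them into a single regular expression and calling `grepl()` once. This is a sketch on hypothetical data; it assumes the IDs contain no regex metacharacters (true for the IDs shown, which only use letters, digits, "-" and "_"). It also shows that the `Reduce(union, ...)` result is ordered by target rather than by row.

```r
df <- data.frame(
  IDs = c("A0AV96-2;A0AV96", "Q8IVF6;A2A2Z9", "CON__P02538;P02538",
          "CON__P02769", "B9A064;P0CG04"),
  stringsAsFactors = FALSE
)
target.list <- list("P02769", "A0AV96", "Q99613")  # hypothetical targets

# Approach from the question: one grepl() per target, indices merged with union()
raw.ind <- Reduce(union, lapply(target.list, function(a) which(grepl(a, df$IDs))))

# Single-pattern alternative: "P02769|A0AV96|Q99613" matches any target.
# Assumes no regex metacharacters in the IDs; otherwise they must be escaped.
pattern <- paste(unlist(target.list), collapse = "|")
keep <- grepl(pattern, df$IDs)

print(raw.ind)      # indices ordered by target.list, not by row
print(which(keep))  # indices in ascending row order
subset.df <- df[keep, , drop = FALSE]
```

Both give the same set of rows. Using a logical vector like `keep` keeps the original row order when subsetting, whereas `df[raw.ind, ]` would return the rows in target order. Whether the single pattern is faster for approx. 250 IDs is not guaranteed and would be worth benchmarking on the real data.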
but I am not sure whether this is the "right" way to do it, or whether there are other or better solutions to the problem.