Filter one dataframe based on the last two digits of another dataframe's value under one column in R


The table data_frame has an ID column covering over 1000 participants, with values such as "Sample_LI.01".

my_ColData also has an ID column, but it covers only 40 participants, with values such as "Sample_LI-01".

I want to use the ID column in my_ColData to filter data_frame. However, the ID formats differ slightly: one uses a dot before the final digits and the other uses a hyphen. Would the best approach be to filter on the last two digits?

The code I have so far looks like this:

data_frame %>% filter (ID %in% my_ColData$ID, if______)

I have no idea what to write in this if condition. Or is there a better way to achieve my goal? Any suggestions would be appreciated.


There are 2 answers

akrun On

We could use str_replace to replace the - with . in the 'ID' values from 'my_ColData', so that they match the 'ID' format used in 'data_frame'.

library(dplyr)
library(stringr)
data_frame %>%
  filter(ID %in% str_replace(my_ColData$ID, '-', '.'))
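
A minimal reproducible sketch of this approach on made-up data. The toy IDs and the value column are assumptions for illustration only, not the asker's real tables. fixed("-") is used here so the hyphen is matched as a literal character rather than as a regular expression.

```r
library(dplyr)
library(stringr)

# Toy stand-ins for the real tables (hypothetical values)
data_frame <- tibble(
  ID    = c("Sample_LI.01", "Sample_LI.02", "Sample_LI.03"),
  value = c(10, 20, 30)
)
my_ColData <- tibble(ID = c("Sample_LI-01", "Sample_LI-03"))

# Convert "Sample_LI-01" to "Sample_LI.01" so both ID columns share one format
wanted_ids <- str_replace(my_ColData$ID, fixed("-"), ".")

# Keep only the rows of data_frame whose ID appears in the converted list
filtered <- data_frame %>%
  filter(ID %in% wanted_ids)

print(filtered)
```

Because the whole ID is compared after the conversion, rows are kept only when the prefix and the trailing digits both match.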
TarJae On

We could use str_sub to compare the last two characters of each ID.

library(dplyr)
library(stringr)
data_frame %>% 
  filter(str_sub(ID, -2) %in% str_sub(my_ColData$ID, -2))
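
One thing to keep in mind with suffix matching: it ignores everything before the last two characters, so IDs with a different prefix but the same digits are also kept. The sketch below uses made-up IDs (the "Sample_XX" prefix is hypothetical) to show that behaviour. Whether it matters depends on whether the real data contains more than one prefix.

```r
library(dplyr)
library(stringr)

# Toy data (hypothetical): two different prefixes share the suffix "01"
data_frame <- tibble(ID = c("Sample_LI.01", "Sample_LI.02", "Sample_XX.01"))
my_ColData <- tibble(ID = c("Sample_LI-01"))

# Compare only the last two characters of each ID
by_suffix <- data_frame %>%
  filter(str_sub(ID, -2) %in% str_sub(my_ColData$ID, -2))

print(by_suffix)
```

Here both "Sample_LI.01" and "Sample_XX.01" pass the filter, because only the trailing "01" is compared. If every ID shares the same prefix, this is harmless.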