Using str_match in R to check for numbers separated by "_"

I am trying to use some R code that I found online. It has a data frame, model_data_frame:

model_data_frame
          Matchup Win
1  2012_1140_1233   1
2  2012_1290_1443   0
3  2012_1143_1378   0
4  2012_1249_1436   0

The person then set

pattern <- "[A-Z]_([0-9]{3})_([0-9]{3})"

and then

teamIDs <- as.data.frame(str_match(model_data_frame$Matchup, pattern))
teamIDs <- teamIDs[ , c(2,3)]

I'm guessing the result for teamIDs should look something like

teamIDs
    V1   V2  
1  1140 1233
2  1290 1443 
3  1143 1378 
4  1249 1436

Instead, it looks like

teamIDs
    V1   V2  
1  <NA> <NA>
2  <NA> <NA> 
3  <NA> <NA> 
4  <NA> <NA>

I'm guessing it's because the pattern "[A-Z]_([0-9]{3})_([0-9]{3})" is wrong. What should I change it to?

1 answer

Answered by Omley

There is a simpler way of doing this than relying on regex. Use strsplit.

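# Read the table from the question after copying it to the clipboard
df <- read.table("clipboard", header = TRUE)
# Split each Matchup on "_" and keep the 2nd and 3rd pieces (the two team IDs)
teamIDs <- t(as.data.frame(strsplit(as.character(df$Matchup), "_")))[, 2:3]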
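
If you would rather stay with a regex, the original pattern fails for two reasons: [A-Z] requires an uppercase letter before the first underscore, but every Matchup starts with digits, and {3} matches exactly three digits while the team IDs have four. Below is a minimal sketch of a corrected pattern, assuming every Matchup is a four-digit season followed by two four-digit IDs. It rebuilds the sample data from the question and uses base R (regexec and regmatches) so it runs without extra packages; the same pattern string should also work with str_match.

```r
# Sample data copied from the question
model_data_frame <- data.frame(
  Matchup = c("2012_1140_1233", "2012_1290_1443",
              "2012_1143_1378", "2012_1249_1436"),
  Win = c(1, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Assumed format: four-digit season, then two four-digit team IDs.
# The parentheses capture the two IDs.
pattern <- "^[0-9]{4}_([0-9]{4})_([0-9]{4})$"

# regexec() locates the full match and the capture groups;
# regmatches() extracts them as one character vector per row.
matches <- regmatches(model_data_frame$Matchup,
                      regexec(pattern, model_data_frame$Matchup))

# Stack into a matrix: column 1 is the full match, columns 2 and 3 are the IDs.
# This assumes every row matches; a non-matching row would yield character(0).
match_matrix <- do.call(rbind, matches)
teamIDs <- as.data.frame(match_matrix[, 2:3], stringsAsFactors = FALSE)
teamIDs
```

The IDs come back as character strings; wrap the columns in as.integer() if you need numbers. If the number of digits can vary, "_([0-9]+)_([0-9]+)$" is a looser alternative.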